Document (#13136)

Author
Salton, G.
Allan, J.
Singhal, A.
Title
Automatic text decomposition and structuring
Source
Information processing and management. 32(1996) no.2, pp. 127-138
Year
1996
Abstract
Sophisticated text similarity measurements are used to determine relationships between natural language text and text excerpts. The resulting linked hypertext maps can be decomposed into text segments and text themes, and these decompositions can be used to identify different text types and text structures, leading to improved text access and utilization. Gives examples of text decomposition for expository and non-expository texts.
Theme
Automatic indexing (Automatisches Indexieren)

Similar documents (author)

  1. Salton, G.; Allan, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine readable texts (1994) 4.73
    4.732438 = sum of:
      4.732438 = sum of:
        1.0923858 = weight(author_txt:salton in 2949) [ClassicSimilarity], result of:
          1.0923858 = score(doc=2949,freq=1.0), product of:
            0.44876555 = queryWeight, product of:
              7.7894444 = idf(docFreq=49, maxDocs=44421)
              0.05761201 = queryNorm
            2.4342015 = fieldWeight in 2949, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.7894444 = idf(docFreq=49, maxDocs=44421)
              0.3125 = fieldNorm(doc=2949)
        1.5810528 = weight(author_txt:allan in 2949) [ClassicSimilarity], result of:
          1.5810528 = score(doc=2949,freq=1.0), product of:
            0.57420427 = queryWeight, product of:
              1.1311585 = boost
              8.811096 = idf(docFreq=17, maxDocs=44421)
              0.05761201 = queryNorm
            2.7534676 = fieldWeight in 2949, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.811096 = idf(docFreq=17, maxDocs=44421)
              0.3125 = fieldNorm(doc=2949)
        2.0589993 = weight(author_txt:singhal in 2949) [ClassicSimilarity], result of:
          2.0589993 = score(doc=2949,freq=1.0), product of:
            0.684762 = queryWeight, product of:
              1.2352648 = boost
              9.622026 = idf(docFreq=7, maxDocs=44421)
              0.05761201 = queryNorm
            3.0068831 = fieldWeight in 2949, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.622026 = idf(docFreq=7, maxDocs=44421)
              0.3125 = fieldNorm(doc=2949)
    
  2. Salton, G.; Allan, J.: Selective text utilization and text traversal (1995) 2.85
    2.851668 = sum of:
      2.851668 = product of:
        4.2775016 = sum of:
          1.7478172 = weight(author_txt:salton in 6873) [ClassicSimilarity], result of:
            1.7478172 = score(doc=6873,freq=1.0), product of:
              0.44876555 = queryWeight, product of:
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.05761201 = queryNorm
              3.8947222 = fieldWeight in 6873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.5 = fieldNorm(doc=6873)
          2.5296845 = weight(author_txt:allan in 6873) [ClassicSimilarity], result of:
            2.5296845 = score(doc=6873,freq=1.0), product of:
              0.57420427 = queryWeight, product of:
                1.1311585 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.05761201 = queryNorm
              4.405548 = fieldWeight in 6873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.5 = fieldNorm(doc=6873)
        0.6666667 = coord(2/3)
    
  3. Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 2.14
    2.138751 = sum of:
      2.138751 = product of:
        3.2081263 = sum of:
          1.3108629 = weight(author_txt:salton in 6506) [ClassicSimilarity], result of:
            1.3108629 = score(doc=6506,freq=1.0), product of:
              0.44876555 = queryWeight, product of:
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.05761201 = queryNorm
              2.9210417 = fieldWeight in 6506, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.375 = fieldNorm(doc=6506)
          1.8972634 = weight(author_txt:allan in 6506) [ClassicSimilarity], result of:
            1.8972634 = score(doc=6506,freq=1.0), product of:
              0.57420427 = queryWeight, product of:
                1.1311585 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.05761201 = queryNorm
              3.304161 = fieldWeight in 6506, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.375 = fieldNorm(doc=6506)
        0.6666667 = coord(2/3)
    
  4. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 2.14
    2.138751 = sum of:
      2.138751 = product of:
        3.2081263 = sum of:
          1.3108629 = weight(author_txt:salton in 6699) [ClassicSimilarity], result of:
            1.3108629 = score(doc=6699,freq=1.0), product of:
              0.44876555 = queryWeight, product of:
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.05761201 = queryNorm
              2.9210417 = fieldWeight in 6699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.375 = fieldNorm(doc=6699)
          1.8972634 = weight(author_txt:allan in 6699) [ClassicSimilarity], result of:
            1.8972634 = score(doc=6699,freq=1.0), product of:
              0.57420427 = queryWeight, product of:
                1.1311585 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.05761201 = queryNorm
              3.304161 = fieldWeight in 6699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.375 = fieldNorm(doc=6699)
        0.6666667 = coord(2/3)
    
  5. Salton, G.; Allen, J.; Buckley, C.; Singhal, A.: Automatic analysis, theme generation, and summarization of machine-readable data (1994) 2.10
    2.1009235 = sum of:
      2.1009235 = product of:
        3.151385 = sum of:
          1.0923858 = weight(author_txt:salton in 1236) [ClassicSimilarity], result of:
            1.0923858 = score(doc=1236,freq=1.0), product of:
              0.44876555 = queryWeight, product of:
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.05761201 = queryNorm
              2.4342015 = fieldWeight in 1236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.3125 = fieldNorm(doc=1236)
          2.0589993 = weight(author_txt:singhal in 1236) [ClassicSimilarity], result of:
            2.0589993 = score(doc=1236,freq=1.0), product of:
              0.684762 = queryWeight, product of:
                1.2352648 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.05761201 = queryNorm
              3.0068831 = fieldWeight in 1236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.3125 = fieldNorm(doc=1236)
        0.6666667 = coord(2/3)
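
The author-similarity explanations above can be reproduced by hand. The sketch below is a minimal reconstruction, assuming the formulas that the listed numbers are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of squared boost * idf), and coord = matched terms / query terms. The function names are hypothetical; the boosts, docFreq values, and fieldNorm values are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44421  # maxDocs as shown in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def query_norm(weights) -> float:
    """1 / sqrt(sum of squared boost * idf values over all query terms)."""
    return 1.0 / math.sqrt(sum(w * w for w in weights))


def term_score(freq, doc_freq, field_norm, q_norm, boost=1.0) -> float:
    """queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * q_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (boost, docFreq) for salton, allan, singhal, copied from the listing
QUERY_TERMS = [(1.0, 49), (1.1311585, 17), (1.2352648, 7)]

q_norm = query_norm([boost * idf(df) for boost, df in QUERY_TERMS])

# author_txt:salton in doc 2949 (fieldNorm 0.3125)
salton_2949 = term_score(1.0, 49, 0.3125, q_norm)

# doc 6873 matches only salton and allan, so coord is 2/3
doc_6873 = (2 / 3) * (
    term_score(1.0, 49, 0.5, q_norm)
    + term_score(1.0, 17, 0.5, q_norm, boost=1.1311585)
)

print(round(q_norm, 5), round(salton_2949, 4), round(doc_6873, 3))
```

The results should agree with the listed queryNorm, the salton weight for doc 2949, and the total for doc 6873 up to single-precision rounding, since the listing appears to use 32-bit floats.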
    

Similar documents (content)

  1. Salton, G.; Buckley, C.: Approaches to global text analysis (1990) 0.19
    0.18937205 = sum of:
      0.18937205 = product of:
        0.94686025 = sum of:
          0.050950535 = weight(abstract_txt:linked in 4900) [ClassicSimilarity], result of:
            0.050950535 = score(doc=4900,freq=1.0), product of:
              0.08392088 = queryWeight, product of:
                1.0842713 = boost
                5.5508647 = idf(docFreq=468, maxDocs=44421)
                0.013943488 = queryNorm
              0.6071258 = fieldWeight in 4900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5508647 = idf(docFreq=468, maxDocs=44421)
                0.109375 = fieldNorm(doc=4900)
          0.053378783 = weight(abstract_txt:texts in 4900) [ClassicSimilarity], result of:
            0.053378783 = score(doc=4900,freq=1.0), product of:
              0.08656653 = queryWeight, product of:
                1.1012298 = boost
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.013943488 = queryNorm
              0.6166215 = fieldWeight in 4900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.109375 = fieldNorm(doc=4900)
          0.061161637 = weight(abstract_txt:leading in 4900) [ClassicSimilarity], result of:
            0.061161637 = score(doc=4900,freq=1.0), product of:
              0.09478878 = queryWeight, product of:
                1.1523421 = boost
                5.899349 = idf(docFreq=330, maxDocs=44421)
                0.013943488 = queryNorm
              0.6452413 = fieldWeight in 4900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.899349 = idf(docFreq=330, maxDocs=44421)
                0.109375 = fieldNorm(doc=4900)
          0.2998973 = weight(abstract_txt:excerpts in 4900) [ClassicSimilarity], result of:
            0.2998973 = score(doc=4900,freq=2.0), product of:
              0.21714139 = queryWeight, product of:
                1.7441115 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.013943488 = queryNorm
              1.3811154 = fieldWeight in 4900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.109375 = fieldNorm(doc=4900)
          0.481472 = weight(abstract_txt:text in 4900) [ClassicSimilarity], result of:
            0.481472 = score(doc=4900,freq=6.0), product of:
              0.4447348 = queryWeight, product of:
                7.8932076 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013943488 = queryNorm
              1.0826046 = fieldWeight in 4900, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.109375 = fieldNorm(doc=4900)
        0.2 = coord(5/25)
    
  2. Rittschof, K.A.; Kulhavy, R.W.; Stock, W.A.; Verdi, M.P.; Doran, J.M.: Thematic maps improve memory for facts and inferences : a test of the stimulus order hypothesis (1994) 0.15
    0.14775978 = sum of:
      0.14775978 = product of:
        0.9234987 = sum of:
          0.043553196 = weight(abstract_txt:maps in 2157) [ClassicSimilarity], result of:
            0.043553196 = score(doc=2157,freq=1.0), product of:
              0.094595306 = queryWeight, product of:
                1.1511655 = boost
                5.8933253 = idf(docFreq=332, maxDocs=44421)
                0.013943488 = queryNorm
              0.46041605 = fieldWeight in 2157, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8933253 = idf(docFreq=332, maxDocs=44421)
                0.078125 = fieldNorm(doc=2157)
          0.15185742 = weight(abstract_txt:theme in 2157) [ClassicSimilarity], result of:
            0.15185742 = score(doc=2157,freq=5.0), product of:
              0.12720092 = queryWeight, product of:
                1.334898 = boost
                6.8339334 = idf(docFreq=129, maxDocs=44421)
                0.013943488 = queryNorm
              1.1938391 = fieldWeight in 2157, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.8339334 = idf(docFreq=129, maxDocs=44421)
                0.078125 = fieldNorm(doc=2157)
          0.41414398 = weight(abstract_txt:expository in 2157) [ClassicSimilarity], result of:
            0.41414398 = score(doc=2157,freq=1.0), product of:
              0.53493434 = queryWeight, product of:
                3.8714013 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.013943488 = queryNorm
              0.7741959 = fieldWeight in 2157, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=2157)
          0.3139441 = weight(abstract_txt:text in 2157) [ClassicSimilarity], result of:
            0.3139441 = score(doc=2157,freq=5.0), product of:
              0.4447348 = queryWeight, product of:
                7.8932076 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013943488 = queryNorm
              0.70591307 = fieldWeight in 2157, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=2157)
        0.16 = coord(4/25)
    
  3. Liu, S.: Decomposing DDC synthesized numbers (1997) 0.11
    0.10672144 = sum of:
      0.10672144 = product of:
        0.667009 = sum of:
          0.029844673 = weight(abstract_txt:automatic in 968) [ClassicSimilarity], result of:
            0.029844673 = score(doc=968,freq=1.0), product of:
              0.07352485 = queryWeight, product of:
                1.0148925 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.013943488 = queryNorm
              0.40591276 = fieldWeight in 968, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.078125 = fieldNorm(doc=968)
          0.21889108 = weight(abstract_txt:decomposed in 968) [ClassicSimilarity], result of:
            0.21889108 = score(doc=968,freq=2.0), product of:
              0.22029178 = queryWeight, product of:
                1.7567182 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.013943488 = queryNorm
              0.9936416 = fieldWeight in 968, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.078125 = fieldNorm(doc=968)
          0.20707199 = weight(abstract_txt:decompositions in 968) [ClassicSimilarity], result of:
            0.20707199 = score(doc=968,freq=1.0), product of:
              0.26746717 = queryWeight, product of:
                1.9357007 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.013943488 = queryNorm
              0.7741959 = fieldWeight in 968, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=968)
          0.21120122 = weight(abstract_txt:decomposition in 968) [ClassicSimilarity], result of:
            0.21120122 = score(doc=968,freq=1.0), product of:
              0.3414527 = queryWeight, product of:
                3.0930235 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.013943488 = queryNorm
              0.6185373 = fieldWeight in 968, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.078125 = fieldNorm(doc=968)
        0.16 = coord(4/25)
    
  4. Rafols, I.; Leydesdorff, L.: Content-based and algorithmic classifications of journals : perspectives on the dynamics of scientific communication and indexer effects (2009) 0.10
    0.10322493 = sum of:
      0.10322493 = product of:
        0.51612467 = sum of:
          0.022840034 = weight(abstract_txt:structures in 82) [ClassicSimilarity], result of:
            0.022840034 = score(doc=82,freq=1.0), product of:
              0.07138288 = queryWeight, product of:
                5.1194425 = idf(docFreq=721, maxDocs=44421)
                0.013943488 = queryNorm
              0.31996515 = fieldWeight in 82, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1194425 = idf(docFreq=721, maxDocs=44421)
                0.0625 = fieldNorm(doc=82)
          0.03484256 = weight(abstract_txt:maps in 82) [ClassicSimilarity], result of:
            0.03484256 = score(doc=82,freq=1.0), product of:
              0.094595306 = queryWeight, product of:
                1.1511655 = boost
                5.8933253 = idf(docFreq=332, maxDocs=44421)
                0.013943488 = queryNorm
              0.36833283 = fieldWeight in 82, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8933253 = idf(docFreq=332, maxDocs=44421)
                0.0625 = fieldNorm(doc=82)
          0.1238235 = weight(abstract_txt:decomposed in 82) [ClassicSimilarity], result of:
            0.1238235 = score(doc=82,freq=1.0), product of:
              0.22029178 = queryWeight, product of:
                1.7567182 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.013943488 = queryNorm
              0.5620886 = fieldWeight in 82, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.0625 = fieldNorm(doc=82)
          0.1656576 = weight(abstract_txt:decompositions in 82) [ClassicSimilarity], result of:
            0.1656576 = score(doc=82,freq=1.0), product of:
              0.26746717 = queryWeight, product of:
                1.9357007 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.013943488 = queryNorm
              0.61935675 = fieldWeight in 82, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=82)
          0.16896099 = weight(abstract_txt:decomposition in 82) [ClassicSimilarity], result of:
            0.16896099 = score(doc=82,freq=1.0), product of:
              0.3414527 = queryWeight, product of:
                3.0930235 = boost
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.013943488 = queryNorm
              0.49482986 = fieldWeight in 82, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.917278 = idf(docFreq=43, maxDocs=44421)
                0.0625 = fieldNorm(doc=82)
        0.2 = coord(5/25)
    
  5. Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 0.08
    0.08018318 = sum of:
      0.08018318 = product of:
        0.5011449 = sum of:
          0.03581361 = weight(abstract_txt:automatic in 6506) [ClassicSimilarity], result of:
            0.03581361 = score(doc=6506,freq=1.0), product of:
              0.07352485 = queryWeight, product of:
                1.0148925 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.013943488 = queryNorm
              0.48709533 = fieldWeight in 6506, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=6506)
          0.043671884 = weight(abstract_txt:linked in 6506) [ClassicSimilarity], result of:
            0.043671884 = score(doc=6506,freq=1.0), product of:
              0.08392088 = queryWeight, product of:
                1.0842713 = boost
                5.5508647 = idf(docFreq=468, maxDocs=44421)
                0.013943488 = queryNorm
              0.52039355 = fieldWeight in 6506, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5508647 = idf(docFreq=468, maxDocs=44421)
                0.09375 = fieldNorm(doc=6506)
          0.08469926 = weight(abstract_txt:structuring in 6506) [ClassicSimilarity], result of:
            0.08469926 = score(doc=6506,freq=1.0), product of:
              0.13051341 = queryWeight, product of:
                1.3521676 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.013943488 = queryNorm
              0.64896977 = fieldWeight in 6506, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.09375 = fieldNorm(doc=6506)
          0.33696017 = weight(abstract_txt:text in 6506) [ClassicSimilarity], result of:
            0.33696017 = score(doc=6506,freq=4.0), product of:
              0.4447348 = queryWeight, product of:
                7.8932076 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013943488 = queryNorm
              0.7576654 = fieldWeight in 6506, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=6506)
        0.16 = coord(4/25)
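
The content-similarity scores follow the same pattern, with two visible differences: term frequencies above 1 enter through tf = sqrt(freq), and coord divides by 25 query terms. As a hedged illustration, the sketch below recomputes the total for the first content match (doc 4900). It assumes the same formulas as the listing implies, takes queryNorm, boosts, docFreq, and fieldNorm directly from the explanation above, and uses a hypothetical helper name `document_score`.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.013943488  # copied from the listing, not recomputed
QUERY_TERM_COUNT = 25     # denominator of coord(5/25)


def idf(doc_freq: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def document_score(matches, field_norm: float) -> float:
    """Sum queryWeight * fieldWeight over matching terms, then apply coord.

    matches: list of (boost, doc_freq, term_freq) tuples.
    """
    total = 0.0
    for boost, doc_freq, term_freq in matches:
        term_idf = idf(doc_freq)
        query_weight = boost * term_idf * QUERY_NORM
        field_weight = math.sqrt(term_freq) * term_idf * field_norm
        total += query_weight * field_weight
    coord = len(matches) / QUERY_TERM_COUNT
    return coord * total


# Matching abstract terms for doc 4900, copied from the listing
DOC_4900 = [
    (1.0842713, 468, 1.0),   # linked
    (1.1012298, 429, 1.0),   # texts
    (1.1523421, 330, 1.0),   # leading
    (1.7441115, 15, 2.0),    # excerpts
    (7.8932076, 2122, 6.0),  # text
]

score_4900 = document_score(DOC_4900, field_norm=0.109375)
print(round(score_4900, 3))
```

Because "text" carries by far the largest boost and occurs six times in doc 4900, it contributes roughly half of the pre-coord sum, which is why documents that merely repeat "text" rank highly here.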