Document (#18146)

Author
Salton, G.
Title
Automatic text structuring and summarization
Source
Information processing and management. 33(1997) no.2, pp. 193-207
Year
1997
Abstract
Applies ideas from automatic link generation research to automatic text summarisation. Using techniques for inter-document link generation, generates intra-document links between passages of a document. Based on the intra-document linkage pattern of a text, characterises the structure of the text. Applies the knowledge of text structure to do automatic text summarisation by passage extraction. Evaluates a set of 50 summaries generated using these techniques by comparing them to paragraph extracts constructed by humans. The automatic summarisation methods perform well, especially in view of the fact that the summaries generated by 2 humans for the same article are surprisingly dissimilar.
Footnote
Contribution to a special issue on methods and tools for the automatic construction of hypertext
Theme
Automatic abstracting
Hypertext

Similar documents (author)

  1. Salton, G.: Another look at automatic text-retrieval systems (1986) 4.87
    4.8684025 = sum of:
      4.8684025 = weight(author_txt:salton in 1355) [ClassicSimilarity], result of:
        4.8684025 = score(doc=1355,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.12837885 = queryNorm
          4.868403 = fieldWeight in 1355, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.625 = fieldNorm(doc=1355)
    
  2. Salton, G.: A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART) (1972) 4.87
    4.8684025 = sum of:
      4.8684025 = weight(author_txt:salton in 2324) [ClassicSimilarity], result of:
        4.8684025 = score(doc=2324,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.12837885 = queryNorm
          4.868403 = fieldWeight in 2324, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.625 = fieldNorm(doc=2324)
    
  3. Salton, G.: Future prospects for text-based information retrieval (1990) 4.87
    4.8684025 = sum of:
      4.8684025 = weight(author_txt:salton in 2326) [ClassicSimilarity], result of:
        4.8684025 = score(doc=2326,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.12837885 = queryNorm
          4.868403 = fieldWeight in 2326, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.625 = fieldNorm(doc=2326)
    
  4. Salton, G.: Fast document classification in automatic information retrieval (1978) 4.87
    4.8684025 = sum of:
      4.8684025 = weight(author_txt:salton in 2330) [ClassicSimilarity], result of:
        4.8684025 = score(doc=2330,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.12837885 = queryNorm
          4.868403 = fieldWeight in 2330, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.625 = fieldNorm(doc=2330)
    
  5. Salton, G.: Expert systems and information retrieval (1987) 4.87
    4.8684025 = sum of:
      4.8684025 = weight(author_txt:salton in 2836) [ClassicSimilarity], result of:
        4.8684025 = score(doc=2836,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.12837885 = queryNorm
          4.868403 = fieldWeight in 2836, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            7.7894444 = idf(docFreq=49, maxDocs=44421)
            0.625 = fieldNorm(doc=2836)
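
The five author matches above all score 4.87 because each explanation multiplies the same factors. A minimal Python sketch of that arithmetic follows, assuming the usual ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The queryNorm and fieldNorm values are copied from the listing rather than derived, and the function names are illustrative, not part of any Lucene API.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as printed in the explain lines."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """score = queryWeight * fieldWeight for a single matching term."""
    idf_value = idf(doc_freq, max_docs)
    query_weight = boost * idf_value * query_norm      # "queryWeight, product of"
    field_weight = tf(freq) * idf_value * field_norm   # "fieldWeight ..., product of"
    return query_weight * field_weight


# Values taken from the author_txt:salton explanation in the listing.
author_score = term_score(
    freq=1.0, doc_freq=49, max_docs=44421,
    query_norm=0.12837885, field_norm=0.625,
)
print(round(author_score, 2))  # 4.87
```

Since freq, docFreq and fieldNorm are identical for all five records, the scores cannot differ, which is why the author list is effectively unranked.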
    

Similar documents (content)

  1. Szlávik, Z.; Tombros, A.; Lalmas, M.: Summarisation of the logical structure of XML documents (2012) 0.21
    0.20745628 = sum of:
      0.20745628 = product of:
        1.0372814 = sum of:
          0.010989721 = weight(abstract_txt:using in 3731) [ClassicSimilarity], result of:
            0.010989721 = score(doc=3731,freq=1.0), product of:
              0.05086552 = queryWeight, product of:
                1.0270494 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.014326793 = queryNorm
              0.21605442 = fieldWeight in 3731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=3731)
          0.058258634 = weight(abstract_txt:structure in 3731) [ClassicSimilarity], result of:
            0.058258634 = score(doc=3731,freq=7.0), product of:
              0.08084253 = queryWeight, product of:
                1.2947906 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.014326793 = queryNorm
              0.72064334 = fieldWeight in 3731, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=3731)
          0.16063897 = weight(abstract_txt:summaries in 3731) [ClassicSimilarity], result of:
            0.16063897 = score(doc=3731,freq=3.0), product of:
              0.21084304 = queryWeight, product of:
                2.091025 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014326793 = queryNorm
              0.7618889 = fieldWeight in 3731, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=3731)
          0.042130712 = weight(abstract_txt:document in 3731) [ClassicSimilarity], result of:
            0.042130712 = score(doc=3731,freq=1.0), product of:
              0.15697901 = queryWeight, product of:
                2.5516164 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.014326793 = queryNorm
              0.26838437 = fieldWeight in 3731, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3731)
          0.7652633 = weight(abstract_txt:summarisation in 3731) [ClassicSimilarity], result of:
            0.7652633 = score(doc=3731,freq=6.0), product of:
              0.54235834 = queryWeight, product of:
                4.1074133 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.014326793 = queryNorm
              1.410992 = fieldWeight in 3731, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=3731)
        0.2 = coord(5/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.19
    0.19071336 = sum of:
      0.19071336 = product of:
        0.5959793 = sum of:
          0.010989721 = weight(abstract_txt:using in 2719) [ClassicSimilarity], result of:
            0.010989721 = score(doc=2719,freq=1.0), product of:
              0.05086552 = queryWeight, product of:
                1.0270494 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.014326793 = queryNorm
              0.21605442 = fieldWeight in 2719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.12738901 = weight(abstract_txt:summarization in 2719) [ClassicSimilarity], result of:
            0.12738901 = score(doc=2719,freq=7.0), product of:
              0.10809635 = queryWeight, product of:
                1.0586932 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014326793 = queryNorm
              1.1784766 = fieldWeight in 2719, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.044039387 = weight(abstract_txt:structure in 2719) [ClassicSimilarity], result of:
            0.044039387 = score(doc=2719,freq=4.0), product of:
              0.08084253 = queryWeight, product of:
                1.2947906 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.014326793 = queryNorm
              0.54475516 = fieldWeight in 2719, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.02476513 = weight(abstract_txt:techniques in 2719) [ClassicSimilarity], result of:
            0.02476513 = score(doc=2719,freq=1.0), product of:
              0.08742979 = queryWeight, product of:
                1.3465092 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.014326793 = queryNorm
              0.28325734 = fieldWeight in 2719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.09274496 = weight(abstract_txt:summaries in 2719) [ClassicSimilarity], result of:
            0.09274496 = score(doc=2719,freq=1.0), product of:
              0.21084304 = queryWeight, product of:
                2.091025 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014326793 = queryNorm
              0.4398768 = fieldWeight in 2719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.11146739 = weight(abstract_txt:document in 2719) [ClassicSimilarity], result of:
            0.11146739 = score(doc=2719,freq=7.0), product of:
              0.15697901 = queryWeight, product of:
                2.5516164 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.014326793 = queryNorm
              0.7100783 = fieldWeight in 2719, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.05266075 = weight(abstract_txt:text in 2719) [ClassicSimilarity], result of:
            0.05266075 = score(doc=2719,freq=1.0), product of:
              0.20851189 = queryWeight, product of:
                3.6016843 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.014326793 = queryNorm
              0.25255513 = fieldWeight in 2719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.13192292 = weight(abstract_txt:automatic in 2719) [ClassicSimilarity], result of:
            0.13192292 = score(doc=2719,freq=2.0), product of:
              0.28726488 = queryWeight, product of:
                3.859143 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014326793 = queryNorm
              0.45923787 = fieldWeight in 2719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
        0.32 = coord(8/25)
    
  3. Sweeney, S.; Crestani, F.; Losada, D.E.: 'Show me more' : incremental length summarisation using novelty detection (2008) 0.18
    0.18490514 = sum of:
      0.18490514 = product of:
        0.9245257 = sum of:
          0.09274496 = weight(abstract_txt:summaries in 3054) [ClassicSimilarity], result of:
            0.09274496 = score(doc=3054,freq=1.0), product of:
              0.21084304 = queryWeight, product of:
                2.091025 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014326793 = queryNorm
              0.4398768 = fieldWeight in 3054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.084261425 = weight(abstract_txt:document in 3054) [ClassicSimilarity], result of:
            0.084261425 = score(doc=3054,freq=4.0), product of:
              0.15697901 = queryWeight, product of:
                2.5516164 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.014326793 = queryNorm
              0.53676873 = fieldWeight in 3054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.074473545 = weight(abstract_txt:text in 3054) [ClassicSimilarity], result of:
            0.074473545 = score(doc=3054,freq=2.0), product of:
              0.20851189 = queryWeight, product of:
                3.6016843 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.014326793 = queryNorm
              0.3571669 = fieldWeight in 3054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.13192292 = weight(abstract_txt:automatic in 3054) [ClassicSimilarity], result of:
            0.13192292 = score(doc=3054,freq=2.0), product of:
              0.28726488 = queryWeight, product of:
                3.859143 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014326793 = queryNorm
              0.45923787 = fieldWeight in 3054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
          0.54112285 = weight(abstract_txt:summarisation in 3054) [ClassicSimilarity], result of:
            0.54112285 = score(doc=3054,freq=3.0), product of:
              0.54235834 = queryWeight, product of:
                4.1074133 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.014326793 = queryNorm
              0.997722 = fieldWeight in 3054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=3054)
        0.2 = coord(5/25)
    
  4. Lihui, C.; Lian, C.W.: Using Web structure and summarisation techniques for Web content mining (2005) 0.17
    0.17240506 = sum of:
      0.17240506 = product of:
        0.71835446 = sum of:
          0.010989721 = weight(abstract_txt:using in 2046) [ClassicSimilarity], result of:
            0.010989721 = score(doc=2046,freq=1.0), product of:
              0.05086552 = queryWeight, product of:
                1.0270494 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.014326793 = queryNorm
              0.21605442 = fieldWeight in 2046, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=2046)
          0.031140547 = weight(abstract_txt:structure in 2046) [ClassicSimilarity], result of:
            0.031140547 = score(doc=2046,freq=2.0), product of:
              0.08084253 = queryWeight, product of:
                1.2947906 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.014326793 = queryNorm
              0.38520005 = fieldWeight in 2046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=2046)
          0.04289446 = weight(abstract_txt:techniques in 2046) [ClassicSimilarity], result of:
            0.04289446 = score(doc=2046,freq=3.0), product of:
              0.08742979 = queryWeight, product of:
                1.3465092 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.014326793 = queryNorm
              0.49061608 = fieldWeight in 2046, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.0625 = fieldNorm(doc=2046)
          0.059581824 = weight(abstract_txt:document in 2046) [ClassicSimilarity], result of:
            0.059581824 = score(doc=2046,freq=2.0), product of:
              0.15697901 = queryWeight, product of:
                2.5516164 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.014326793 = queryNorm
              0.3795528 = fieldWeight in 2046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=2046)
          0.13192292 = weight(abstract_txt:automatic in 2046) [ClassicSimilarity], result of:
            0.13192292 = score(doc=2046,freq=2.0), product of:
              0.28726488 = queryWeight, product of:
                3.859143 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014326793 = queryNorm
              0.45923787 = fieldWeight in 2046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=2046)
          0.44182494 = weight(abstract_txt:summarisation in 2046) [ClassicSimilarity], result of:
            0.44182494 = score(doc=2046,freq=2.0), product of:
              0.54235834 = queryWeight, product of:
                4.1074133 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.014326793 = queryNorm
              0.8146366 = fieldWeight in 2046, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=2046)
        0.24 = coord(6/25)
    
  5. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.17
    0.16620412 = sum of:
      0.16620412 = product of:
        0.6925172 = sum of:
          0.009616005 = weight(abstract_txt:using in 3765) [ClassicSimilarity], result of:
            0.009616005 = score(doc=3765,freq=1.0), product of:
              0.05086552 = queryWeight, product of:
                1.0270494 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.014326793 = queryNorm
              0.18904762 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.24326852 = weight(abstract_txt:passage in 3765) [ClassicSimilarity], result of:
            0.24326852 = score(doc=3765,freq=14.0), product of:
              0.14435492 = queryWeight, product of:
                1.2234336 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.014326793 = queryNorm
              1.6852111 = fieldWeight in 3765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.2460928 = weight(abstract_txt:passages in 3765) [ClassicSimilarity], result of:
            0.2460928 = score(doc=3765,freq=14.0), product of:
              0.14547005 = queryWeight, product of:
                1.22815 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.014326793 = queryNorm
              1.6917076 = fieldWeight in 3765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.037532654 = weight(abstract_txt:techniques in 3765) [ClassicSimilarity], result of:
            0.037532654 = score(doc=3765,freq=3.0), product of:
              0.08742979 = queryWeight, product of:
                1.3465092 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.014326793 = queryNorm
              0.42928907 = fieldWeight in 3765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.06385096 = weight(abstract_txt:document in 3765) [ClassicSimilarity], result of:
            0.06385096 = score(doc=3765,freq=3.0), product of:
              0.15697901 = queryWeight, product of:
                2.5516164 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.014326793 = queryNorm
              0.4067484 = fieldWeight in 3765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.092156306 = weight(abstract_txt:text in 3765) [ClassicSimilarity], result of:
            0.092156306 = score(doc=3765,freq=4.0), product of:
              0.20851189 = queryWeight, product of:
                3.6016843 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.014326793 = queryNorm
              0.44197148 = fieldWeight in 3765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
        0.24 = coord(6/25)
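
In the content-based list, each record's final score is the sum of its per-term weights multiplied by a coordination factor, coord(matched/total), where the total is 25 query terms in this listing. A small Python sketch of that last step, using the per-term values printed for the first record (doc 3731); the function and variable names are hypothetical and chosen only for illustration.

```python
def combined_score(term_scores, total_query_terms):
    """Sum the per-term weights, then scale by the share of query terms matched."""
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# Per-term weights copied from the explanation for doc 3731:
# using, structure, summaries, document, summarisation.
doc_3731_terms = [0.010989721, 0.058258634, 0.16063897, 0.042130712, 0.7652633]

doc_3731_score = combined_score(doc_3731_terms, total_query_terms=25)
print(round(doc_3731_score, 2))  # 0.21
```

The coordination factor rewards records that match more of the query terms, but it does not decide the ranking alone: the second record matches 8 of 25 terms yet ranks below the first, because its summed term weights (0.5959793) are much lower.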