Document (#33027)

Author
Moens, M.-F.
Angheluta, R.
Dumortier, J.
Title
Generic technologies for single- and multi-document summarization
Source
Information processing and management. 41(2005) no.3, S.569-586
Year
2005
Abstract
The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They cover the extraction of important sentences from the documents, the compression of those sentences to their essential or relevant content, and the detection of redundant content across sentences. The technologies were tested at the Document Understanding Conference, organized by the National Institute of Standards and Technology, USA, in 2002 and 2003. The system obtained good to very good results in this competition. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.

Similar documents (author)

  1. Moens, M.F.: Automatic indexing and abstracting of document texts (2000) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:moens in 892) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 892, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=892)
    
  2. Moens, M.F.; Dumortier, J.: Use of a text grammar for generating highlight abstracts of magazine articles (2000) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:moens in 5540) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 5540, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=5540)
    
  3. Moens, M.-F.: Summarizing court decisions (2007) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:moens in 1954) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 1954, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=1954)
    
  4. Moens, M.-F.; Dumortier, J.: Text categorization : the assignment of subject descriptors to magazine articles (2000) 4.11
    4.1120114 = sum of:
      4.1120114 = weight(author_txt:moens in 3397) [ClassicSimilarity], result of:
        4.1120114 = fieldWeight in 3397, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.4375 = fieldNorm(doc=3397)
    
  5. Moens, M.-F.; Uyttendaele, C.: Automatic text structuring and categorization as a first step in summarizing legal cases (1997) 4.11
    4.1120114 = sum of:
      4.1120114 = weight(author_txt:moens in 3256) [ClassicSimilarity], result of:
        4.1120114 = fieldWeight in 3256, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.4375 = fieldNorm(doc=3256)
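
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical and are not part of this catalog. The fieldNorm values are taken as given from the listing rather than recomputed.

```python
import math

def classic_tf(freq):
    # ClassicSimilarity term frequency: square root of the raw count.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Inputs copied from the listing: author_txt:moens, docFreq=9, maxDocs=44421.
idf_moens = classic_idf(9, 44421)

# fieldNorm values of the five author hits, as printed in the listing.
author_scores = [field_weight(1.0, 9, 44421, norm)
                 for norm in (0.625, 0.5, 0.5, 0.4375, 0.4375)]

for score in author_scores:
    print(round(score, 4))
```

Because tf and idf are identical for all five hits, the ranking here is driven by fieldNorm alone, that is, by how short the author field of each record is.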
    

Similar documents (content)

  1. Wan, X.; Yang, J.; Xiao, J.: Incorporating cross-document relationships between sentences for single document summarizations (2006) 0.28
    0.2848958 = sum of:
      0.2848958 = product of:
        1.1870658 = sum of:
          0.047234755 = weight(abstract_txt:2002 in 3421) [ClassicSimilarity], result of:
            0.047234755 = score(doc=3421,freq=1.0), product of:
              0.096862584 = queryWeight, product of:
                6.2418823 = idf(docFreq=234, maxDocs=44421)
                0.015518169 = queryNorm
              0.48764706 = fieldWeight in 3421, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2418823 = idf(docFreq=234, maxDocs=44421)
                0.078125 = fieldNorm(doc=3421)
          0.016352173 = weight(abstract_txt:results in 3421) [ClassicSimilarity], result of:
            0.016352173 = score(doc=3421,freq=1.0), product of:
              0.06016934 = queryWeight, product of:
                1.1146142 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015518169 = queryNorm
              0.2717692 = fieldWeight in 3421, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=3421)
          0.09594859 = weight(abstract_txt:single in 3421) [ClassicSimilarity], result of:
            0.09594859 = score(doc=3421,freq=3.0), product of:
              0.13571993 = queryWeight, product of:
                1.6740128 = boost
                5.2244954 = idf(docFreq=649, maxDocs=44421)
                0.015518169 = queryNorm
              0.70696026 = fieldWeight in 3421, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2244954 = idf(docFreq=649, maxDocs=44421)
                0.078125 = fieldNorm(doc=3421)
          0.13050067 = weight(abstract_txt:document in 3421) [ClassicSimilarity], result of:
            0.13050067 = score(doc=3421,freq=8.0), product of:
              0.13753098 = queryWeight, product of:
                2.0638726 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015518169 = queryNorm
              0.94888204 = fieldWeight in 3421, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3421)
          0.39989316 = weight(abstract_txt:sentences in 3421) [ClassicSimilarity], result of:
            0.39989316 = score(doc=3421,freq=4.0), product of:
              0.36556506 = queryWeight, product of:
                3.364844 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.015518169 = queryNorm
              1.0939043 = fieldWeight in 3421, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.078125 = fieldNorm(doc=3421)
          0.49713653 = weight(abstract_txt:summarization in 3421) [ClassicSimilarity], result of:
            0.49713653 = score(doc=3421,freq=2.0), product of:
              0.63136244 = queryWeight, product of:
                5.70882 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.015518169 = queryNorm
              0.78740275 = fieldWeight in 3421, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=3421)
        0.24 = coord(6/25)
    
  2. Agarwal, B.; Ramampiaro, H.; Langseth, H.; Ruocco, M.: A deep network model for paraphrase detection in short text messages (2018) 0.24
    0.24436627 = sum of:
      0.24436627 = product of:
        0.8727367 = sum of:
          0.013081738 = weight(abstract_txt:results in 43) [ClassicSimilarity], result of:
            0.013081738 = score(doc=43,freq=1.0), product of:
              0.06016934 = queryWeight, product of:
                1.1146142 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015518169 = queryNorm
              0.21741535 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.070775695 = weight(abstract_txt:detecting in 43) [ClassicSimilarity], result of:
            0.070775695 = score(doc=43,freq=1.0), product of:
              0.1471785 = queryWeight, product of:
                1.2326624 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.015518169 = queryNorm
              0.4808834 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.07534156 = weight(abstract_txt:good in 43) [ClassicSimilarity], result of:
            0.07534156 = score(doc=43,freq=2.0), product of:
              0.15344216 = queryWeight, product of:
                1.7799562 = boost
                5.5551386 = idf(docFreq=466, maxDocs=44421)
                0.015518169 = queryNorm
              0.4910095 = fieldWeight in 43, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5551386 = idf(docFreq=466, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.12451501 = weight(abstract_txt:texts in 43) [ClassicSimilarity], result of:
            0.12451501 = score(doc=43,freq=5.0), product of:
              0.15803602 = queryWeight, product of:
                1.8064046 = boost
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.015518169 = queryNorm
              0.7878901 = fieldWeight in 43, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.081586055 = weight(abstract_txt:generic in 43) [ClassicSimilarity], result of:
            0.081586055 = score(doc=43,freq=1.0), product of:
              0.2038648 = queryWeight, product of:
                2.0516727 = boost
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.015518169 = queryNorm
              0.40019688 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.22621372 = weight(abstract_txt:sentences in 43) [ClassicSimilarity], result of:
            0.22621372 = score(doc=43,freq=2.0), product of:
              0.36556506 = queryWeight, product of:
                3.364844 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.015518169 = queryNorm
              0.61880565 = fieldWeight in 43, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.2812229 = weight(abstract_txt:summarization in 43) [ClassicSimilarity], result of:
            0.2812229 = score(doc=43,freq=1.0), product of:
              0.63136244 = queryWeight, product of:
                5.70882 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.015518169 = queryNorm
              0.4454223 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
        0.28 = coord(7/25)
    
  3. Lee, J.-H.; Park, S.; Ahn, C.-M.; Kim, D.: Automatic generic document summarization based on non-negative matrix factorization (2009) 0.24
    0.24137555 = sum of:
      0.24137555 = product of:
        1.2068777 = sum of:
          0.019622607 = weight(abstract_txt:results in 3448) [ClassicSimilarity], result of:
            0.019622607 = score(doc=3448,freq=1.0), product of:
              0.06016934 = queryWeight, product of:
                1.1146142 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015518169 = queryNorm
              0.32612303 = fieldWeight in 3448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.09375 = fieldNorm(doc=3448)
          0.17307016 = weight(abstract_txt:generic in 3448) [ClassicSimilarity], result of:
            0.17307016 = score(doc=3448,freq=2.0), product of:
              0.2038648 = queryWeight, product of:
                2.0516727 = boost
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.015518169 = queryNorm
              0.8489458 = fieldWeight in 3448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.09375 = fieldNorm(doc=3448)
          0.0783004 = weight(abstract_txt:document in 3448) [ClassicSimilarity], result of:
            0.0783004 = score(doc=3448,freq=2.0), product of:
              0.13753098 = queryWeight, product of:
                2.0638726 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015518169 = queryNorm
              0.5693292 = fieldWeight in 3448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.09375 = fieldNorm(doc=3448)
          0.3393206 = weight(abstract_txt:sentences in 3448) [ClassicSimilarity], result of:
            0.3393206 = score(doc=3448,freq=2.0), product of:
              0.36556506 = queryWeight, product of:
                3.364844 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.015518169 = queryNorm
              0.9282085 = fieldWeight in 3448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.09375 = fieldNorm(doc=3448)
          0.5965639 = weight(abstract_txt:summarization in 3448) [ClassicSimilarity], result of:
            0.5965639 = score(doc=3448,freq=2.0), product of:
              0.63136244 = queryWeight, product of:
                5.70882 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.015518169 = queryNorm
              0.94488335 = fieldWeight in 3448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.09375 = fieldNorm(doc=3448)
        0.2 = coord(5/25)
    
  4. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.24
    0.2380859 = sum of:
      0.2380859 = product of:
        0.9920246 = sum of:
          0.01850037 = weight(abstract_txt:results in 3693) [ClassicSimilarity], result of:
            0.01850037 = score(doc=3693,freq=2.0), product of:
              0.06016934 = queryWeight, product of:
                1.1146142 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015518169 = queryNorm
              0.30747172 = fieldWeight in 3693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.022690374 = weight(abstract_txt:content in 3693) [ClassicSimilarity], result of:
            0.022690374 = score(doc=3693,freq=1.0), product of:
              0.086861245 = queryWeight, product of:
                1.3392141 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.015518169 = queryNorm
              0.26122552 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.06511278 = weight(abstract_txt:multi in 3693) [ClassicSimilarity], result of:
            0.06511278 = score(doc=3693,freq=1.0), product of:
              0.17540519 = queryWeight, product of:
                1.903085 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.015518169 = queryNorm
              0.37121353 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.036911167 = weight(abstract_txt:document in 3693) [ClassicSimilarity], result of:
            0.036911167 = score(doc=3693,freq=1.0), product of:
              0.13753098 = queryWeight, product of:
                2.0638726 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015518169 = queryNorm
              0.26838437 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.15995726 = weight(abstract_txt:sentences in 3693) [ClassicSimilarity], result of:
            0.15995726 = score(doc=3693,freq=1.0), product of:
              0.36556506 = queryWeight, product of:
                3.364844 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.015518169 = queryNorm
              0.4375617 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.68885267 = weight(abstract_txt:summarization in 3693) [ClassicSimilarity], result of:
            0.68885267 = score(doc=3693,freq=6.0), product of:
              0.63136244 = queryWeight, product of:
                5.70882 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.015518169 = queryNorm
              1.0910574 = fieldWeight in 3693, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
        0.24 = coord(6/25)
    
  5. Zajic, D.M.; Dorr, B.J.; Lin, J.: Single-document and multi-document summarization techniques for email threads using sentence compression (2008) 0.23
    0.22968934 = sum of:
      0.22968934 = product of:
        1.1484467 = sum of:
          0.016352173 = weight(abstract_txt:results in 3105) [ClassicSimilarity], result of:
            0.016352173 = score(doc=3105,freq=1.0), product of:
              0.06016934 = queryWeight, product of:
                1.1146142 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015518169 = queryNorm
              0.2717692 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.055395946 = weight(abstract_txt:single in 3105) [ClassicSimilarity], result of:
            0.055395946 = score(doc=3105,freq=1.0), product of:
              0.13571993 = queryWeight, product of:
                1.6740128 = boost
                5.2244954 = idf(docFreq=649, maxDocs=44421)
                0.015518169 = queryNorm
              0.4081637 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2244954 = idf(docFreq=649, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.08139098 = weight(abstract_txt:multi in 3105) [ClassicSimilarity], result of:
            0.08139098 = score(doc=3105,freq=1.0), product of:
              0.17540519 = queryWeight, product of:
                1.903085 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.015518169 = queryNorm
              0.4640169 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.06525034 = weight(abstract_txt:document in 3105) [ClassicSimilarity], result of:
            0.06525034 = score(doc=3105,freq=2.0), product of:
              0.13753098 = queryWeight, product of:
                2.0638726 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015518169 = queryNorm
              0.47444102 = fieldWeight in 3105, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.9300573 = weight(abstract_txt:summarization in 3105) [ClassicSimilarity], result of:
            0.9300573 = score(doc=3105,freq=7.0), product of:
              0.63136244 = queryWeight, product of:
                5.70882 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.015518169 = queryNorm
              1.4730957 = fieldWeight in 3105, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
        0.2 = coord(5/25)
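
The content-match scores combine a per-term queryWeight and fieldWeight, then scale the sum by the coord factor. As a rough sketch, again assuming the standard ClassicSimilarity formulas, the score of the first content hit (doc 3421) can be recomputed from the values printed above. The helper names are hypothetical; queryNorm and the boosts depend on the whole query and are simply copied from the listing.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.015518169   # copied from the listing, not recomputed
FIELD_NORM = 0.078125      # fieldNorm(doc=3421), copied from the listing

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, doc_freq, boost=1.0):
    # score = queryWeight * fieldWeight
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

# (term, freq in doc 3421, docFreq, boost) as printed in the listing;
# "2002" has no boost line, so a boost of 1.0 is assumed.
matched_terms = [
    ("2002",          1.0,  234, 1.0),
    ("results",       1.0, 3724, 1.1146142),
    ("single",        3.0,  649, 1.6740128),
    ("document",      8.0, 1647, 2.0638726),
    ("sentences",     4.0,  109, 3.364844),
    ("summarization", 2.0,   96, 5.70882),
]

term_scores = {t: term_score(f, df, b) for t, f, df, b in matched_terms}

# coord(6/25): 6 of the 25 query terms matched this document.
coord = len(matched_terms) / 25
total_3421 = sum(term_scores.values()) * coord

print(round(total_3421, 4))
```

The coord factor explains why doc 43, which matches seven query terms, outranks doc 3448 despite a lower raw sum: 0.8727367 * 7/25 exceeds 1.2068777 * 5/25.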