Document (#32946)

Author
Nomoto, T.
Title
Discriminative sentence compression with conditional random fields
Source
Information processing and management. 43(2007) no.6, pp.1571-1587
Year
2007
Abstract
The paper focuses on a particular approach to automatic sentence compression which makes use of a discriminative sequence classifier known as Conditional Random Fields (CRF). We devise several features for CRF that allow it to incorporate information on nonlinear relations among words. Along with that, we address the issue of data paucity by collecting data from RSS feeds available on the Internet, and turning them into training data for use with CRF, drawing on techniques from biology and information retrieval. We also discuss a recursive application of CRF on the syntactic structure of a sentence as a way of improving the readability of the compression it generates. Experiments found that our approach works reasonably well compared to the state-of-the-art system [Knight, K., & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence 139, 91-107.].
Theme
Automatic abstracting

Similar documents (content)

  1. Zajic, D.M.; Dorr, B.J.; Lin, J.: Single-document and multi-document summarization techniques for email threads using sentence compression (2008) 0.25
    0.24747911 = sum of:
      0.24747911 = product of:
        1.031163 = sum of:
          0.051909085 = weight(abstract_txt:sequence in 3105) [ClassicSimilarity], result of:
            0.051909085 = score(doc=3105,freq=1.0), product of:
              0.09733519 = queryWeight, product of:
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.014258913 = queryNorm
              0.53330237 = fieldWeight in 3105, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.0091577135 = weight(abstract_txt:that in 3105) [ClassicSimilarity], result of:
            0.0091577135 = score(doc=3105,freq=2.0), product of:
              0.03504796 = queryWeight, product of:
                1.0393386 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.014258913 = queryNorm
              0.2612909 = fieldWeight in 3105, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.15628517 = weight(abstract_txt:summarization in 3105) [ClassicSimilarity], result of:
            0.15628517 = score(doc=3105,freq=7.0), product of:
              0.10609302 = queryWeight, product of:
                1.0440191 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014258913 = queryNorm
              1.4730957 = fieldWeight in 3105, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.044400524 = weight(abstract_txt:approach in 3105) [ClassicSimilarity], result of:
            0.044400524 = score(doc=3105,freq=3.0), product of:
              0.08770675 = queryWeight, product of:
                1.6441529 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.014258913 = queryNorm
              0.5062384 = fieldWeight in 3105, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.39860794 = weight(abstract_txt:compression in 3105) [ClassicSimilarity], result of:
            0.39860794 = score(doc=3105,freq=2.0), product of:
              0.47732583 = queryWeight, product of:
                4.428968 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.014258913 = queryNorm
              0.83508563 = fieldWeight in 3105, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
          0.37080246 = weight(abstract_txt:sentence in 3105) [ClassicSimilarity], result of:
            0.37080246 = score(doc=3105,freq=2.0), product of:
              0.48998487 = queryWeight, product of:
                5.0169687 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.014258913 = queryNorm
              0.7567631 = fieldWeight in 3105, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.078125 = fieldNorm(doc=3105)
        0.24 = coord(6/25)
    
  2. Cannane, A.; Williams, H.E.: General-purpose compression for efficient retrieval (2001) 0.24
    0.2397699 = sum of:
      0.2397699 = product of:
        0.9990413 = sum of:
          0.011215861 = weight(abstract_txt:that in 6705) [ClassicSimilarity], result of:
            0.011215861 = score(doc=6705,freq=3.0), product of:
              0.03504796 = queryWeight, product of:
                1.0393386 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.014258913 = queryNorm
              0.32001466 = fieldWeight in 6705, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.0076141404 = weight(abstract_txt:with in 6705) [ClassicSimilarity], result of:
            0.0076141404 = score(doc=6705,freq=1.0), product of:
              0.03904469 = queryWeight, product of:
                1.0970001 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.014258913 = queryNorm
              0.19501092 = fieldWeight in 6705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.018081529 = weight(abstract_txt:data in 6705) [ClassicSimilarity], result of:
            0.018081529 = score(doc=6705,freq=1.0), product of:
              0.06949787 = queryWeight, product of:
                1.4635631 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.014258913 = queryNorm
              0.26017386 = fieldWeight in 6705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.025634654 = weight(abstract_txt:approach in 6705) [ClassicSimilarity], result of:
            0.025634654 = score(doc=6705,freq=1.0), product of:
              0.08770675 = queryWeight, product of:
                1.6441529 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.014258913 = queryNorm
              0.29227686 = fieldWeight in 6705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.09091985 = weight(abstract_txt:random in 6705) [ClassicSimilarity], result of:
            0.09091985 = score(doc=6705,freq=1.0), product of:
              0.17819278 = queryWeight, product of:
                1.9134852 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.014258913 = queryNorm
              0.5102331 = fieldWeight in 6705, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
          0.8455753 = weight(abstract_txt:compression in 6705) [ClassicSimilarity], result of:
            0.8455753 = score(doc=6705,freq=9.0), product of:
              0.47732583 = queryWeight, product of:
                4.428968 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.014258913 = queryNorm
              1.7714844 = fieldWeight in 6705, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.078125 = fieldNorm(doc=6705)
        0.24 = coord(6/25)
    
  3. Zajic, D.; Dorr, B.J.; Lin, J.; Schwartz, R.: Multi-candidate reduction : sentence compression as a tool for document summarization tasks (2007) 0.24
    0.23946898 = sum of:
      0.23946898 = product of:
        1.1973449 = sum of:
          0.007770577 = weight(abstract_txt:that in 1944) [ClassicSimilarity], result of:
            0.007770577 = score(doc=1944,freq=1.0), product of:
              0.03504796 = queryWeight, product of:
                1.0393386 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.014258913 = queryNorm
              0.22171268 = fieldWeight in 1944, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=1944)
          0.122775204 = weight(abstract_txt:summarization in 1944) [ClassicSimilarity], result of:
            0.122775204 = score(doc=1944,freq=3.0), product of:
              0.10609302 = queryWeight, product of:
                1.0440191 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014258913 = queryNorm
              1.1572411 = fieldWeight in 1944, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.09375 = fieldNorm(doc=1944)
          0.04350345 = weight(abstract_txt:approach in 1944) [ClassicSimilarity], result of:
            0.04350345 = score(doc=1944,freq=2.0), product of:
              0.08770675 = queryWeight, product of:
                1.6441529 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.014258913 = queryNorm
              0.49601027 = fieldWeight in 1944, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.09375 = fieldNorm(doc=1944)
          0.47832957 = weight(abstract_txt:compression in 1944) [ClassicSimilarity], result of:
            0.47832957 = score(doc=1944,freq=2.0), product of:
              0.47732583 = queryWeight, product of:
                4.428968 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.014258913 = queryNorm
              1.0021029 = fieldWeight in 1944, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.09375 = fieldNorm(doc=1944)
          0.5449661 = weight(abstract_txt:sentence in 1944) [ClassicSimilarity], result of:
            0.5449661 = score(doc=1944,freq=3.0), product of:
              0.48998487 = queryWeight, product of:
                5.0169687 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.014258913 = queryNorm
              1.11221 = fieldWeight in 1944, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.09375 = fieldNorm(doc=1944)
        0.2 = coord(5/25)
    
  4. Yeh, J.-Y.; Ke, H.-R.; Yang, W.-P.; Meng, I.-H.: Text summarization using a trainable summarizer and latent semantic analysis (2005) 0.19
    0.1916607 = sum of:
      0.1916607 = product of:
        0.79858625 = sum of:
          0.06683035 = weight(abstract_txt:summarization in 2003) [ClassicSimilarity], result of:
            0.06683035 = score(doc=2003,freq=2.0), product of:
              0.10609302 = queryWeight, product of:
                1.0440191 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014258913 = queryNorm
              0.6299222 = fieldWeight in 2003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=2003)
          0.0060913125 = weight(abstract_txt:with in 2003) [ClassicSimilarity], result of:
            0.0060913125 = score(doc=2003,freq=1.0), product of:
              0.03904469 = queryWeight, product of:
                1.0970001 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.014258913 = queryNorm
              0.15600874 = fieldWeight in 2003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=2003)
          0.014465223 = weight(abstract_txt:data in 2003) [ClassicSimilarity], result of:
            0.014465223 = score(doc=2003,freq=1.0), product of:
              0.06949787 = queryWeight, product of:
                1.4635631 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.014258913 = queryNorm
              0.20813909 = fieldWeight in 2003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=2003)
          0.0290023 = weight(abstract_txt:approach in 2003) [ClassicSimilarity], result of:
            0.0290023 = score(doc=2003,freq=2.0), product of:
              0.08770675 = queryWeight, product of:
                1.6441529 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.014258913 = queryNorm
              0.33067352 = fieldWeight in 2003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=2003)
          0.31888637 = weight(abstract_txt:compression in 2003) [ClassicSimilarity], result of:
            0.31888637 = score(doc=2003,freq=2.0), product of:
              0.47732583 = queryWeight, product of:
                4.428968 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.014258913 = queryNorm
              0.6680685 = fieldWeight in 2003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.0625 = fieldNorm(doc=2003)
          0.3633107 = weight(abstract_txt:sentence in 2003) [ClassicSimilarity], result of:
            0.3633107 = score(doc=2003,freq=3.0), product of:
              0.48998487 = queryWeight, product of:
                5.0169687 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.014258913 = queryNorm
              0.7414733 = fieldWeight in 2003, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.0625 = fieldNorm(doc=2003)
        0.24 = coord(6/25)
    
  5. Finegan-Dollak, C.; Radev, D.R.: Sentence simplification, compression, and disaggregation for summarization of sophisticated documents (2016) 0.17
    0.16683269 = sum of:
      0.16683269 = product of:
        0.8341634 = sum of:
          0.008972689 = weight(abstract_txt:that in 4122) [ClassicSimilarity], result of:
            0.008972689 = score(doc=4122,freq=3.0), product of:
              0.03504796 = queryWeight, product of:
                1.0393386 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.014258913 = queryNorm
              0.25601172 = fieldWeight in 4122, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=4122)
          0.010550464 = weight(abstract_txt:with in 4122) [ClassicSimilarity], result of:
            0.010550464 = score(doc=4122,freq=3.0), product of:
              0.03904469 = queryWeight, product of:
                1.0970001 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.014258913 = queryNorm
              0.27021506 = fieldWeight in 4122, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=4122)
          0.060775112 = weight(abstract_txt:generates in 4122) [ClassicSimilarity], result of:
            0.060775112 = score(doc=4122,freq=1.0), product of:
              0.12546757 = queryWeight, product of:
                1.1353527 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.014258913 = queryNorm
              0.484389 = fieldWeight in 4122, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.0625 = fieldNorm(doc=4122)
          0.39055446 = weight(abstract_txt:compression in 4122) [ClassicSimilarity], result of:
            0.39055446 = score(doc=4122,freq=3.0), product of:
              0.47732583 = queryWeight, product of:
                4.428968 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.014258913 = queryNorm
              0.8182135 = fieldWeight in 4122, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.0625 = fieldNorm(doc=4122)
          0.3633107 = weight(abstract_txt:sentence in 4122) [ClassicSimilarity], result of:
            0.3633107 = score(doc=4122,freq=3.0), product of:
              0.48998487 = queryWeight, product of:
                5.0169687 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.014258913 = queryNorm
              0.7414733 = fieldWeight in 4122, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.0625 = fieldNorm(doc=4122)
        0.2 = coord(5/25)
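
The score trees above all follow the same pattern, which can be reproduced by hand. The sketch below recomputes the score of the first similar document (doc 3105) from the values in its listing. It assumes the classic Lucene TF-IDF formulas that the listing's labels suggest: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, with the summed term scores multiplied by the coord factor. The function and variable names are illustrative and are not part of any Lucene API.

```python
import math

MAX_DOCS = 44421          # maxDocs, as shown in the listing
QUERY_NORM = 0.014258913  # queryNorm, as shown in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed classic formula; it matches the idf values printed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight for a single query term
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) copied from the tree for doc 3105;
# "sequence" has no boost line in the listing, so 1.0 is assumed.
DOC_3105_TERMS = [
    ("sequence", 1.0, 130, 1.0),
    ("that", 2.0, 11344, 1.0393386),
    ("summarization", 7.0, 96, 1.0440191),
    ("approach", 3.0, 2864, 1.6441529),
    ("compression", 2.0, 62, 4.428968),
    ("sentence", 2.0, 127, 5.0169687),
]
FIELD_NORM_3105 = 0.078125

term_sum = sum(
    term_score(freq, doc_freq, FIELD_NORM_3105, boost)
    for _, freq, doc_freq, boost in DOC_3105_TERMS
)
coord = 6 / 25  # 6 of the 25 query terms matched this document
doc_score = term_sum * coord

print(f"sum of term scores: {term_sum:.6f}")
print(f"final score:        {doc_score:.6f}")
```

The result should agree with the listed 0.24747911 up to small floating-point differences, since the listing appears to use single-precision arithmetic while Python uses double precision. The same arithmetic applies to the other four entries once their own freq, fieldNorm, and coord values are substituted.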