Document (#32944)

Soricut, R.
Marcu, D.
Abstractive headline generation using WIDL-expressions
Information processing and management. 43(2007) no.6, pp.1536-1548
We present a new paradigm for the automatic creation of document headlines that is based on direct transformation of relevant textual information into well-formed textual output. Starting from an input document, we automatically create compact representations of weighted finite sets of strings, called WIDL-expressions, which encode the most important topics in the document. A generic natural language generation engine performs the headline generation task, driven by both statistical knowledge encapsulated in WIDL-expressions (representing topic biases induced by the input document) and statistical knowledge encapsulated in language models (representing biases induced by the target language). Our evaluation shows similar performance in quality with a state-of-the-art, extractive approach to headline generation, and significant improvements in quality over previously proposed solutions to abstractive headline generation.
Automatic abstracting
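The abstract describes choosing an output string by combining two sources of statistical knowledge: weights induced by the input document and probabilities from a language model of the target language. The toy sketch below illustrates only that combination. It is not the authors' WIDL-expression machinery: here the weighted candidate set is listed explicitly instead of being compactly encoded, and the bigram probabilities, the candidates, and the names `bigram_logprob` and `best_headline` are all hypothetical.

```python
import math

# Hypothetical bigram probabilities P(w2 | w1); "<s>" and "</s>" mark sentence boundaries.
BIGRAMS = {
    ("<s>", "storm"): 0.2, ("storm", "hits"): 0.3, ("hits", "coast"): 0.4,
    ("coast", "</s>"): 0.5, ("<s>", "coast"): 0.1, ("coast", "hits"): 0.05,
    ("hits", "storm"): 0.05, ("storm", "</s>"): 0.4,
}
FLOOR = 1e-6  # probability given to unseen bigrams (crude smoothing, an assumption)


def bigram_logprob(words):
    """Log-probability of a word sequence under the toy bigram model."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log(BIGRAMS.get(pair, FLOOR)) for pair in zip(padded, padded[1:]))


def best_headline(candidates):
    """Return the candidate maximizing log(document weight) + log P_LM(candidate).

    `candidates` maps a tuple of words to its (hypothetical) document-induced weight.
    """
    def score(item):
        words, weight = item
        return math.log(weight) + bigram_logprob(words)

    return max(candidates.items(), key=score)[0]


# Two orderings of the same topic words with equal document-induced weight;
# the language model breaks the tie in favour of the more fluent order.
candidates = {("storm", "hits", "coast"): 0.5, ("coast", "hits", "storm"): 0.5}
print(" ".join(best_headline(candidates)))  # prints "storm hits coast"
```

Working in log space turns the product of weights and probabilities into a sum, which avoids underflow on longer candidates.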

Similar documents (content)

  1. Aker, A.; Gaizauskas, R.: Generating descriptive multi-document summaries of geo-located entities using entity type models (2015) 0.09
    0.09196349 = sum of:
      0.09196349 = product of:
        0.45981744 = sum of:
          0.02925962 = weight(abstract_txt:quality in 2726) [ClassicSimilarity], result of:
            0.02925962 = score(doc=2726,freq=1.0), product of:
              0.100798994 = queryWeight, product of:
                1.4506706 = boost
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.014960804 = queryNorm
              0.2902769 = fieldWeight in 2726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.0625 = fieldNorm(doc=2726)
          0.16630508 = weight(abstract_txt:extractive in 2726) [ClassicSimilarity], result of:
            0.16630508 = score(doc=2726,freq=2.0), product of:
              0.20223707 = queryWeight, product of:
                1.4529681 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.014960804 = queryNorm
              0.8223274 = fieldWeight in 2726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0625 = fieldNorm(doc=2726)
          0.055060457 = weight(abstract_txt:language in 2726) [ClassicSimilarity], result of:
            0.055060457 = score(doc=2726,freq=3.0), product of:
              0.12194396 = queryWeight, product of:
                1.9541885 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.014960804 = queryNorm
              0.45152265 = fieldWeight in 2726, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=2726)
          0.08011124 = weight(abstract_txt:document in 2726) [ClassicSimilarity], result of:
            0.08011124 = score(doc=2726,freq=3.0), product of:
              0.17233585 = queryWeight, product of:
                2.6825233 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.014960804 = queryNorm
              0.46485534 = fieldWeight in 2726, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=2726)
          0.12908104 = weight(abstract_txt:generation in 2726) [ClassicSimilarity], result of:
            0.12908104 = score(doc=2726,freq=1.0), product of:
              0.36798662 = queryWeight, product of:
                4.3825483 = boost
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.014960804 = queryNorm
              0.35077643 = fieldWeight in 2726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.0625 = fieldNorm(doc=2726)
        0.2 = coord(5/25)
  2. Stede, M.: Lexicalization in natural language generation (2002) 0.09
    0.08587664 = sum of:
      0.08587664 = product of:
        0.42938322 = sum of:
          0.047440078 = weight(abstract_txt:generic in 5245) [ClassicSimilarity], result of:
            0.047440078 = score(doc=5245,freq=2.0), product of:
              0.09579627 = queryWeight, product of:
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.014960804 = queryNorm
              0.4952184 = fieldWeight in 5245, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.40315 = idf(docFreq=199, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5245)
          0.011398292 = weight(abstract_txt:knowledge in 5245) [ClassicSimilarity], result of:
            0.011398292 = score(doc=5245,freq=1.0), product of:
              0.058771245 = queryWeight, product of:
                1.1077025 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.014960804 = queryNorm
              0.19394335 = fieldWeight in 5245, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5245)
          0.08245565 = weight(abstract_txt:input in 5245) [ClassicSimilarity], result of:
            0.08245565 = score(doc=5245,freq=2.0), product of:
              0.17447853 = queryWeight, product of:
                1.9085858 = boost
                6.110481 = idf(docFreq=267, maxDocs=44421)
                0.014960804 = queryNorm
              0.47258335 = fieldWeight in 5245, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.110481 = idf(docFreq=267, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5245)
          0.0621974 = weight(abstract_txt:language in 5245) [ClassicSimilarity], result of:
            0.0621974 = score(doc=5245,freq=5.0), product of:
              0.12194396 = queryWeight, product of:
                1.9541885 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.014960804 = queryNorm
              0.51004905 = fieldWeight in 5245, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5245)
          0.22589181 = weight(abstract_txt:generation in 5245) [ClassicSimilarity], result of:
            0.22589181 = score(doc=5245,freq=4.0), product of:
              0.36798662 = queryWeight, product of:
                4.3825483 = boost
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.014960804 = queryNorm
              0.61385876 = fieldWeight in 5245, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5245)
        0.2 = coord(5/25)
  3. Robin, J.; McKeown, K.: Empirically designing and evaluating a new revision-based model for summary generation (1996) 0.08
    0.07901832 = sum of:
      0.07901832 = product of:
        0.4938645 = sum of:
          0.13983192 = weight(abstract_txt:encode in 6819) [ClassicSimilarity], result of:
            0.13983192 = score(doc=6819,freq=1.0), product of:
              0.17322494 = queryWeight, product of:
                1.344717 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.014960804 = queryNorm
              0.8072274 = fieldWeight in 6819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.09375 = fieldNorm(doc=6819)
          0.09297602 = weight(abstract_txt:textual in 6819) [ClassicSimilarity], result of:
            0.09297602 = score(doc=6819,freq=1.0), product of:
              0.16626348 = queryWeight, product of:
                1.8631127 = boost
                5.9648952 = idf(docFreq=309, maxDocs=44421)
                0.014960804 = queryNorm
              0.5592089 = fieldWeight in 6819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9648952 = idf(docFreq=309, maxDocs=44421)
                0.09375 = fieldNorm(doc=6819)
          0.06743502 = weight(abstract_txt:language in 6819) [ClassicSimilarity], result of:
            0.06743502 = score(doc=6819,freq=2.0), product of:
              0.12194396 = queryWeight, product of:
                1.9541885 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.014960804 = queryNorm
              0.5530001 = fieldWeight in 6819, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.09375 = fieldNorm(doc=6819)
          0.19362155 = weight(abstract_txt:generation in 6819) [ClassicSimilarity], result of:
            0.19362155 = score(doc=6819,freq=1.0), product of:
              0.36798662 = queryWeight, product of:
                4.3825483 = boost
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.014960804 = queryNorm
              0.52616465 = fieldWeight in 6819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.09375 = fieldNorm(doc=6819)
        0.16 = coord(4/25)
  4. Helbig, H.: Knowledge representation and the semantics of natural language (2014) 0.08
    0.07811909 = sum of:
      0.07811909 = product of:
        0.48824432 = sum of:
          0.022562766 = weight(abstract_txt:knowledge in 3396) [ClassicSimilarity], result of:
            0.022562766 = score(doc=3396,freq=3.0), product of:
              0.058771245 = queryWeight, product of:
                1.1077025 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.014960804 = queryNorm
              0.38390827 = fieldWeight in 3396, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.0625 = fieldNorm(doc=3396)
          0.08991335 = weight(abstract_txt:language in 3396) [ClassicSimilarity], result of:
            0.08991335 = score(doc=3396,freq=8.0), product of:
              0.12194396 = queryWeight, product of:
                1.9541885 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.014960804 = queryNorm
              0.7373334 = fieldWeight in 3396, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=3396)
          0.19322008 = weight(abstract_txt:expressions in 3396) [ClassicSimilarity], result of:
            0.19322008 = score(doc=3396,freq=2.0), product of:
              0.32235345 = queryWeight, product of:
                3.177258 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.014960804 = queryNorm
              0.5994044 = fieldWeight in 3396, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.0625 = fieldNorm(doc=3396)
          0.18254814 = weight(abstract_txt:generation in 3396) [ClassicSimilarity], result of:
            0.18254814 = score(doc=3396,freq=2.0), product of:
              0.36798662 = queryWeight, product of:
                4.3825483 = boost
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.014960804 = queryNorm
              0.49607277 = fieldWeight in 3396, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.0625 = fieldNorm(doc=3396)
        0.16 = coord(4/25)
  5. Kalczynski, P.J.; Chou, A.: Temporal Document Retrieval Model for business news archives (2005) 0.07
    0.070371374 = sum of:
      0.070371374 = product of:
        0.5864281 = sum of:
          0.10440432 = weight(abstract_txt:representing in 2030) [ClassicSimilarity], result of:
            0.10440432 = score(doc=2030,freq=2.0), product of:
              0.16099265 = queryWeight, product of:
                1.8333429 = boost
                5.869585 = idf(docFreq=340, maxDocs=44421)
                0.014960804 = queryNorm
              0.64850366 = fieldWeight in 2030, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.869585 = idf(docFreq=340, maxDocs=44421)
                0.078125 = fieldNorm(doc=2030)
          0.10013905 = weight(abstract_txt:document in 2030) [ClassicSimilarity], result of:
            0.10013905 = score(doc=2030,freq=3.0), product of:
              0.17233585 = queryWeight, product of:
                2.6825233 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.014960804 = queryNorm
              0.5810692 = fieldWeight in 2030, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=2030)
          0.38188472 = weight(abstract_txt:expressions in 2030) [ClassicSimilarity], result of:
            0.38188472 = score(doc=2030,freq=5.0), product of:
              0.32235345 = queryWeight, product of:
                3.177258 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.014960804 = queryNorm
              1.184677 = fieldWeight in 2030, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.078125 = fieldNorm(doc=2030)
        0.12 = coord(3/25)
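Each similarity score above is a Lucene ClassicSimilarity explain tree. As a sketch, the Python below recomputes item 1 (doc 2726) from the numbers printed in its tree, assuming the relations the tree itself displays: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, term score = queryWeight × fieldWeight, and a final coord factor (matched terms / query terms). queryNorm and fieldNorm are copied from the tree rather than recomputed, since fieldNorm is stored in a lossy one-byte encoding; the helper names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf as shown in the explain tree: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """tf as shown in the explain tree: square root of the term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """queryWeight * fieldWeight for one matched query term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


def doc_score(term_scores, matched, total_query_terms):
    """Sum of per-term scores, scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total_query_terms)


QUERY_NORM = 0.014960804
MAX_DOCS = 44421

# Item 1 (doc 2726): (freq, docFreq, boost, fieldNorm) per matched term, copied from its tree.
doc_2726 = [
    (1.0, 1160, 1.4506706, 0.0625),  # quality
    (2.0, 10, 1.4529681, 0.0625),    # extractive
    (3.0, 1863, 1.9541885, 0.0625),  # language
    (3.0, 1647, 2.6825233, 0.0625),  # document
    (1.0, 440, 4.3825483, 0.0625),   # generation
]
scores = [term_score(f, df, MAX_DOCS, b, QUERY_NORM, fn) for f, df, b, fn in doc_2726]
print(round(doc_score(scores, matched=5, total_query_terms=25), 5))  # prints 0.09196
```

Lucene computes these values in single precision, so the double-precision result agrees with the tree's 0.09196349 only approximately; the list shows it rounded to 0.09.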