Document (#32943)

Author
Hirao, T.
Okumura, M.
Yasuda, N.
Isozaki, H.
Title
Supervised automatic evaluation for summarization with voted regression model
Source
Information processing and management. 43(2007) no.6, pp. 1521-1535
Year
2007
Abstract
High-quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, it is costly and its results are difficult to reproduce. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on 'corrected AIC' to avoid multicollinearity, and (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17-51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
Theme
Automatic abstracting
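
The abstract names two ingredients of VRM: model selection by corrected AIC and voting among the selected models. The sketch below only illustrates those two ideas in general terms; it is not the authors' implementation. It assumes least-squares fits with Gaussian errors (so AIC can be written from the residual sum of squares), assumes that "voting" means averaging the selected models' scores, and uses hypothetical names (`aicc`, `select_models`, `vote`) and made-up candidate values.

```python
import math

def aicc(n, k, rss):
    """Corrected AIC for a least-squares fit, assuming Gaussian errors.

    n: number of training samples, k: number of fitted parameters,
    rss: residual sum of squares. Defined only when n > k + 1.
    """
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = n * math.log(rss / n) + 2 * k
    # Small-sample correction term that penalizes large models more strongly.
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def select_models(candidates, n, top):
    """Return the names of the `top` candidates with the lowest AICc."""
    ranked = sorted(candidates, key=lambda c: aicc(n, c["k"], c["rss"]))
    return [c["name"] for c in ranked[:top]]

def vote(predictions):
    """Combine the selected models' scores (averaging is an assumption)."""
    return sum(predictions) / len(predictions)

# Hypothetical candidate models, assumed to have been fitted elsewhere.
candidates = [
    {"name": "A", "k": 2, "rss": 12.0},
    {"name": "B", "k": 4, "rss": 11.5},
    {"name": "C", "k": 9, "rss": 11.0},
]
chosen = select_models(candidates, n=30, top=2)
combined = vote([0.6, 0.8])  # scores the chosen models might assign to one summary
```

With these illustrative numbers, the largest model has the smallest residual but is ranked last, because the correction term grows quickly as k approaches n.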

Similar documents (content)

  1. Sjöbergh, J.: Older versions of the ROUGEeval summarization evaluation system were easier to fool (2007) 0.38
    0.38402238 = sum of:
      0.38402238 = product of:
        1.3715085 = sum of:
          0.048450075 = weight(abstract_txt:human in 1940) [ClassicSimilarity], result of:
            0.048450075 = score(doc=1940,freq=1.0), product of:
              0.09462635 = queryWeight, product of:
                1.365345 = boost
                4.681277 = idf(docFreq=1118, maxDocs=44421)
                0.014804897 = queryNorm
              0.5120146 = fieldWeight in 1940, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.681277 = idf(docFreq=1118, maxDocs=44421)
                0.109375 = fieldNorm(doc=1940)
          0.23284645 = weight(abstract_txt:summaries in 1940) [ClassicSimilarity], result of:
            0.23284645 = score(doc=1940,freq=2.0), product of:
              0.21388757 = queryWeight, product of:
                2.0527172 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014804897 = queryNorm
              1.0886395 = fieldWeight in 1940, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.109375 = fieldNorm(doc=1940)
          0.05967483 = weight(abstract_txt:model in 1940) [ClassicSimilarity], result of:
            0.05967483 = score(doc=1940,freq=1.0), product of:
              0.13698928 = queryWeight, product of:
                2.323243 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.014804897 = queryNorm
              0.4356168 = fieldWeight in 1940, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.109375 = fieldNorm(doc=1940)
          0.14896496 = weight(abstract_txt:method in 1940) [ClassicSimilarity], result of:
            0.14896496 = score(doc=1940,freq=3.0), product of:
              0.17478658 = queryWeight, product of:
                2.6242511 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.014804897 = queryNorm
              0.8522677 = fieldWeight in 1940, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.109375 = fieldNorm(doc=1940)
          0.36264655 = weight(abstract_txt:summarization in 1940) [ClassicSimilarity], result of:
            0.36264655 = score(doc=1940,freq=2.0), product of:
              0.32897174 = queryWeight, product of:
                3.1178935 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014804897 = queryNorm
              1.1023638 = fieldWeight in 1940, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.109375 = fieldNorm(doc=1940)
          0.3278783 = weight(abstract_txt:automatic in 1940) [ClassicSimilarity], result of:
            0.3278783 = score(doc=1940,freq=2.0), product of:
              0.40797815 = queryWeight, product of:
                5.3038206 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014804897 = queryNorm
              0.8036663 = fieldWeight in 1940, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.109375 = fieldNorm(doc=1940)
          0.19104736 = weight(abstract_txt:evaluation in 1940) [ClassicSimilarity], result of:
            0.19104736 = score(doc=1940,freq=1.0), product of:
              0.3899246 = queryWeight, product of:
                5.879399 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.014804897 = queryNorm
              0.48995975 = fieldWeight in 1940, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.109375 = fieldNorm(doc=1940)
        0.28 = coord(7/25)
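
The explanation tree above can be checked by hand: each term contributes queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), the contributions are summed, and the sum is scaled by the coord factor. The following sketch recomputes the score for document 1940 from the leaf values shown above; the function names are hypothetical, and tf is taken as the square root of the term frequency, which matches the tf values printed in the tree.

```python
import math

QUERY_NORM = 0.014804897   # queryNorm, shared by all terms
FIELD_NORM = 0.109375      # fieldNorm(doc=1940)

# (term, boost, idf, termFreq) copied from the explanation for doc 1940
TERMS = [
    ("human", 1.365345, 4.681277, 1.0),
    ("summaries", 2.0527172, 7.0380287, 2.0),
    ("model", 2.323243, 3.9827821, 1.0),
    ("method", 2.6242511, 4.4988065, 3.0),
    ("summarization", 3.1178935, 7.1267567, 2.0),
    ("automatic", 5.3038206, 5.1956835, 2.0),
    ("evaluation", 5.879399, 4.479632, 1.0),
]

def term_score(boost, idf, freq, field_norm, query_norm):
    """One 'weight(...)' node: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm, query_norm, matched, total):
    """Sum of the term scores, scaled by coord(matched/total)."""
    raw = sum(term_score(b, i, f, field_norm, query_norm) for _, b, i, f in terms)
    return raw * matched / total

# 7 of the 25 query terms matched, hence coord(7/25) = 0.28.
score = document_score(TERMS, FIELD_NORM, QUERY_NORM, matched=7, total=25)
print(round(score, 4))  # should land close to the 0.38402238 reported above
```

Small differences in the last digits are expected, since the listing prints single-precision values while Python computes in double precision.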
    
  2. Ercan, G.; Cicekli, I.: Using lexical chains for keyword extraction (2007) 0.30
    0.29627785 = sum of:
      0.29627785 = product of:
        0.9258683 = sum of:
          0.05076855 = weight(abstract_txt:problem in 1951) [ClassicSimilarity], result of:
            0.05076855 = score(doc=1951,freq=2.0), product of:
              0.08586842 = queryWeight, product of:
                1.300628 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.014804897 = queryNorm
              0.5912366 = fieldWeight in 1951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
          0.044233914 = weight(abstract_txt:although in 1951) [ClassicSimilarity], result of:
            0.044233914 = score(doc=1951,freq=1.0), product of:
              0.09869244 = queryWeight, product of:
                1.3943708 = boost
                4.780796 = idf(docFreq=1012, maxDocs=44421)
                0.014804897 = queryNorm
              0.44819963 = fieldWeight in 1951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.780796 = idf(docFreq=1012, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
          0.025560925 = weight(abstract_txt:results in 1951) [ClassicSimilarity], result of:
            0.025560925 = score(doc=1951,freq=1.0), product of:
              0.07837816 = queryWeight, product of:
                1.5218768 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.014804897 = queryNorm
              0.32612303 = fieldWeight in 1951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
          0.07709671 = weight(abstract_txt:obtained in 1951) [ClassicSimilarity], result of:
            0.07709671 = score(doc=1951,freq=1.0), product of:
              0.14293465 = queryWeight, product of:
                1.6780508 = boost
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.014804897 = queryNorm
              0.5393843 = fieldWeight in 1951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
          0.14112628 = weight(abstract_txt:summaries in 1951) [ClassicSimilarity], result of:
            0.14112628 = score(doc=1951,freq=1.0), product of:
              0.21388757 = queryWeight, product of:
                2.0527172 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014804897 = queryNorm
              0.6598152 = fieldWeight in 1951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
          0.16856064 = weight(abstract_txt:supervised in 1951) [ClassicSimilarity], result of:
            0.16856064 = score(doc=1951,freq=1.0), product of:
              0.24077854 = queryWeight, product of:
                2.1779366 = boost
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.014804897 = queryNorm
              0.7000651 = fieldWeight in 1951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
          0.21979702 = weight(abstract_txt:summarization in 1951) [ClassicSimilarity], result of:
            0.21979702 = score(doc=1951,freq=1.0), product of:
              0.32897174 = queryWeight, product of:
                3.1178935 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014804897 = queryNorm
              0.66813344 = fieldWeight in 1951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
          0.19872425 = weight(abstract_txt:automatic in 1951) [ClassicSimilarity], result of:
            0.19872425 = score(doc=1951,freq=1.0), product of:
              0.40797815 = queryWeight, product of:
                5.3038206 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014804897 = queryNorm
              0.48709533 = fieldWeight in 1951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=1951)
        0.32 = coord(8/25)
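
The idf and tf values printed in these explanations are consistent with idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The sketch below checks that relationship against numbers taken from the listings; the function names are hypothetical, and the formula is inferred from the displayed values rather than from this installation's configuration.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the listings above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

# Values taken from the explanations: 'human' and 'supervised' in abstract_txt.
idf_human = classic_idf(1118, 44421)      # listed as 4.681277
idf_supervised = classic_idf(68, 44421)   # listed as 7.467361
tf_twice = classic_tf(2.0)                # listed as 1.4142135
```

Rare terms such as "supervised" (68 documents) therefore weigh far more than common ones such as "results" (3724 documents), and because idf enters both queryWeight and fieldWeight, its effect on each term score is squared.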
    
  3. Vanderwende, L.; Suzuki, H.; Brockett, J.M.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007) 0.26
    0.25811002 = sum of:
      0.25811002 = product of:
        0.9218215 = sum of:
          0.05537152 = weight(abstract_txt:human in 1948) [ClassicSimilarity], result of:
            0.05537152 = score(doc=1948,freq=4.0), product of:
              0.09462635 = queryWeight, product of:
                1.365345 = boost
                4.681277 = idf(docFreq=1118, maxDocs=44421)
                0.014804897 = queryNorm
              0.5851596 = fieldWeight in 1948, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.681277 = idf(docFreq=1118, maxDocs=44421)
                0.0625 = fieldNorm(doc=1948)
          0.024099069 = weight(abstract_txt:results in 1948) [ClassicSimilarity], result of:
            0.024099069 = score(doc=1948,freq=2.0), product of:
              0.07837816 = queryWeight, product of:
                1.5218768 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.014804897 = queryNorm
              0.30747172 = fieldWeight in 1948, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=1948)
          0.16295858 = weight(abstract_txt:summaries in 1948) [ClassicSimilarity], result of:
            0.16295858 = score(doc=1948,freq=3.0), product of:
              0.21388757 = queryWeight, product of:
                2.0527172 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014804897 = queryNorm
              0.7618889 = fieldWeight in 1948, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=1948)
          0.049145687 = weight(abstract_txt:method in 1948) [ClassicSimilarity], result of:
            0.049145687 = score(doc=1948,freq=1.0), product of:
              0.17478658 = queryWeight, product of:
                2.6242511 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.014804897 = queryNorm
              0.2811754 = fieldWeight in 1948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=1948)
          0.25379974 = weight(abstract_txt:summarization in 1948) [ClassicSimilarity], result of:
            0.25379974 = score(doc=1948,freq=3.0), product of:
              0.32897174 = queryWeight, product of:
                3.1178935 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014804897 = queryNorm
              0.77149403 = fieldWeight in 1948, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=1948)
          0.18735902 = weight(abstract_txt:automatic in 1948) [ClassicSimilarity], result of:
            0.18735902 = score(doc=1948,freq=2.0), product of:
              0.40797815 = queryWeight, product of:
                5.3038206 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014804897 = queryNorm
              0.45923787 = fieldWeight in 1948, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=1948)
          0.18908782 = weight(abstract_txt:evaluation in 1948) [ClassicSimilarity], result of:
            0.18908782 = score(doc=1948,freq=3.0), product of:
              0.3899246 = queryWeight, product of:
                5.879399 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.014804897 = queryNorm
              0.48493436 = fieldWeight in 1948, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0625 = fieldNorm(doc=1948)
        0.28 = coord(7/25)
    
  4. Saggion, H.; Lapalme, G.: Selective analysis for the automatic generation of summaries (2000) 0.25
    0.25175652 = sum of:
      0.25175652 = product of:
        0.89913046 = sum of:
          0.04152864 = weight(abstract_txt:human in 1132) [ClassicSimilarity], result of:
            0.04152864 = score(doc=1132,freq=1.0), product of:
              0.09462635 = queryWeight, product of:
                1.365345 = boost
                4.681277 = idf(docFreq=1118, maxDocs=44421)
                0.014804897 = queryNorm
              0.4388697 = fieldWeight in 1132, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.681277 = idf(docFreq=1118, maxDocs=44421)
                0.09375 = fieldNorm(doc=1132)
          0.025560925 = weight(abstract_txt:results in 1132) [ClassicSimilarity], result of:
            0.025560925 = score(doc=1132,freq=1.0), product of:
              0.07837816 = queryWeight, product of:
                1.5218768 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.014804897 = queryNorm
              0.32612303 = fieldWeight in 1132, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.09375 = fieldNorm(doc=1132)
          0.07709671 = weight(abstract_txt:obtained in 1132) [ClassicSimilarity], result of:
            0.07709671 = score(doc=1132,freq=1.0), product of:
              0.14293465 = queryWeight, product of:
                1.6780508 = boost
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.014804897 = queryNorm
              0.5393843 = fieldWeight in 1132, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7534328 = idf(docFreq=382, maxDocs=44421)
                0.09375 = fieldNorm(doc=1132)
          0.14112628 = weight(abstract_txt:summaries in 1132) [ClassicSimilarity], result of:
            0.14112628 = score(doc=1132,freq=1.0), product of:
              0.21388757 = queryWeight, product of:
                2.0527172 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014804897 = queryNorm
              0.6598152 = fieldWeight in 1132, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.09375 = fieldNorm(doc=1132)
          0.10425375 = weight(abstract_txt:method in 1132) [ClassicSimilarity], result of:
            0.10425375 = score(doc=1132,freq=2.0), product of:
              0.17478658 = queryWeight, product of:
                2.6242511 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.014804897 = queryNorm
              0.5964631 = fieldWeight in 1132, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=1132)
          0.31083992 = weight(abstract_txt:summarization in 1132) [ClassicSimilarity], result of:
            0.31083992 = score(doc=1132,freq=2.0), product of:
              0.32897174 = queryWeight, product of:
                3.1178935 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014804897 = queryNorm
              0.94488335 = fieldWeight in 1132, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.09375 = fieldNorm(doc=1132)
          0.19872425 = weight(abstract_txt:automatic in 1132) [ClassicSimilarity], result of:
            0.19872425 = score(doc=1132,freq=1.0), product of:
              0.40797815 = queryWeight, product of:
                5.3038206 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014804897 = queryNorm
              0.48709533 = fieldWeight in 1132, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=1132)
        0.28 = coord(7/25)
    
  5. Oh, H.; Nam, S.; Zhu, Y.: Structured abstract summarization of scientific articles : summarization using full-text section information (2023) 0.24
    0.24275616 = sum of:
      0.24275616 = product of:
        0.758613 = sum of:
          0.032879677 = weight(abstract_txt:improve in 1890) [ClassicSimilarity], result of:
            0.032879677 = score(doc=1890,freq=1.0), product of:
              0.106118925 = queryWeight, product of:
                1.4458817 = boost
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.014804897 = queryNorm
              0.30983803 = fieldWeight in 1890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
          0.024099069 = weight(abstract_txt:results in 1890) [ClassicSimilarity], result of:
            0.024099069 = score(doc=1890,freq=2.0), product of:
              0.07837816 = queryWeight, product of:
                1.5218768 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.014804897 = queryNorm
              0.30747172 = fieldWeight in 1890, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
          0.028797058 = weight(abstract_txt:methods in 1890) [ClassicSimilarity], result of:
            0.028797058 = score(doc=1890,freq=1.0), product of:
              0.11119971 = queryWeight, product of:
                1.8127328 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.014804897 = queryNorm
              0.25896704 = fieldWeight in 1890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
          0.034099903 = weight(abstract_txt:model in 1890) [ClassicSimilarity], result of:
            0.034099903 = score(doc=1890,freq=1.0), product of:
              0.13698928 = queryWeight, product of:
                2.323243 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.014804897 = queryNorm
              0.24892388 = fieldWeight in 1890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
          0.049145687 = weight(abstract_txt:method in 1890) [ClassicSimilarity], result of:
            0.049145687 = score(doc=1890,freq=1.0), product of:
              0.17478658 = queryWeight, product of:
                2.6242511 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.014804897 = queryNorm
              0.2811754 = fieldWeight in 1890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
          0.2930627 = weight(abstract_txt:summarization in 1890) [ClassicSimilarity], result of:
            0.2930627 = score(doc=1890,freq=4.0), product of:
              0.32897174 = queryWeight, product of:
                3.1178935 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014804897 = queryNorm
              0.8908446 = fieldWeight in 1890, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
          0.18735902 = weight(abstract_txt:automatic in 1890) [ClassicSimilarity], result of:
            0.18735902 = score(doc=1890,freq=2.0), product of:
              0.40797815 = queryWeight, product of:
                5.3038206 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.014804897 = queryNorm
              0.45923787 = fieldWeight in 1890, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
          0.109169915 = weight(abstract_txt:evaluation in 1890) [ClassicSimilarity], result of:
            0.109169915 = score(doc=1890,freq=1.0), product of:
              0.3899246 = queryWeight, product of:
                5.879399 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.014804897 = queryNorm
              0.279977 = fieldWeight in 1890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0625 = fieldNorm(doc=1890)
        0.32 = coord(8/25)