Document (#32952)

Author
Ercan, G.
Cicekli, I.
Title
Using lexical chains for keyword extraction
Source
Information processing and management. 43(2007) no.6, pp. 1705-1714
Year
2007
Abstract
Keywords can be considered as condensed versions of documents and short forms of their summaries. In this paper, the problem of automatic extraction of keywords from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text, and it can be said that a lexical chain represents the semantic content of a portion of the text. Although lexical chains have been extensively used in text summarization, their use for the keyword extraction problem has not been fully investigated. In this paper, a keyword extraction technique that uses lexical chains is described, and encouraging results are obtained.
Theme
Automatic abstracting

Similar documents (content)

  1. Wang, F.L.; Yang, C.C.: The impact analysis of language differences on an automatic multilingual text summarization system (2006) 0.17
    0.17104879 = sum of:
      0.17104879 = product of:
        0.61088854 = sum of:
          0.1171527 = weight(abstract_txt:summarization in 49) [ClassicSimilarity], result of:
            0.1171527 = score(doc=49,freq=5.0), product of:
              0.11762384 = queryWeight, product of:
                1.1402028 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014475092 = queryNorm
              0.9959945 = fieldWeight in 49, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=49)
          0.023675872 = weight(abstract_txt:been in 49) [ClassicSimilarity], result of:
            0.023675872 = score(doc=49,freq=3.0), product of:
              0.06050957 = queryWeight, product of:
                1.1565421 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.014475092 = queryNorm
              0.39127484 = fieldWeight in 49, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.0625 = fieldNorm(doc=49)
          0.035149887 = weight(abstract_txt:documents in 49) [ClassicSimilarity], result of:
            0.035149887 = score(doc=49,freq=3.0), product of:
              0.07874736 = queryWeight, product of:
                1.3193724 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.014475092 = queryNorm
              0.4463627 = fieldWeight in 49, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=49)
          0.025671223 = weight(abstract_txt:problem in 49) [ClassicSimilarity], result of:
            0.025671223 = score(doc=49,freq=1.0), product of:
              0.09210677 = queryWeight, product of:
                1.4269054 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.014475092 = queryNorm
              0.2787116 = fieldWeight in 49, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.0625 = fieldNorm(doc=49)
          0.07018058 = weight(abstract_txt:text in 49) [ClassicSimilarity], result of:
            0.07018058 = score(doc=49,freq=6.0), product of:
              0.11344493 = queryWeight, product of:
                1.9394902 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.014475092 = queryNorm
              0.61863124 = fieldWeight in 49, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=49)
          0.13745514 = weight(abstract_txt:extraction in 49) [ClassicSimilarity], result of:
            0.13745514 = score(doc=49,freq=1.0), product of:
              0.3551767 = queryWeight, product of:
                3.9626584 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.014475092 = queryNorm
              0.38700494 = fieldWeight in 49, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=49)
          0.20160314 = weight(abstract_txt:lexical in 49) [ClassicSimilarity], result of:
            0.20160314 = score(doc=49,freq=1.0), product of:
              0.49389964 = queryWeight, product of:
                5.224428 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.014475092 = queryNorm
              0.40818647 = fieldWeight in 49, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.0625 = fieldNorm(doc=49)
        0.28 = coord(7/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.15
    0.15357824 = sum of:
      0.15357824 = product of:
        0.5484937 = sum of:
          0.050459698 = weight(abstract_txt:summaries in 2719) [ClassicSimilarity], result of:
            0.050459698 = score(doc=2719,freq=1.0), product of:
              0.11471325 = queryWeight, product of:
                1.1260073 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.014475092 = queryNorm
              0.4398768 = fieldWeight in 2719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.13861695 = weight(abstract_txt:summarization in 2719) [ClassicSimilarity], result of:
            0.13861695 = score(doc=2719,freq=7.0), product of:
              0.11762384 = queryWeight, product of:
                1.1402028 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.014475092 = queryNorm
              1.1784766 = fieldWeight in 2719, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.019331267 = weight(abstract_txt:been in 2719) [ClassicSimilarity], result of:
            0.019331267 = score(doc=2719,freq=2.0), product of:
              0.06050957 = queryWeight, product of:
                1.1565421 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.014475092 = queryNorm
              0.31947455 = fieldWeight in 2719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.028699761 = weight(abstract_txt:documents in 2719) [ClassicSimilarity], result of:
            0.028699761 = score(doc=2719,freq=2.0), product of:
              0.07874736 = queryWeight, product of:
                1.3193724 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.014475092 = queryNorm
              0.3644536 = fieldWeight in 2719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.08834404 = weight(abstract_txt:condensed in 2719) [ClassicSimilarity], result of:
            0.08834404 = score(doc=2719,freq=1.0), product of:
              0.16663593 = queryWeight, product of:
                1.3571215 = boost
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.014475092 = queryNorm
              0.530162 = fieldWeight in 2719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.0286511 = weight(abstract_txt:text in 2719) [ClassicSimilarity], result of:
            0.0286511 = score(doc=2719,freq=1.0), product of:
              0.11344493 = queryWeight, product of:
                1.9394902 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.014475092 = queryNorm
              0.25255513 = fieldWeight in 2719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
          0.19439091 = weight(abstract_txt:extraction in 2719) [ClassicSimilarity], result of:
            0.19439091 = score(doc=2719,freq=2.0), product of:
              0.3551767 = queryWeight, product of:
                3.9626584 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.014475092 = queryNorm
              0.5473076 = fieldWeight in 2719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=2719)
        0.28 = coord(7/25)
    
  3. Naing, M.-M.; Lim, E.-P.; Chiang, R.H.L.: Extracting link chains of relationship instances from a Web site (2006) 0.15
    0.15110914 = sum of:
      0.15110914 = product of:
        0.94443214 = sum of:
          0.03208903 = weight(abstract_txt:problem in 111) [ClassicSimilarity], result of:
            0.03208903 = score(doc=111,freq=1.0), product of:
              0.09210677 = queryWeight, product of:
                1.4269054 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.014475092 = queryNorm
              0.34838948 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.23741521 = weight(abstract_txt:chain in 111) [ClassicSimilarity], result of:
            0.23741521 = score(doc=111,freq=3.0), product of:
              0.2424855 = queryWeight, product of:
                2.31522 = boost
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.014475092 = queryNorm
              0.97909033 = fieldWeight in 111, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.34363782 = weight(abstract_txt:extraction in 111) [ClassicSimilarity], result of:
            0.34363782 = score(doc=111,freq=4.0), product of:
              0.3551767 = queryWeight, product of:
                3.9626584 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.014475092 = queryNorm
              0.96751237 = fieldWeight in 111, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.33129007 = weight(abstract_txt:chains in 111) [ClassicSimilarity], result of:
            0.33129007 = score(doc=111,freq=1.0), product of:
              0.49990773 = queryWeight, product of:
                4.0713644 = boost
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.014475092 = queryNorm
              0.66270244 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
        0.16 = coord(4/25)
    
  4. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 0.15
    0.14899576 = sum of:
      0.14899576 = product of:
        0.7449788 = sum of:
          0.011335761 = weight(abstract_txt:their in 2830) [ClassicSimilarity], result of:
            0.011335761 = score(doc=2830,freq=1.0), product of:
              0.046027876 = queryWeight, product of:
                1.008695 = boost
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.014475092 = queryNorm
              0.24628034 = fieldWeight in 2830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.078125 = fieldNorm(doc=2830)
          0.03208903 = weight(abstract_txt:problem in 2830) [ClassicSimilarity], result of:
            0.03208903 = score(doc=2830,freq=1.0), product of:
              0.09210677 = queryWeight, product of:
                1.4269054 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.014475092 = queryNorm
              0.34838948 = fieldWeight in 2830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.078125 = fieldNorm(doc=2830)
          0.078331195 = weight(abstract_txt:keywords in 2830) [ClassicSimilarity], result of:
            0.078331195 = score(doc=2830,freq=1.0), product of:
              0.16698481 = queryWeight, product of:
                1.9212677 = boost
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.014475092 = queryNorm
              0.4690917 = fieldWeight in 2830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.078125 = fieldNorm(doc=2830)
          0.23902398 = weight(abstract_txt:keyword in 2830) [ClassicSimilarity], result of:
            0.23902398 = score(doc=2830,freq=4.0), product of:
              0.25333306 = queryWeight, product of:
                2.8982842 = boost
                6.038507 = idf(docFreq=287, maxDocs=44421)
                0.014475092 = queryNorm
              0.94351673 = fieldWeight in 2830, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.038507 = idf(docFreq=287, maxDocs=44421)
                0.078125 = fieldNorm(doc=2830)
          0.3841988 = weight(abstract_txt:extraction in 2830) [ClassicSimilarity], result of:
            0.3841988 = score(doc=2830,freq=5.0), product of:
              0.3551767 = queryWeight, product of:
                3.9626584 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.014475092 = queryNorm
              1.0817118 = fieldWeight in 2830, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.078125 = fieldNorm(doc=2830)
        0.2 = coord(5/25)
    
  5. Morris, J.: Individual differences in the interpretation of text : implications for information science (2009) 0.14
    0.14373639 = sum of:
      0.14373639 = product of:
        0.8983525 = sum of:
          0.05895103 = weight(abstract_txt:semantically in 305) [ClassicSimilarity], result of:
            0.05895103 = score(doc=305,freq=1.0), product of:
              0.10965744 = queryWeight, product of:
                1.1009141 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.014475092 = queryNorm
              0.53759265 = fieldWeight in 305, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.078125 = fieldNorm(doc=305)
          0.07162775 = weight(abstract_txt:text in 305) [ClassicSimilarity], result of:
            0.07162775 = score(doc=305,freq=4.0), product of:
              0.11344493 = queryWeight, product of:
                1.9394902 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.014475092 = queryNorm
              0.6313878 = fieldWeight in 305, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=305)
          0.33129007 = weight(abstract_txt:chains in 305) [ClassicSimilarity], result of:
            0.33129007 = score(doc=305,freq=1.0), product of:
              0.49990773 = queryWeight, product of:
                4.0713644 = boost
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.014475092 = queryNorm
              0.66270244 = fieldWeight in 305, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.078125 = fieldNorm(doc=305)
          0.43648362 = weight(abstract_txt:lexical in 305) [ClassicSimilarity], result of:
            0.43648362 = score(doc=305,freq=3.0), product of:
              0.49389964 = queryWeight, product of:
                5.224428 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.014475092 = queryNorm
              0.8837496 = fieldWeight in 305, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.078125 = fieldNorm(doc=305)
        0.16 = coord(4/25)
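
The score explanations above can be checked by hand. The following is a minimal sketch, not the search engine's own code, that recomputes the score of hit 1 (doc=49) from the values printed in its explanation and then applies the coord factor to all five hits. It assumes the relations shown in the explain lines themselves (tf is the square root of freq, queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm) plus the ClassicSimilarity idf formula 1 + ln(maxDocs / (docFreq + 1)), which agrees with the printed idf values. The function and variable names are hypothetical, chosen only for this illustration.

```python
import math

def classic_idf(doc_freq, max_docs):
    """idf consistent with the explain lines: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

MAX_DOCS = 44421
QUERY_NORM = 0.014475092
QUERY_CLAUSES = 25  # denominator of coord(n/25)

# (term, freq, docFreq, boost) copied from the explanation of hit 1 (doc=49).
DOC_49_TERMS = [
    ("summarization", 5.0, 96, 1.1402028),
    ("been", 3.0, 3251, 1.1565421),
    ("documents", 3.0, 1954, 1.3193724),
    ("problem", 1.0, 1396, 1.4269054),
    ("text", 6.0, 2122, 1.9394902),
    ("extraction", 1.0, 246, 3.9626584),
    ("lexical", 1.0, 175, 5.224428),
]

# Sum of the per-term scores, then scaled by coord(7/25).
raw_sum_49 = sum(
    term_score(freq, df, MAX_DOCS, boost, QUERY_NORM, 0.0625)
    for _, freq, df, boost in DOC_49_TERMS
)
score_49 = raw_sum_49 * len(DOC_49_TERMS) / QUERY_CLAUSES

# (doc id, raw sum of term scores, number of matched query terms) per hit.
HITS = [
    (49, 0.61088854, 7),
    (2719, 0.5484937, 7),
    (111, 0.94443214, 4),
    (2830, 0.7449788, 5),
    (305, 0.8983525, 4),
]
final_scores = {doc: raw * matched / QUERY_CLAUSES for doc, raw, matched in HITS}
by_raw_sum = [doc for doc, _, _ in sorted(HITS, key=lambda h: h[1], reverse=True)]
by_final = sorted(final_scores, key=final_scores.get, reverse=True)

print(f"doc 49: raw sum {raw_sum_49:.6f}, final score {score_49:.6f}")
print("order by raw sum:    ", by_raw_sum)
print("order by final score:", by_final)
```

The comparison of the two orderings shows the effect of coord: docs 111 and 305 have the largest raw sums but match only 4 of the 25 query terms, so they end up ranked below docs 49 and 2719, which match 7.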