Document (#10862)

Author
Cohen, J.D.
Title
Highlights: language- and domain-independent automatic indexing terms for abstracting
Source
Journal of the American Society for Information Science. 46(1995) no.3, pp. 162-174
Year
1995
Abstract
Presents a model for drawing index terms from text. The approach uses no stop list, stemmer, or other language- or domain-specific component, so it can operate in any language or domain with only trivial modification. The method uses n-gram counts, achieving a function similar to, but more general than, that of a stemmer. The generated index terms, called 'highlights', are suitable for identifying a document's topic for perusal and selection. An extension is also described and demonstrated that selects index terms to represent a subset of documents, distinguishing them from the rest of the corpus. Presents some experimental results, showing operation in English, Spanish, German, Georgian, Russian and Japanese.
Theme
Automatic indexing
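
The abstract notes that n-gram counts can play a role similar to a stemmer. The following is a minimal sketch of that general idea only, not the algorithm from the paper: it counts overlapping character n-grams so that inflected variants of a word contribute to a shared n-gram. The function name `char_ngrams`, the choice of n = 5, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter

def char_ngrams(text, n=5):
    """Count overlapping character n-grams in lowercased text (hypothetical helper)."""
    s = " ".join(text.lower().split())  # collapse runs of whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

# Illustrative sentence (not from the paper): five variants share the n-gram "index".
doc = "Indexing indexes: the indexer indexed every index term."
counts = char_ngrams(doc, n=5)
top_ngram, top_count = counts.most_common(1)[0]
```

Here the variants "Indexing", "indexes", "indexer", "indexed" and "index" all feed the same 5-gram without any language-specific suffix rules, which is the sense in which n-gram counting can stand in for stemming.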

Similar documents (author)

  1. Cohen, W.W.: The whirl approach to information integration (1998) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:cohen in 5638) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 5638, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=5638)
    
  2. Cohen, J.: The hermeneutics of the reference question (1993) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:cohen in 7427) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 7427, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=7427)
    
  3. Cohen, P.: Different approaches to quality (1996) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:cohen in 359) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 359, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=359)
    
  4. Cohen, J.D.: Massive query resolution for rapid selective dissemination of information (1999) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:cohen in 4054) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 4054, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=4054)
    
  5. Cohen, M.: 99 philosophische Rätsel (2004) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:cohen in 2291) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 2291, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=2291)
    

Similar documents (content)

  1. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.10
    0.09676063 = sum of:
      0.09676063 = product of:
        0.604754 = sum of:
          0.10056283 = weight(abstract_txt:stop in 3950) [ClassicSimilarity], result of:
            0.10056283 = score(doc=3950,freq=1.0), product of:
              0.17100976 = queryWeight, product of:
                1.1818775 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.019223033 = queryNorm
              0.58805317 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.07259567 = weight(abstract_txt:language in 3950) [ClassicSimilarity], result of:
            0.07259567 = score(doc=3950,freq=2.0), product of:
              0.15753128 = queryWeight, product of:
                1.9647442 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.019223033 = queryNorm
              0.46083337 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.06236752 = weight(abstract_txt:terms in 3950) [ClassicSimilarity], result of:
            0.06236752 = score(doc=3950,freq=1.0), product of:
              0.19741866 = queryWeight, product of:
                2.5397215 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.019223033 = queryNorm
              0.31591502 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.36922792 = weight(abstract_txt:stemmer in 3950) [ClassicSimilarity], result of:
            0.36922792 = score(doc=3950,freq=1.0), product of:
              0.5127853 = queryWeight, product of:
                2.8943083 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.019223033 = queryNorm
              0.72004384 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
        0.16 = coord(4/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.10
    0.096666045 = sum of:
      0.096666045 = product of:
        1.2083256 = sum of:
          0.052459367 = weight(abstract_txt:presents in 3585) [ClassicSimilarity], result of:
            0.052459367 = score(doc=3585,freq=1.0), product of:
              0.111567035 = queryWeight, product of:
                1.3500347 = boost
                4.299016 = idf(docFreq=1639, maxDocs=44421)
                0.019223033 = queryNorm
              0.4702049 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.299016 = idf(docFreq=1639, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          1.1558663 = weight(abstract_txt:stemmer in 3585) [ClassicSimilarity], result of:
            1.1558663 = score(doc=3585,freq=5.0), product of:
              0.5127853 = queryWeight, product of:
                2.8943083 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.019223033 = queryNorm
              2.254094 = fieldWeight in 3585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
        0.08 = coord(2/25)
    
  3. Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.09
    0.0856595 = sum of:
      0.0856595 = product of:
        0.42829746 = sum of:
          0.065243304 = weight(abstract_txt:achieving in 215) [ClassicSimilarity], result of:
            0.065243304 = score(doc=215,freq=1.0), product of:
              0.14871675 = queryWeight, product of:
                1.1021531 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.019223033 = queryNorm
              0.4387085 = fieldWeight in 215, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.0625 = fieldNorm(doc=215)
          0.08359581 = weight(abstract_txt:japanese in 215) [ClassicSimilarity], result of:
            0.08359581 = score(doc=215,freq=1.0), product of:
              0.17543878 = queryWeight, product of:
                1.1970844 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.019223033 = queryNorm
              0.47649562 = fieldWeight in 215, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=215)
          0.08213262 = weight(abstract_txt:language in 215) [ClassicSimilarity], result of:
            0.08213262 = score(doc=215,freq=4.0), product of:
              0.15753128 = queryWeight, product of:
                1.9647442 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.019223033 = queryNorm
              0.52137345 = fieldWeight in 215, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=215)
          0.08575932 = weight(abstract_txt:index in 215) [ClassicSimilarity], result of:
            0.08575932 = score(doc=215,freq=2.0), product of:
              0.20427752 = queryWeight, product of:
                2.2373447 = boost
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.019223033 = queryNorm
              0.41981772 = fieldWeight in 215, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.0625 = fieldNorm(doc=215)
          0.11156641 = weight(abstract_txt:terms in 215) [ClassicSimilarity], result of:
            0.11156641 = score(doc=215,freq=5.0), product of:
              0.19741866 = queryWeight, product of:
                2.5397215 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.019223033 = queryNorm
              0.56512594 = fieldWeight in 215, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.0625 = fieldNorm(doc=215)
        0.2 = coord(5/25)
    
  4. King, S.V.: ELNET: the electronic library database system (1992) 0.08
    0.08133732 = sum of:
      0.08133732 = product of:
        0.50835824 = sum of:
          0.20688906 = weight(abstract_txt:japanese in 4262) [ClassicSimilarity], result of:
            0.20688906 = score(doc=4262,freq=2.0), product of:
              0.17543878 = queryWeight, product of:
                1.1970844 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.019223033 = queryNorm
              1.1792665 = fieldWeight in 4262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.109375 = fieldNorm(doc=4262)
          0.07186604 = weight(abstract_txt:language in 4262) [ClassicSimilarity], result of:
            0.07186604 = score(doc=4262,freq=1.0), product of:
              0.15753128 = queryWeight, product of:
                1.9647442 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.019223033 = queryNorm
              0.45620176 = fieldWeight in 4262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.109375 = fieldNorm(doc=4262)
          0.10612175 = weight(abstract_txt:index in 4262) [ClassicSimilarity], result of:
            0.10612175 = score(doc=4262,freq=1.0), product of:
              0.20427752 = queryWeight, product of:
                2.2373447 = boost
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.019223033 = queryNorm
              0.51949793 = fieldWeight in 4262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.109375 = fieldNorm(doc=4262)
          0.12348138 = weight(abstract_txt:terms in 4262) [ClassicSimilarity], result of:
            0.12348138 = score(doc=4262,freq=2.0), product of:
              0.19741866 = queryWeight, product of:
                2.5397215 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.019223033 = queryNorm
              0.62547976 = fieldWeight in 4262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.109375 = fieldNorm(doc=4262)
        0.16 = coord(4/25)
    
  5. Panzer, M.: Dewey: how to make it work for you (2013) 0.08
    0.07927265 = sum of:
      0.07927265 = product of:
        0.49545407 = sum of:
          0.14078797 = weight(abstract_txt:stop in 797) [ClassicSimilarity], result of:
            0.14078797 = score(doc=797,freq=1.0), product of:
              0.17100976 = queryWeight, product of:
                1.1818775 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.019223033 = queryNorm
              0.82327443 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.109375 = fieldNorm(doc=797)
          0.16122982 = weight(abstract_txt:highlights in 797) [ClassicSimilarity], result of:
            0.16122982 = score(doc=797,freq=1.0), product of:
              0.23584 = queryWeight, product of:
                1.9628438 = boost
                6.250429 = idf(docFreq=232, maxDocs=44421)
                0.019223033 = queryNorm
              0.6836407 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.250429 = idf(docFreq=232, maxDocs=44421)
                0.109375 = fieldNorm(doc=797)
          0.10612175 = weight(abstract_txt:index in 797) [ClassicSimilarity], result of:
            0.10612175 = score(doc=797,freq=1.0), product of:
              0.20427752 = queryWeight, product of:
                2.2373447 = boost
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.019223033 = queryNorm
              0.51949793 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.109375 = fieldNorm(doc=797)
          0.087314524 = weight(abstract_txt:terms in 797) [ClassicSimilarity], result of:
            0.087314524 = score(doc=797,freq=1.0), product of:
              0.19741866 = queryWeight, product of:
                2.5397215 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.019223033 = queryNorm
              0.442281 = fieldWeight in 797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.109375 = fieldNorm(doc=797)
        0.16 = coord(4/25)
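
The content-match trees combine three pieces per term: queryWeight (boost × idf × queryNorm), fieldWeight (tf × idf × fieldNorm), and a final coord factor (matched clauses / total clauses) applied to the sum. The sketch below recomputes the score for entry 2 (Fox and Fox) from the values printed in its tree. The idf and tf formulas are assumptions inferred from the output, and the helper names are hypothetical.

```python
import math

QUERY_NORM = 0.019223033  # queryNorm as printed in the trees above
MAX_DOCS = 44421          # maxDocs as printed in the trees above

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed classic form: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm):
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

def doc_score(terms, matched, total_clauses):
    # Sum of per-term scores, scaled by the coord factor
    return sum(term_score(*t) for t in terms) * (matched / total_clauses)

# (freq, docFreq, boost, fieldNorm) for "presents" and "stemmer" in doc 3585
fox_terms = [
    (1.0, 1639, 1.3500347, 0.109375),
    (5.0, 11, 2.8943083, 0.109375),
]
fox_score = doc_score(fox_terms, matched=2, total_clauses=25)
```

The coord factor explains why entry 2 ranks below entry 1 despite a larger raw sum (1.21 versus 0.60): it matches only 2 of the 25 query clauses, against 4 of 25 for entry 1.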