Document (#21504)

Author
Frakes, W.B.
Title
Stemming algorithms
Source
Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
Imprint
Englewood Cliffs, NJ : Prentice Hall
Year
1992
Pages
pp. 131-160
Abstract
Describes stemming algorithms: programs that relate morphologically similar indexing and search terms. Stemming is used to improve retrieval effectiveness and to reduce the size of indexing files. Several approaches to stemming are described: table lookup, affix removal, successor variety, and n-gram. Empirical studies of stemming are summarized. The Porter stemmer is described in detail, and a full implementation in C is presented.
Theme
Computational linguistics
Retrieval algorithms

Similar documents (content)

  1. Ahmad, F.; Yusoff, M.; Sembok, T.M.T.: Experiments with a stemming algorithm for Malay words (1996) 0.27
    0.2695255 = sum of:
      0.2695255 = product of:
        1.1230229 = sum of:
          0.03005778 = weight(abstract_txt:improve in 6572) [ClassicSimilarity], result of:
            0.03005778 = score(doc=6572,freq=1.0), product of:
              0.064674184 = queryWeight, product of:
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.013045967 = queryNorm
              0.46475703 = fieldWeight in 6572, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.09375 = fieldNorm(doc=6572)
          0.032653105 = weight(abstract_txt:effectiveness in 6572) [ClassicSimilarity], result of:
            0.032653105 = score(doc=6572,freq=1.0), product of:
              0.068345405 = queryWeight, product of:
                1.0279907 = boost
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.013045967 = queryNorm
              0.4777659 = fieldWeight in 6572, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.09375 = fieldNorm(doc=6572)
          0.061261095 = weight(abstract_txt:reduce in 6572) [ClassicSimilarity], result of:
            0.061261095 = score(doc=6572,freq=1.0), product of:
              0.10396396 = queryWeight, product of:
                1.2678735 = boost
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.013045967 = queryNorm
              0.5892532 = fieldWeight in 6572, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.09375 = fieldNorm(doc=6572)
          0.0738981 = weight(abstract_txt:relate in 6572) [ClassicSimilarity], result of:
            0.0738981 = score(doc=6572,freq=1.0), product of:
              0.117809914 = queryWeight, product of:
                1.3496633 = boost
                6.690832 = idf(docFreq=149, maxDocs=44421)
                0.013045967 = queryNorm
              0.6272655 = fieldWeight in 6572, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.690832 = idf(docFreq=149, maxDocs=44421)
                0.09375 = fieldNorm(doc=6572)
          0.040624056 = weight(abstract_txt:indexing in 6572) [ClassicSimilarity], result of:
            0.040624056 = score(doc=6572,freq=1.0), product of:
              0.09960746 = queryWeight, product of:
                1.755074 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.013045967 = queryNorm
              0.4078415 = fieldWeight in 6572, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=6572)
          0.8845288 = weight(abstract_txt:stemming in 6572) [ClassicSimilarity], result of:
            0.8845288 = score(doc=6572,freq=3.0), product of:
              0.7308876 = queryWeight, product of:
                7.5170045 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013045967 = queryNorm
              1.2102119 = fieldWeight in 6572, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.09375 = fieldNorm(doc=6572)
        0.24 = coord(6/25)
    
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.24
    0.23626883 = sum of:
      0.23626883 = product of:
        1.1813442 = sum of:
          0.021768736 = weight(abstract_txt:effectiveness in 288) [ClassicSimilarity], result of:
            0.021768736 = score(doc=288,freq=1.0), product of:
              0.068345405 = queryWeight, product of:
                1.0279907 = boost
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.013045967 = queryNorm
              0.3185106 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.1164453 = weight(abstract_txt:gram in 288) [ClassicSimilarity], result of:
            0.1164453 = score(doc=288,freq=2.0), product of:
              0.16591737 = queryWeight, product of:
                1.6016973 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.013045967 = queryNorm
              0.7018271 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.027082704 = weight(abstract_txt:indexing in 288) [ClassicSimilarity], result of:
            0.027082704 = score(doc=288,freq=1.0), product of:
              0.09960746 = queryWeight, product of:
                1.755074 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.013045967 = queryNorm
              0.27189434 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.1821056 = weight(abstract_txt:stemmer in 288) [ClassicSimilarity], result of:
            0.1821056 = score(doc=288,freq=2.0), product of:
              0.22354214 = queryWeight, product of:
                1.8591491 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.013045967 = queryNorm
              0.8146366 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.8339418 = weight(abstract_txt:stemming in 288) [ClassicSimilarity], result of:
            0.8339418 = score(doc=288,freq=6.0), product of:
              0.7308876 = queryWeight, product of:
                7.5170045 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013045967 = queryNorm
              1.1409987 = fieldWeight in 288, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
        0.2 = coord(5/25)
    
  3. Paice, C.D.: Method for evaluation of stemming algorithms based on error counting (1996) 0.23
    0.22650263 = sum of:
      0.22650263 = product of:
        1.4156414 = sum of:
          0.038095288 = weight(abstract_txt:effectiveness in 5867) [ClassicSimilarity], result of:
            0.038095288 = score(doc=5867,freq=1.0), product of:
              0.068345405 = queryWeight, product of:
                1.0279907 = boost
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.013045967 = queryNorm
              0.55739355 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.109375 = fieldNorm(doc=5867)
          0.23898375 = weight(abstract_txt:porter in 5867) [ClassicSimilarity], result of:
            0.23898375 = score(doc=5867,freq=1.0), product of:
              0.23247382 = queryWeight, product of:
                1.8959267 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.013045967 = queryNorm
              1.0280029 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.109375 = fieldNorm(doc=5867)
          0.10661203 = weight(abstract_txt:algorithms in 5867) [ClassicSimilarity], result of:
            0.10661203 = score(doc=5867,freq=1.0), product of:
              0.17100519 = queryWeight, product of:
                2.29961 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.013045967 = queryNorm
              0.62344325 = fieldWeight in 5867, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.109375 = fieldNorm(doc=5867)
          1.0319504 = weight(abstract_txt:stemming in 5867) [ClassicSimilarity], result of:
            1.0319504 = score(doc=5867,freq=3.0), product of:
              0.7308876 = queryWeight, product of:
                7.5170045 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013045967 = queryNorm
              1.4119139 = fieldWeight in 5867, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.109375 = fieldNorm(doc=5867)
        0.16 = coord(4/25)
    
  4. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.21
    0.20816067 = sum of:
      0.20816067 = product of:
        1.3010042 = sum of:
          0.22763202 = weight(abstract_txt:stemmer in 5866) [ClassicSimilarity], result of:
            0.22763202 = score(doc=5866,freq=2.0), product of:
              0.22354214 = queryWeight, product of:
                1.8591491 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.013045967 = queryNorm
              1.0182958 = fieldWeight in 5866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.1655621 = weight(abstract_txt:morphologically in 5866) [ClassicSimilarity], result of:
            0.1655621 = score(doc=5866,freq=1.0), product of:
              0.22778289 = queryWeight, product of:
                1.8767009 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.013045967 = queryNorm
              0.7268416 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.17070268 = weight(abstract_txt:porter in 5866) [ClassicSimilarity], result of:
            0.17070268 = score(doc=5866,freq=1.0), product of:
              0.23247382 = queryWeight, product of:
                1.8959267 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.013045967 = queryNorm
              0.73428774 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.73710734 = weight(abstract_txt:stemming in 5866) [ClassicSimilarity], result of:
            0.73710734 = score(doc=5866,freq=3.0), product of:
              0.7308876 = queryWeight, product of:
                7.5170045 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013045967 = queryNorm
              1.0085099 = fieldWeight in 5866, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
        0.16 = coord(4/25)
    
  5. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.21
    0.2052286 = sum of:
      0.2052286 = product of:
        1.2826787 = sum of:
          0.076384924 = weight(abstract_txt:files in 3585) [ClassicSimilarity], result of:
            0.076384924 = score(doc=3585,freq=2.0), product of:
              0.08625619 = queryWeight, product of:
                1.1548609 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.013045967 = queryNorm
              0.8855588 = fieldWeight in 3585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.503885 = weight(abstract_txt:stemmer in 3585) [ClassicSimilarity], result of:
            0.503885 = score(doc=3585,freq=5.0), product of:
              0.22354214 = queryWeight, product of:
                1.8591491 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.013045967 = queryNorm
              2.254094 = fieldWeight in 3585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.10661203 = weight(abstract_txt:algorithms in 3585) [ClassicSimilarity], result of:
            0.10661203 = score(doc=3585,freq=1.0), product of:
              0.17100519 = queryWeight, product of:
                2.29961 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.013045967 = queryNorm
              0.62344325 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.5957968 = weight(abstract_txt:stemming in 3585) [ClassicSimilarity], result of:
            0.5957968 = score(doc=3585,freq=1.0), product of:
              0.7308876 = queryWeight, product of:
                7.5170045 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013045967 = queryNorm
              0.81516886 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
        0.16 = coord(4/25)
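
The score breakdowns above all follow the same arithmetic, and it can help to recompute one by hand. The following is a minimal sketch that rebuilds the 0.2695255 score of the first hit (doc 6572) from the numbers in its listing. It rests on a few assumptions: the idf formula 1 + ln(maxDocs / (docFreq + 1)) is inferred from the listed values rather than stated in the record, the term "improve" is given a boost of 1.0 because its listing shows no boost line, and all function and constant names are hypothetical.

```python
import math

# Constants copied from the explain listing for doc 6572.
MAX_DOCS = 44421
QUERY_NORM = 0.013045967
FIELD_NORM = 0.09375        # fieldNorm(doc=6572)
COORD = 6 / 25              # 6 of 25 query terms matched

# (term, boost, docFreq, termFreq in doc 6572)
# Assumption: "improve" has boost 1.0, since no boost line is listed for it.
TERMS = [
    ("improve",       1.0,       848,  1),
    ("effectiveness", 1.0279907, 738,  1),
    ("reduce",        1.2678735, 224,  1),
    ("relate",        1.3496633, 149,  1),
    ("indexing",      1.755074,  1557, 1),
    ("stemming",      7.5170045, 69,   3),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; formula inferred from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    """score = queryWeight * fieldWeight, as in each 'product of' node."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms):
    """Sum the per-term scores, then scale by the coordination factor."""
    return COORD * sum(term_score(b, df, f) for _, b, df, f in terms)


if __name__ == "__main__":
    for name, boost, doc_freq, freq in TERMS:
        print(f"{name:14s} {term_score(boost, doc_freq, freq):.7f}")
    print(f"total          {document_score(TERMS):.7f}")
```

The listing was produced with single-precision floats, so the recomputed values should agree with it only to within rounding. The same function applies to the other four hits once their fieldNorm, coord, and term lists are substituted.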