Document (#27586)

Author
Fox, B.
Fox, C.J.
Title
Efficient stemmer generation
Source
Information processing and management. 38(2002) no.4, pp.547-558
Year
2002
Abstract
This paper presents an algorithm for generating stemmers from text stemmer specification files. A small study shows that the generated stemmers are computationally efficient, often running faster than stemmers custom written to implement particular stemming algorithms. The stemmer specification files are easily written and modified by non-programmers, making it much easier to create a stemmer, or tune a stemmer's performance, than would be the case with a custom stemmer program. Stemmer generation is thus also human-resource efficient.
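As a rough illustration of the idea in the abstract (a stemmer built from a plain-text rule file rather than hand-written code), the following Python sketch may help. The record does not describe the paper's specification format or its generation algorithm, so everything here is an assumption made for illustration: the `suffix -> replacement` line syntax, the `#` comment marker, the longest-suffix-first matching, and the names `generate_stemmer`, `min_stem_len`, and `EXAMPLE_SPEC` are all hypothetical.

```python
def generate_stemmer(spec_text, min_stem_len=2):
    """Build a stemming function from a hypothetical text specification.

    Assumed format (NOT taken from the paper): one rule per line,
    written as 'suffix -> replacement'; '#' starts a comment.
    """
    rules = []
    for line in spec_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        suffix, _, replacement = (part.strip() for part in line.partition("->"))
        if suffix:  # ignore malformed rules with an empty suffix
            rules.append((suffix, replacement))

    # Try longer suffixes first so 'sses' wins over 's'.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)

    def stem(word):
        for suffix, replacement in rules:
            remaining = len(word) - len(suffix)
            if word.endswith(suffix) and remaining >= min_stem_len:
                return word[:remaining] + replacement
        return word  # no rule applied

    return stem


# Hypothetical specification text; a non-programmer could edit these lines.
EXAMPLE_SPEC = """
# suffix -> replacement
sses -> ss
ies  -> i
ing  ->
s    ->
"""

stem = generate_stemmer(EXAMPLE_SPEC)
print([stem(w) for w in ["caresses", "ponies", "running", "cats", "is"]])
# prints ['caress', 'poni', 'runn', 'cat', 'is']
```

Changing the stemmer's behaviour in this sketch only requires editing the rule text, which is the kind of tuning the abstract attributes to specification files.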
Theme
Computational linguistics

Similar documents (content)

  1. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.24
    0.24455753 = sum of:
      0.24455753 = product of:
        1.2227876 = sum of:
          0.020398838 = weight(abstract_txt:generated in 5866) [ClassicSimilarity], result of:
            0.020398838 = score(doc=5866,freq=1.0), product of:
              0.047395196 = queryWeight, product of:
                1.0323024 = boost
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.008333863 = queryNorm
              0.43039885 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.509105 = idf(docFreq=488, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.0320367 = weight(abstract_txt:algorithm in 5866) [ClassicSimilarity], result of:
            0.0320367 = score(doc=5866,freq=2.0), product of:
              0.050825994 = queryWeight, product of:
                1.0690123 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.008333863 = queryNorm
              0.63032115 = fieldWeight in 5866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.08748052 = weight(abstract_txt:stemming in 5866) [ClassicSimilarity], result of:
            0.08748052 = score(doc=5866,freq=3.0), product of:
              0.08674236 = queryWeight, product of:
                1.3965464 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.008333863 = queryNorm
              1.0085099 = fieldWeight in 5866, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.27240473 = weight(abstract_txt:stemmers in 5866) [ClassicSimilarity], result of:
            0.27240473 = score(doc=5866,freq=1.0), product of:
              0.384752 = queryWeight, product of:
                5.094374 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.008333863 = queryNorm
              0.7080008 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.8104668 = weight(abstract_txt:stemmer in 5866) [ClassicSimilarity], result of:
            0.8104668 = score(doc=5866,freq=2.0), product of:
              0.7959051 = queryWeight, product of:
                10.362059 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.008333863 = queryNorm
              1.0182958 = fieldWeight in 5866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
        0.2 = coord(5/25)
    
  2. Frakes, W.B.: Stemming algorithms (1992) 0.19
    0.1900754 = sum of:
      0.1900754 = product of:
        1.1879712 = sum of:
          0.036150876 = weight(abstract_txt:algorithms in 4503) [ClassicSimilarity], result of:
            0.036150876 = score(doc=4503,freq=1.0), product of:
              0.050737604 = queryWeight, product of:
                1.0680823 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.008333863 = queryNorm
              0.7125066 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.16162209 = weight(abstract_txt:stemming in 4503) [ClassicSimilarity], result of:
            0.16162209 = score(doc=4503,freq=4.0), product of:
              0.08674236 = queryWeight, product of:
                1.3965464 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.008333863 = queryNorm
              1.8632431 = fieldWeight in 4503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.07325972 = weight(abstract_txt:files in 4503) [ClassicSimilarity], result of:
            0.07325972 = score(doc=4503,freq=1.0), product of:
              0.10236958 = queryWeight, product of:
                2.1455576 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.008333863 = queryNorm
              0.7156396 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.91693854 = weight(abstract_txt:stemmer in 4503) [ClassicSimilarity], result of:
            0.91693854 = score(doc=4503,freq=1.0), product of:
              0.7959051 = queryWeight, product of:
                10.362059 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.008333863 = queryNorm
              1.1520702 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
        0.16 = coord(4/25)
    
  3. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.17
    0.17207694 = sum of:
      0.17207694 = product of:
        1.0754809 = sum of:
          0.09897292 = weight(abstract_txt:stemming in 288) [ClassicSimilarity], result of:
            0.09897292 = score(doc=288,freq=6.0), product of:
              0.08674236 = queryWeight, product of:
                1.3965464 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.008333863 = queryNorm
              1.1409987 = fieldWeight in 288, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.019943828 = weight(abstract_txt:than in 288) [ClassicSimilarity], result of:
            0.019943828 = score(doc=288,freq=3.0), product of:
              0.0473274 = queryWeight, product of:
                1.4588515 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.008333863 = queryNorm
              0.4214013 = fieldWeight in 288, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.30819076 = weight(abstract_txt:stemmers in 288) [ClassicSimilarity], result of:
            0.30819076 = score(doc=288,freq=2.0), product of:
              0.384752 = queryWeight, product of:
                5.094374 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.008333863 = queryNorm
              0.80101144 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.6483734 = weight(abstract_txt:stemmer in 288) [ClassicSimilarity], result of:
            0.6483734 = score(doc=288,freq=2.0), product of:
              0.7959051 = queryWeight, product of:
                10.362059 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.008333863 = queryNorm
              0.8146366 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
        0.16 = coord(4/25)
    
  4. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.14
    0.13661687 = sum of:
      0.13661687 = product of:
        0.6830843 = sum of:
          0.01298024 = weight(abstract_txt:small in 5395) [ClassicSimilarity], result of:
            0.01298024 = score(doc=5395,freq=1.0), product of:
              0.044475466 = queryWeight, product of:
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.008333863 = queryNorm
              0.2918517 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.049999285 = weight(abstract_txt:stemming in 5395) [ClassicSimilarity], result of:
            0.049999285 = score(doc=5395,freq=2.0), product of:
              0.08674236 = queryWeight, product of:
                1.3965464 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.008333863 = queryNorm
              0.5764114 = fieldWeight in 5395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.0100752525 = weight(abstract_txt:than in 5395) [ClassicSimilarity], result of:
            0.0100752525 = score(doc=5395,freq=1.0), product of:
              0.0473274 = queryWeight, product of:
                1.4588515 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.008333863 = queryNorm
              0.21288413 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.04270281 = weight(abstract_txt:generation in 5395) [ClassicSimilarity], result of:
            0.04270281 = score(doc=5395,freq=2.0), product of:
              0.098379135 = queryWeight, product of:
                2.1033244 = boost
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.008333863 = queryNorm
              0.43406367 = fieldWeight in 5395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.5673267 = weight(abstract_txt:stemmer in 5395) [ClassicSimilarity], result of:
            0.5673267 = score(doc=5395,freq=2.0), product of:
              0.7959051 = queryWeight, product of:
                10.362059 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.008333863 = queryNorm
              0.712807 = fieldWeight in 5395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
        0.2 = coord(5/25)
    
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.12
    0.12357031 = sum of:
      0.12357031 = product of:
        1.0297526 = sum of:
          0.07142755 = weight(abstract_txt:stemming in 3950) [ClassicSimilarity], result of:
            0.07142755 = score(doc=3950,freq=2.0), product of:
              0.08674236 = queryWeight, product of:
                1.3965464 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.008333863 = queryNorm
              0.82344484 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.38523847 = weight(abstract_txt:stemmers in 3950) [ClassicSimilarity], result of:
            0.38523847 = score(doc=3950,freq=2.0), product of:
              0.384752 = queryWeight, product of:
                5.094374 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.008333863 = queryNorm
              1.0012643 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.57308656 = weight(abstract_txt:stemmer in 3950) [ClassicSimilarity], result of:
            0.57308656 = score(doc=3950,freq=1.0), product of:
              0.7959051 = queryWeight, product of:
                10.362059 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.008333863 = queryNorm
              0.72004384 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
        0.12 = coord(3/25)
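
The score breakdowns above follow one pattern: each matching term contributes queryWeight × fieldWeight, and the sum is multiplied by the coord factor. The Python sketch below recomputes the score of entry 5 from the numbers printed in its breakdown. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are the usual ClassicSimilarity definitions and agree with the values listed here; the function and constant names are illustrative and not part of any library API. Small differences in the last digits are to be expected, since the listing appears to use single-precision arithmetic.

```python
import math

MAX_DOCS = 44421          # maxDocs, as printed in the breakdowns
QUERY_NORM = 0.008333863  # queryNorm, as printed in the breakdowns
QUERY_TERMS = 25          # denominator of the coord factor


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, total_query_terms=QUERY_TERMS):
    """Sum of the term scores, scaled by coord = matched / total terms."""
    coord = len(terms) / total_query_terms
    return coord * sum(term_score(*term) for term in terms)


# (boost, docFreq, termFreq, fieldNorm) copied from entry 5 (doc=3950).
FAUTSCH_SAVOY = [
    (1.3965464, 69, 2.0, 0.078125),   # stemming
    (5.094374, 13, 2.0, 0.078125),    # stemmers
    (10.362059, 11, 1.0, 0.078125),   # stemmer
]

print(round(document_score(FAUTSCH_SAVOY), 4))  # prints 0.1236
```

The rare term "stemmer" (docFreq=11) carries both the highest idf and the largest boost, which is why it dominates every entry in the list.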