Document (#14868)

Author
Paice, C.D.
Title
Method for evaluation of stemming algorithms based on error counting
Source
Journal of the American Society for Information Science. 47(1996) no.8, pp.632-649
Year
1996
Abstract
Assesses the effectiveness of stemming algorithms by counting the number of identifiable errors made during the stemming of words from various text samples. This entails manually grouping the words in each sample, using software developed for this purpose, then stemming the words and computing indices which represent the rates of understemming and overstemming. Presents the results for 3 stemmers (Lovins, Porter, and Paice/Husk), in each case using 3 text samples.
Theme
Computational linguistics
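
The error-counting idea in the abstract can be illustrated with a small pair-counting sketch. It is an assumption-laden illustration, not the paper's exact procedure: the function name `stemming_error_rates`, the truncating `toy_stem` stemmer, and the sample word groups are all hypothetical, and the paper's own index definitions may differ in detail. Here the understemming rate is taken as the share of word pairs from the same manual group that receive different stems, and the overstemming rate as the share of pairs from different groups that receive the same stem.

```python
from itertools import combinations


def stemming_error_rates(groups, stem):
    """Return (understemming_rate, overstemming_rate) for a stemmer.

    groups: list of word lists; each list is one manually formed group
            of words that ought to share a stem.
    stem:   callable mapping a word to its stem.
    """
    # Attach to every word its stem and the id of its manual group.
    labelled = [(stem(w), gid) for gid, words in enumerate(groups) for w in words]

    same_group = same_group_split = 0
    diff_group = diff_group_merged = 0
    for (s1, g1), (s2, g2) in combinations(labelled, 2):
        if g1 == g2:
            same_group += 1
            same_group_split += s1 != s2   # should have merged, did not
        else:
            diff_group += 1
            diff_group_merged += s1 == s2  # should have stayed apart, merged

    under = same_group_split / same_group if same_group else 0.0
    over = diff_group_merged / diff_group if diff_group else 0.0
    return under, over


def toy_stem(word):
    # Hypothetical stemmer for illustration only: keep the first 4 letters.
    return word[:4]


# Hypothetical manual grouping of a tiny sample.
sample_groups = [
    ["connect", "connected", "connection"],
    ["general", "generally"],
    ["generate", "generation"],
    ["run", "ran"],
]

under, over = stemming_error_rates(sample_groups, toy_stem)
print(f"understemming rate: {under:.3f}, overstemming rate: {over:.3f}")
```

In this toy sample the truncating stemmer separates "run" from "ran" (an understemming error) and conflates the "general" and "generate" groups (overstemming errors), which is the kind of trade-off the two indices are meant to expose.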

Similar documents (author)

  1. Paice, C.D.: Expert systems for information retrieval? (1986) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:paice in 1100) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 1100, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=1100)
    
  2. Paice, C.D.: A thesaural model of information retrieval (1991) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:paice in 2293) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 2293, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=2293)
    
  3. Paice, C.D.: Automatic abstracting (1994) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:paice in 985) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 985, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=985)
    
  4. Paice, C.D.: Automatic abstracting (1994) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:paice in 1323) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 1323, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=1323)
    
  5. Paice, C.D.: Soft evaluation of Boolean search queries in information retrieval systems (1984) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:paice in 1789) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 1789, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=1789)
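
The author scores above can be reproduced from the figures in the listing. The sketch below assumes the classic TF-IDF formulas that match the displayed values, namely tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not Lucene API calls.

```python
import math


def classic_tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Inverse document frequency, in the form that matches the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:paice entries above.
author_score = field_weight(freq=1.0, doc_freq=8, max_docs=44421, field_norm=0.625)
print(round(author_score, 4))
```

Because every author match has the same term frequency, document frequency, and field norm, all five entries receive the same score.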
    

Similar documents (content)

  1. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.33
    0.33291677 = sum of:
      0.33291677 = product of:
        1.1889884 = sum of:
          0.01783906 = weight(abstract_txt:using in 5866) [ClassicSimilarity], result of:
            0.01783906 = score(doc=5866,freq=1.0), product of:
              0.06605395 = queryWeight, product of:
                1.441951 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.013251503 = queryNorm
              0.27006802 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.028493848 = weight(abstract_txt:text in 5866) [ClassicSimilarity], result of:
            0.028493848 = score(doc=5866,freq=1.0), product of:
              0.09025783 = queryWeight, product of:
                1.6855575 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013251503 = queryNorm
              0.3156939 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.042638827 = weight(abstract_txt:each in 5866) [ClassicSimilarity], result of:
            0.042638827 = score(doc=5866,freq=2.0), product of:
              0.093722604 = queryWeight, product of:
                1.717605 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.013251503 = queryNorm
              0.4549471 = fieldWeight in 5866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.1607029 = weight(abstract_txt:stemmers in 5866) [ClassicSimilarity], result of:
            0.1607029 = score(doc=5866,freq=1.0), product of:
              0.22698125 = queryWeight, product of:
                1.890084 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.013251503 = queryNorm
              0.7080008 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.17927563 = weight(abstract_txt:porter in 5866) [ClassicSimilarity], result of:
            0.17927563 = score(doc=5866,freq=1.0), product of:
              0.24414903 = queryWeight, product of:
                1.9602597 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.013251503 = queryNorm
              0.73428774 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.14073737 = weight(abstract_txt:words in 5866) [ClassicSimilarity], result of:
            0.14073737 = score(doc=5866,freq=2.0), product of:
              0.23783596 = queryWeight, product of:
                3.3510854 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.013251503 = queryNorm
              0.5917413 = fieldWeight in 5866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.6193009 = weight(abstract_txt:stemming in 5866) [ClassicSimilarity], result of:
            0.6193009 = score(doc=5866,freq=3.0), product of:
              0.6140752 = queryWeight, product of:
                6.217659 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013251503 = queryNorm
              1.0085099 = fieldWeight in 5866, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
        0.28 = coord(7/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.26
    0.26012632 = sum of:
      0.26012632 = product of:
        1.0838597 = sum of:
          0.033320263 = weight(abstract_txt:case in 3585) [ClassicSimilarity], result of:
            0.033320263 = score(doc=3585,freq=1.0), product of:
              0.06353715 = queryWeight, product of:
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.013251503 = queryNorm
              0.52442175 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.008422957 = weight(abstract_txt:this in 3585) [ClassicSimilarity], result of:
            0.008422957 = score(doc=3585,freq=1.0), product of:
              0.03200431 = queryWeight, product of:
                1.0037034 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.013251503 = queryNorm
              0.26318192 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.03989139 = weight(abstract_txt:text in 3585) [ClassicSimilarity], result of:
            0.03989139 = score(doc=3585,freq=1.0), product of:
              0.09025783 = queryWeight, product of:
                1.6855575 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013251503 = queryNorm
              0.44197148 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.38968384 = weight(abstract_txt:stemmers in 3585) [ClassicSimilarity], result of:
            0.38968384 = score(doc=3585,freq=3.0), product of:
              0.22698125 = queryWeight, product of:
                1.890084 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.013251503 = queryNorm
              1.7168107 = fieldWeight in 3585, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.11196625 = weight(abstract_txt:algorithms in 3585) [ClassicSimilarity], result of:
            0.11196625 = score(doc=3585,freq=1.0), product of:
              0.17959334 = queryWeight, product of:
                2.3776407 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.013251503 = queryNorm
              0.62344325 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.50057495 = weight(abstract_txt:stemming in 3585) [ClassicSimilarity], result of:
            0.50057495 = score(doc=3585,freq=1.0), product of:
              0.6140752 = queryWeight, product of:
                6.217659 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013251503 = queryNorm
              0.81516886 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
        0.24 = coord(6/25)
    
  3. Frakes, W.B.: Stemming algorithms (1992) 0.26
    0.25675163 = sum of:
      0.25675163 = product of:
        1.6046977 = sum of:
          0.045723986 = weight(abstract_txt:effectiveness in 4503) [ClassicSimilarity], result of:
            0.045723986 = score(doc=4503,freq=1.0), product of:
              0.07177781 = queryWeight, product of:
                1.0628726 = boost
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.013251503 = queryNorm
              0.6370212 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0961695 = idf(docFreq=738, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.286841 = weight(abstract_txt:porter in 4503) [ClassicSimilarity], result of:
            0.286841 = score(doc=4503,freq=1.0), product of:
              0.24414903 = queryWeight, product of:
                1.9602597 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.013251503 = queryNorm
              1.1748604 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.12796144 = weight(abstract_txt:algorithms in 4503) [ClassicSimilarity], result of:
            0.12796144 = score(doc=4503,freq=1.0), product of:
              0.17959334 = queryWeight, product of:
                2.3776407 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.013251503 = queryNorm
              0.7125066 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          1.1441714 = weight(abstract_txt:stemming in 4503) [ClassicSimilarity], result of:
            1.1441714 = score(doc=4503,freq=4.0), product of:
              0.6140752 = queryWeight, product of:
                6.217659 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013251503 = queryNorm
              1.8632431 = fieldWeight in 4503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
        0.16 = coord(4/25)
    
  4. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.24
    0.23976877 = sum of:
      0.23976877 = product of:
        0.99903655 = sum of:
          0.023800187 = weight(abstract_txt:case in 4187) [ClassicSimilarity], result of:
            0.023800187 = score(doc=4187,freq=1.0), product of:
              0.06353715 = queryWeight, product of:
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.013251503 = queryNorm
              0.37458694 = fieldWeight in 4187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.078125 = fieldNorm(doc=4187)
          0.008508471 = weight(abstract_txt:this in 4187) [ClassicSimilarity], result of:
            0.008508471 = score(doc=4187,freq=2.0), product of:
              0.03200431 = queryWeight, product of:
                1.0037034 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.013251503 = queryNorm
              0.26585388 = fieldWeight in 4187, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=4187)
          0.06962229 = weight(abstract_txt:error in 4187) [ClassicSimilarity], result of:
            0.06962229 = score(doc=4187,freq=1.0), product of:
              0.129959 = queryWeight, product of:
                1.4301754 = boost
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.013251503 = queryNorm
              0.53572506 = fieldWeight in 4187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.078125 = fieldNorm(doc=4187)
          0.27834558 = weight(abstract_txt:stemmers in 4187) [ClassicSimilarity], result of:
            0.27834558 = score(doc=4187,freq=3.0), product of:
              0.22698125 = queryWeight, product of:
                1.890084 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.013251503 = queryNorm
              1.2262933 = fieldWeight in 4187, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=4187)
          0.113102995 = weight(abstract_txt:algorithms in 4187) [ClassicSimilarity], result of:
            0.113102995 = score(doc=4187,freq=2.0), product of:
              0.17959334 = queryWeight, product of:
                2.3776407 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.013251503 = queryNorm
              0.6297728 = fieldWeight in 4187, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.078125 = fieldNorm(doc=4187)
          0.505657 = weight(abstract_txt:stemming in 4187) [ClassicSimilarity], result of:
            0.505657 = score(doc=4187,freq=2.0), product of:
              0.6140752 = queryWeight, product of:
                6.217659 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013251503 = queryNorm
              0.82344484 = fieldWeight in 4187, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=4187)
        0.24 = coord(6/25)
    
  5. Duwairi, R.; Al-Refai, M.N.; Khasawneh, N.: Feature reduction techniques for Arabic text categorization (2009) 0.19
    0.19135474 = sum of:
      0.19135474 = product of:
        0.7973114 = sum of:
          0.004813118 = weight(abstract_txt:this in 156) [ClassicSimilarity], result of:
            0.004813118 = score(doc=156,freq=1.0), product of:
              0.03200431 = queryWeight, product of:
                1.0037034 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.013251503 = queryNorm
              0.15038967 = fieldWeight in 156, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=156)
          0.014271248 = weight(abstract_txt:using in 156) [ClassicSimilarity], result of:
            0.014271248 = score(doc=156,freq=1.0), product of:
              0.06605395 = queryWeight, product of:
                1.441951 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.013251503 = queryNorm
              0.21605442 = fieldWeight in 156, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=156)
          0.02279508 = weight(abstract_txt:text in 156) [ClassicSimilarity], result of:
            0.02279508 = score(doc=156,freq=1.0), product of:
              0.09025783 = queryWeight, product of:
                1.6855575 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013251503 = queryNorm
              0.25255513 = fieldWeight in 156, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=156)
          0.024120165 = weight(abstract_txt:each in 156) [ClassicSimilarity], result of:
            0.024120165 = score(doc=156,freq=1.0), product of:
              0.093722604 = queryWeight, product of:
                1.717605 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.013251503 = queryNorm
              0.25735697 = fieldWeight in 156, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.0625 = fieldNorm(doc=156)
          0.15922615 = weight(abstract_txt:words in 156) [ClassicSimilarity], result of:
            0.15922615 = score(doc=156,freq=4.0), product of:
              0.23783596 = queryWeight, product of:
                3.3510854 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.013251503 = queryNorm
              0.6694789 = fieldWeight in 156, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=156)
          0.5720857 = weight(abstract_txt:stemming in 156) [ClassicSimilarity], result of:
            0.5720857 = score(doc=156,freq=4.0), product of:
              0.6140752 = queryWeight, product of:
                6.217659 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.013251503 = queryNorm
              0.93162155 = fieldWeight in 156, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=156)
        0.24 = coord(6/25)
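
The content scores combine a query-side weight and a field-side weight per term, then scale the sum by the coordination factor. As a rough check, the sketch below recomputes the score of the first content match (doc 5866) from the boosts, document frequencies, and term frequencies shown in its explain tree. The formulas are the same assumed classic TF-IDF forms that fit the listing, and the names `TERMS` and `term_score` are illustrative only.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.013251503   # queryNorm from the listing
FIELD_NORM = 0.078125      # fieldNorm(doc=5866)
COORD = 7 / 25             # 7 of 25 query terms matched

# term: (boost, docFreq, termFreq), copied from the doc 5866 explain tree.
TERMS = {
    "using":    (1.441951, 3806, 1),
    "text":     (1.6855575, 2122, 1),
    "each":     (1.717605, 1965, 2),
    "stemmers": (1.890084, 13, 1),
    "porter":   (1.9602597, 9, 1),
    "words":    (3.3510854, 569, 2),
    "stemming": (6.217659, 69, 3),
}


def idf(doc_freq, max_docs=MAX_DOCS):
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


final_score = COORD * sum(term_score(*t) for t in TERMS.values())
print(round(final_score, 4))
```

Since idf enters both weights, rare terms such as "stemming" and "porter" dominate the sum, while common words such as "using" contribute very little.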