Document (#12068)

Author
Hull, D.A.
Title
Stemming algorithms : a case study for detailed evaluation
Source
Journal of the American Society for Information Science. 47(1996) no.1, p.70-84
Year
1996
Abstract
The majority of information retrieval experiments are evaluated by measures such as average precision and average recall. Fundamental decisions about the superiority of one retrieval technique over another are made solely on the basis of these measures. We claim that average performance figures need to be validated with a careful statistical analysis and that there is a great deal of additional information that can be uncovered by looking closely at the results of individual queries. This article is a case study of stemming algorithms which describes a number of novel approaches to evaluation and demonstrates their value.
Theme
Retrieval studies

Similar documents (author)

  1. Hull, P.: Videotex: a new tool for librarians (1994) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:hull in 7835) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 7835, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=7835)
    
  2. Hull, T.J.: Reference services and electronic records : the impact of changing methods of communication and access (1995) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:hull in 1811) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 1811, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=1811)
    
  3. Hull, T.J.: Reference services and electronic records : the impact of changing methods of communication and access (1995) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:hull in 1812) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 1812, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=1812)
    
  4. Hull, T.J.: Reference services for electronic records in archives (1997) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:hull in 481) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 481, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=481)
    
  5. Hull, D.: A weighted Boolean model for cross-language text retrieval (1998) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:hull in 307) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 307, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=307)
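
The author matches above all carry the same score because each is the product of the same three factors. A minimal sketch of that arithmetic, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of any library, and the inputs are copied from the listing:

```python
import math

def classic_tf(freq):
    # Term-frequency factor: square root of the raw term count.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency: rarer terms get larger values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as shown in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values from the author_txt:hull entries (docFreq=11, maxDocs=44421).
idf = classic_idf(doc_freq=11, max_docs=44421)
score = field_weight(freq=1.0, doc_freq=11, max_docs=44421, field_norm=0.625)
print(f"idf={idf:.4f} fieldWeight={score:.4f}")
```

The result agrees with the listed 9.216561 and 5.7603507 up to single-precision rounding, since the index stores these values as 32-bit floats.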
    

Similar documents (content)

  1. Alemayehu, N.: Analysis of performance variation using query expansion (2003) 0.22
    0.21990494 = sum of:
      0.21990494 = product of:
        0.61084706 = sum of:
          0.061702263 = weight(abstract_txt:recall in 2454) [ClassicSimilarity], result of:
            0.061702263 = score(doc=2454,freq=2.0), product of:
              0.12138805 = queryWeight, product of:
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.021107936 = queryNorm
              0.5083059 = fieldWeight in 2454, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.025845703 = weight(abstract_txt:study in 2454) [ClassicSimilarity], result of:
            0.025845703 = score(doc=2454,freq=2.0), product of:
              0.08562044 = queryWeight, product of:
                1.1877246 = boost
                3.415198 = idf(docFreq=3968, maxDocs=44421)
                0.021107936 = queryNorm
              0.3018637 = fieldWeight in 2454, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.415198 = idf(docFreq=3968, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.047220223 = weight(abstract_txt:retrieval in 2454) [ClassicSimilarity], result of:
            0.047220223 = score(doc=2454,freq=6.0), product of:
              0.08872175 = queryWeight, product of:
                1.209044 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.021107936 = queryNorm
              0.53222823 = fieldWeight in 2454, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.015766498 = weight(abstract_txt:that in 2454) [ClassicSimilarity], result of:
            0.015766498 = score(doc=2454,freq=3.0), product of:
              0.06158506 = queryWeight, product of:
                1.2337023 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.021107936 = queryNorm
              0.25601172 = fieldWeight in 2454, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.13278992 = weight(abstract_txt:figures in 2454) [ClassicSimilarity], result of:
            0.13278992 = score(doc=2454,freq=2.0), product of:
              0.20234165 = queryWeight, product of:
                1.2910845 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.021107936 = queryNorm
              0.6562659 = fieldWeight in 2454, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.041243225 = weight(abstract_txt:evaluation in 2454) [ClassicSimilarity], result of:
            0.041243225 = score(doc=2454,freq=1.0), product of:
              0.14730933 = queryWeight, product of:
                1.5579094 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.021107936 = queryNorm
              0.279977 = fieldWeight in 2454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.07152016 = weight(abstract_txt:case in 2454) [ClassicSimilarity], result of:
            0.07152016 = score(doc=2454,freq=2.0), product of:
              0.16876052 = queryWeight, product of:
                1.667487 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.021107936 = queryNorm
              0.42379674 = fieldWeight in 2454, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.073245786 = weight(abstract_txt:measures in 2454) [ClassicSimilarity], result of:
            0.073245786 = score(doc=2454,freq=1.0), product of:
              0.21603145 = queryWeight, product of:
                1.8866247 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.021107936 = queryNorm
              0.3390515 = fieldWeight in 2454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
          0.1415133 = weight(abstract_txt:average in 2454) [ClassicSimilarity], result of:
            0.1415133 = score(doc=2454,freq=1.0), product of:
              0.3836105 = queryWeight, product of:
                3.0790582 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.021107936 = queryNorm
              0.36889842 = fieldWeight in 2454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.0625 = fieldNorm(doc=2454)
        0.36 = coord(9/25)
    
  2. Kekäläinen, J.; Järvelin, K.: Using graded relevance assessments in IR evaluation (2002) 0.21
    0.20926157 = sum of:
      0.20926157 = product of:
        0.6539424 = sum of:
          0.07556953 = weight(abstract_txt:recall in 225) [ClassicSimilarity], result of:
            0.07556953 = score(doc=225,freq=3.0), product of:
              0.12138805 = queryWeight, product of:
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.021107936 = queryNorm
              0.62254506 = fieldWeight in 225, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
          0.019277573 = weight(abstract_txt:retrieval in 225) [ClassicSimilarity], result of:
            0.019277573 = score(doc=225,freq=1.0), product of:
              0.08872175 = queryWeight, product of:
                1.209044 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.021107936 = queryNorm
              0.21728125 = fieldWeight in 225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
          0.0091027925 = weight(abstract_txt:that in 225) [ClassicSimilarity], result of:
            0.0091027925 = score(doc=225,freq=1.0), product of:
              0.06158506 = queryWeight, product of:
                1.2337023 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.021107936 = queryNorm
              0.14780845 = fieldWeight in 225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
          0.13118142 = weight(abstract_txt:superiority in 225) [ClassicSimilarity], result of:
            0.13118142 = score(doc=225,freq=1.0), product of:
              0.2528716 = queryWeight, product of:
                1.4433181 = boost
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.021107936 = queryNorm
              0.5187669 = fieldWeight in 225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.30027 = idf(docFreq=29, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
          0.041243225 = weight(abstract_txt:evaluation in 225) [ClassicSimilarity], result of:
            0.041243225 = score(doc=225,freq=1.0), product of:
              0.14730933 = queryWeight, product of:
                1.5579094 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.021107936 = queryNorm
              0.279977 = fieldWeight in 225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
          0.05057239 = weight(abstract_txt:case in 225) [ClassicSimilarity], result of:
            0.05057239 = score(doc=225,freq=1.0), product of:
              0.16876052 = queryWeight, product of:
                1.667487 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.021107936 = queryNorm
              0.29966956 = fieldWeight in 225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
          0.12686543 = weight(abstract_txt:measures in 225) [ClassicSimilarity], result of:
            0.12686543 = score(doc=225,freq=3.0), product of:
              0.21603145 = queryWeight, product of:
                1.8866247 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.021107936 = queryNorm
              0.58725446 = fieldWeight in 225, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
          0.20013003 = weight(abstract_txt:average in 225) [ClassicSimilarity], result of:
            0.20013003 = score(doc=225,freq=2.0), product of:
              0.3836105 = queryWeight, product of:
                3.0790582 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.021107936 = queryNorm
              0.52170116 = fieldWeight in 225, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.0625 = fieldNorm(doc=225)
        0.32 = coord(8/25)
    
  3. Lesk, M.E.; Salton, G.: Relevance assessments and retrieval system evaluation (1969) 0.17
    0.16656454 = sum of:
      0.16656454 = product of:
        0.59487337 = sum of:
          0.13089027 = weight(abstract_txt:recall in 4219) [ClassicSimilarity], result of:
            0.13089027 = score(doc=4219,freq=4.0), product of:
              0.12138805 = queryWeight, product of:
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.021107936 = queryNorm
              1.0782797 = fieldWeight in 4219, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.09375 = fieldNorm(doc=4219)
          0.027413508 = weight(abstract_txt:study in 4219) [ClassicSimilarity], result of:
            0.027413508 = score(doc=4219,freq=1.0), product of:
              0.08562044 = queryWeight, product of:
                1.1877246 = boost
                3.415198 = idf(docFreq=3968, maxDocs=44421)
                0.021107936 = queryNorm
              0.3201748 = fieldWeight in 4219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.415198 = idf(docFreq=3968, maxDocs=44421)
                0.09375 = fieldNorm(doc=4219)
          0.02891636 = weight(abstract_txt:retrieval in 4219) [ClassicSimilarity], result of:
            0.02891636 = score(doc=4219,freq=1.0), product of:
              0.08872175 = queryWeight, product of:
                1.209044 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.021107936 = queryNorm
              0.3259219 = fieldWeight in 4219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=4219)
          0.023649747 = weight(abstract_txt:that in 4219) [ClassicSimilarity], result of:
            0.023649747 = score(doc=4219,freq=3.0), product of:
              0.06158506 = queryWeight, product of:
                1.2337023 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.021107936 = queryNorm
              0.3840176 = fieldWeight in 4219, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=4219)
          0.061864838 = weight(abstract_txt:evaluation in 4219) [ClassicSimilarity], result of:
            0.061864838 = score(doc=4219,freq=1.0), product of:
              0.14730933 = queryWeight, product of:
                1.5579094 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.021107936 = queryNorm
              0.4199655 = fieldWeight in 4219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.09375 = fieldNorm(doc=4219)
          0.10986869 = weight(abstract_txt:measures in 4219) [ClassicSimilarity], result of:
            0.10986869 = score(doc=4219,freq=1.0), product of:
              0.21603145 = queryWeight, product of:
                1.8866247 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.021107936 = queryNorm
              0.5085773 = fieldWeight in 4219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.09375 = fieldNorm(doc=4219)
          0.21226996 = weight(abstract_txt:average in 4219) [ClassicSimilarity], result of:
            0.21226996 = score(doc=4219,freq=1.0), product of:
              0.3836105 = queryWeight, product of:
                3.0790582 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.021107936 = queryNorm
              0.55334765 = fieldWeight in 4219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.09375 = fieldNorm(doc=4219)
        0.28 = coord(7/25)
    
  4. Kraaij, W.; Pohlmann, R.: Evaluation of a Dutch stemming algorithm (1995) 0.16
    0.1637986 = sum of:
      0.1637986 = product of:
        0.68249416 = sum of:
          0.05453761 = weight(abstract_txt:recall in 5866) [ClassicSimilarity], result of:
            0.05453761 = score(doc=5866,freq=1.0), product of:
              0.12138805 = queryWeight, product of:
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.021107936 = queryNorm
              0.44928318 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.024096966 = weight(abstract_txt:retrieval in 5866) [ClassicSimilarity], result of:
            0.024096966 = score(doc=5866,freq=1.0), product of:
              0.08872175 = queryWeight, product of:
                1.209044 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.021107936 = queryNorm
              0.27160156 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.01137849 = weight(abstract_txt:that in 5866) [ClassicSimilarity], result of:
            0.01137849 = score(doc=5866,freq=1.0), product of:
              0.06158506 = queryWeight, product of:
                1.2337023 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.021107936 = queryNorm
              0.18476056 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.1296971 = weight(abstract_txt:careful in 5866) [ClassicSimilarity], result of:
            0.1296971 = score(doc=5866,freq=1.0), product of:
              0.21627119 = queryWeight, product of:
                1.3347852 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.021107936 = queryNorm
              0.5996966 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.051554028 = weight(abstract_txt:evaluation in 5866) [ClassicSimilarity], result of:
            0.051554028 = score(doc=5866,freq=1.0), product of:
              0.14730933 = queryWeight, product of:
                1.5579094 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.021107936 = queryNorm
              0.34997123 = fieldWeight in 5866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
          0.41123 = weight(abstract_txt:stemming in 5866) [ClassicSimilarity], result of:
            0.41123 = score(doc=5866,freq=3.0), product of:
              0.40776002 = queryWeight, product of:
                2.5919664 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.021107936 = queryNorm
              1.0085099 = fieldWeight in 5866, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=5866)
        0.24 = coord(6/25)
    
  5. Frakes, W.B.: Stemming algorithms (1992) 0.16
    0.15783286 = sum of:
      0.15783286 = product of:
        0.98645544 = sum of:
          0.038555145 = weight(abstract_txt:retrieval in 4503) [ClassicSimilarity], result of:
            0.038555145 = score(doc=4503,freq=1.0), product of:
              0.08872175 = queryWeight, product of:
                1.209044 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.021107936 = queryNorm
              0.4345625 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.018205585 = weight(abstract_txt:that in 4503) [ClassicSimilarity], result of:
            0.018205585 = score(doc=4503,freq=1.0), product of:
              0.06158506 = queryWeight, product of:
                1.2337023 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.021107936 = queryNorm
              0.2956169 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.16993868 = weight(abstract_txt:algorithms in 4503) [ClassicSimilarity], result of:
            0.16993868 = score(doc=4503,freq=1.0), product of:
              0.23850822 = queryWeight, product of:
                1.9823426 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.021107936 = queryNorm
              0.7125066 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.759756 = weight(abstract_txt:stemming in 4503) [ClassicSimilarity], result of:
            0.759756 = score(doc=4503,freq=4.0), product of:
              0.40776002 = queryWeight, product of:
                2.5919664 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.021107936 = queryNorm
              1.8632431 = fieldWeight in 4503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
        0.16 = coord(4/25)
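
The content scores combine per-term products with a coordination factor. The following sketch reproduces the last entry (doc 4503), assuming each term score is queryWeight * fieldWeight with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and that the sum is scaled by coord = matched terms / query terms. The boost, idf, and freq values are copied from the listing; the function and variable names are hypothetical.

```python
import math

QUERY_NORM = 0.021107936   # queryNorm from the listing
FIELD_NORM = 0.125         # fieldNorm(doc=4503)

# term -> (boost, idf, freq), copied from the explain tree for doc 4503
TERMS = {
    "retrieval":  (1.209044, 3.4765, 1.0),
    "that":       (1.2337023, 2.3649352, 1.0),
    "algorithms": (1.9823426, 5.7000527, 1.0),
    "stemming":   (2.5919664, 7.4529724, 4.0),
}

def term_score(boost, idf, freq, field_norm, query_norm):
    # Query side: how much the term matters in the query.
    query_weight = boost * idf * query_norm
    # Document side: how strongly the field matches the term.
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm, query_norm, matched, query_terms):
    # Sum the per-term scores, then scale by the share of query terms matched.
    raw = sum(term_score(b, i, f, field_norm, query_norm)
              for b, i, f in terms.values())
    return raw * (matched / query_terms)

final_score = document_score(TERMS, FIELD_NORM, QUERY_NORM,
                             matched=4, query_terms=25)
print(f"{final_score:.4f}")
```

Because only 4 of the 25 query terms match, the coord factor of 0.16 pulls this entry below documents with a smaller raw sum but broader term overlap, which is why it ranks last despite the strong "stemming" match.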