Document (#40188)

Author
Flores, F.N.
Moreira, V.P.
Title
Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective
Source
Information processing and management. 52(2016) no.5, p.840-854
Year
2016
Abstract
The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.
Content
Cf.: http://www.sciencedirect.com/science/article/pii/S0306457316300358.
Theme
Automatic indexing
Multilingual problems

Similar documents (author)

  1. Flores, F.; Spinosa, C.: Information technology and the institution of identity : reflections since 'Understanding computers and cognition' (1998) 1.85
    1.8464384 = sum of:
      1.8464384 = product of:
        3.6928768 = sum of:
          3.6928768 = weight(author_txt:flores in 4833) [ClassicSimilarity], result of:
            3.6928768 = score(doc=4833,freq=1.0), product of:
              0.7570817 = queryWeight, product of:
                1.0764859 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.072091214 = queryNorm
              4.8777785 = fieldWeight in 4833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.5 = fieldNorm(doc=4833)
        0.5 = coord(1/2)
    
  2. Winograd, T.; Flores, F.: Erkenntnis, Maschinen, Verstehen : zur Neugestaltung von Computersystemen (1992) 1.85
    1.8464384 = sum of:
      1.8464384 = product of:
        3.6928768 = sum of:
          3.6928768 = weight(author_txt:flores in 2524) [ClassicSimilarity], result of:
            3.6928768 = score(doc=2524,freq=1.0), product of:
              0.7570817 = queryWeight, product of:
                1.0764859 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.072091214 = queryNorm
              4.8777785 = fieldWeight in 2524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.5 = fieldNorm(doc=2524)
        0.5 = coord(1/2)
    
  3. Moreira, F. Mosso => Mosso Moreira, F.: 1.57
    1.5699508 = sum of:
      1.5699508 = product of:
        3.1399016 = sum of:
          3.1399016 = weight(author_txt:moreira in 730) [ClassicSimilarity], result of:
            3.1399016 = score(doc=730,freq=2.0), product of:
              0.6533202 = queryWeight, product of:
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.072091214 = queryNorm
              4.8060684 = fieldWeight in 730, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.375 = fieldNorm(doc=730)
        0.5 = coord(1/2)
    
  4. Medina-Mora, R.; Winograd, T.; Flores, R.: The ActionWorkflow approach to workflow management technology (1993) 1.38
    1.3848288 = sum of:
      1.3848288 = product of:
        2.7696576 = sum of:
          2.7696576 = weight(author_txt:flores in 7055) [ClassicSimilarity], result of:
            2.7696576 = score(doc=7055,freq=1.0), product of:
              0.7570817 = queryWeight, product of:
                1.0764859 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.072091214 = queryNorm
              3.6583338 = fieldWeight in 7055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.375 = fieldNorm(doc=7055)
        0.5 = coord(1/2)
    
  5. Flores-Herr, N.; Sack, H.; Bossert, K.: Suche in Multimediaarchiven von Kultureinrichtungen (2011) 1.38
    1.3848288 = sum of:
      1.3848288 = product of:
        2.7696576 = sum of:
          2.7696576 = weight(author_txt:flores in 1346) [ClassicSimilarity], result of:
            2.7696576 = score(doc=1346,freq=1.0), product of:
              0.7570817 = queryWeight, product of:
                1.0764859 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.072091214 = queryNorm
              3.6583338 = fieldWeight in 1346, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.375 = fieldNorm(doc=1346)
        0.5 = coord(1/2)
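
The score trees above follow the classic Lucene TF-IDF pattern: a term's score is queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and the result is then scaled by coord. The following is a minimal Python sketch for re-deriving one of these values by hand. It assumes the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the idf and tf lines shown in the listing. The function names are illustrative only and are not part of any Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as assumed for ClassicSimilarity: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one term in one document, before the coord factor is applied."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(termFreq)
    query_weight = boost * idf * query_norm   # "queryWeight" line in the tree
    field_weight = tf * idf * field_norm      # "fieldWeight" line in the tree
    return query_weight * field_weight


# Values copied from entry 1 of the author list (doc=4833, term "flores").
raw = classic_term_score(
    freq=1.0, doc_freq=6, max_docs=44421,
    query_norm=0.072091214, field_norm=0.5, boost=1.0764859,
)
final = raw * 0.5  # coord(1/2): one of two query terms matched
print(f"raw={raw:.4f} final={final:.4f}")
```

The values should agree with the listing to about four decimal places. Lucene computes these figures in single precision, so the trailing digits can differ slightly from a double-precision recomputation.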
    

Similar documents (content)

  1. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.28
    0.28096655 = sum of:
      0.28096655 = product of:
        1.170694 = sum of:
          0.098766975 = weight(abstract_txt:stem in 3950) [ClassicSimilarity], result of:
            0.098766975 = score(doc=3950,freq=1.0), product of:
              0.15777889 = queryWeight, product of:
                1.2192808 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.01614999 = queryNorm
              0.6259835 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.025390072 = weight(abstract_txt:terms in 3950) [ClassicSimilarity], result of:
            0.025390072 = score(doc=3950,freq=1.0), product of:
              0.08036994 = queryWeight, product of:
                1.2306687 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.01614999 = queryNorm
              0.31591502 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.019214617 = weight(abstract_txt:information in 3950) [ClassicSimilarity], result of:
            0.019214617 = score(doc=3950,freq=2.0), product of:
              0.07189669 = queryWeight, product of:
                1.8404278 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.01614999 = queryNorm
              0.26725316 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.08383625 = weight(abstract_txt:retrieval in 3950) [ClassicSimilarity], result of:
            0.08383625 = score(doc=3950,freq=3.0), product of:
              0.17821282 = queryWeight, product of:
                3.174128 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.01614999 = queryNorm
              0.4704277 = fieldWeight in 3950, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.33722362 = weight(abstract_txt:stemming in 3950) [ClassicSimilarity], result of:
            0.33722362 = score(doc=3950,freq=2.0), product of:
              0.40952787 = queryWeight, product of:
                3.4023712 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.01614999 = queryNorm
              0.82344484 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.60626245 = weight(abstract_txt:stemmers in 3950) [ClassicSimilarity], result of:
            0.60626245 = score(doc=3950,freq=2.0), product of:
              0.6054969 = queryWeight, product of:
                4.137099 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.01614999 = queryNorm
              1.0012643 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
        0.24 = coord(6/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.24
    0.23903276 = sum of:
      0.23903276 = product of:
        1.4939548 = sum of:
          0.021036277 = weight(abstract_txt:also in 3585) [ClassicSimilarity], result of:
            0.021036277 = score(doc=3585,freq=1.0), product of:
              0.056651525 = queryWeight, product of:
                1.0332375 = boost
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.01614999 = queryNorm
              0.37132764 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.09956068 = weight(abstract_txt:algorithms in 3585) [ClassicSimilarity], result of:
            0.09956068 = score(doc=3585,freq=1.0), product of:
              0.15969485 = queryWeight, product of:
                1.7347616 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.01614999 = queryNorm
              0.62344325 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.33383435 = weight(abstract_txt:stemming in 3585) [ClassicSimilarity], result of:
            0.33383435 = score(doc=3585,freq=1.0), product of:
              0.40952787 = queryWeight, product of:
                3.4023712 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.01614999 = queryNorm
              0.81516886 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          1.0395235 = weight(abstract_txt:stemmers in 3585) [ClassicSimilarity], result of:
            1.0395235 = score(doc=3585,freq=3.0), product of:
              0.6054969 = queryWeight, product of:
                4.137099 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.01614999 = queryNorm
              1.7168107 = fieldWeight in 3585, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
        0.16 = coord(4/25)
    
  3. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.24
    0.23728481 = sum of:
      0.23728481 = product of:
        0.8474457 = sum of:
          0.05190353 = weight(abstract_txt:spanish in 3686) [ClassicSimilarity], result of:
            0.05190353 = score(doc=3686,freq=1.0), product of:
              0.1192282 = queryWeight, product of:
                1.0599096 = boost
                6.965269 = idf(docFreq=113, maxDocs=44421)
                0.01614999 = queryNorm
              0.43532932 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.965269 = idf(docFreq=113, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.07901358 = weight(abstract_txt:stem in 3686) [ClassicSimilarity], result of:
            0.07901358 = score(doc=3686,freq=1.0), product of:
              0.15777889 = queryWeight, product of:
                1.2192808 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.01614999 = queryNorm
              0.5007868 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.08045717 = weight(abstract_txt:algorithms in 3686) [ClassicSimilarity], result of:
            0.08045717 = score(doc=3686,freq=2.0), product of:
              0.15969485 = queryWeight, product of:
                1.7347616 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.01614999 = queryNorm
              0.5038182 = fieldWeight in 3686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.015371693 = weight(abstract_txt:information in 3686) [ClassicSimilarity], result of:
            0.015371693 = score(doc=3686,freq=2.0), product of:
              0.07189669 = queryWeight, product of:
                1.8404278 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.01614999 = queryNorm
              0.21380253 = fieldWeight in 3686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.077267334 = weight(abstract_txt:accurate in 3686) [ClassicSimilarity], result of:
            0.077267334 = score(doc=3686,freq=1.0), product of:
              0.19584914 = queryWeight, product of:
                1.921123 = boost
                6.312396 = idf(docFreq=218, maxDocs=44421)
                0.01614999 = queryNorm
              0.39452475 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.312396 = idf(docFreq=218, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.038722306 = weight(abstract_txt:retrieval in 3686) [ClassicSimilarity], result of:
            0.038722306 = score(doc=3686,freq=1.0), product of:
              0.17821282 = queryWeight, product of:
                3.174128 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.01614999 = queryNorm
              0.21728125 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.50471014 = weight(abstract_txt:stemming in 3686) [ClassicSimilarity], result of:
            0.50471014 = score(doc=3686,freq=7.0), product of:
              0.40952787 = queryWeight, product of:
                3.4023712 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.01614999 = queryNorm
              1.2324195 = fieldWeight in 3686, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
        0.28 = coord(7/25)
    
  4. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.21
    0.2059865 = sum of:
      0.2059865 = product of:
        1.0299325 = sum of:
          0.0120207295 = weight(abstract_txt:also in 288) [ClassicSimilarity], result of:
            0.0120207295 = score(doc=288,freq=1.0), product of:
              0.056651525 = queryWeight, product of:
                1.0332375 = boost
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.01614999 = queryNorm
              0.21218722 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3949955 = idf(docFreq=4049, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.0108694285 = weight(abstract_txt:information in 288) [ClassicSimilarity], result of:
            0.0108694285 = score(doc=288,freq=1.0), product of:
              0.07189669 = queryWeight, product of:
                1.8404278 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.01614999 = queryNorm
              0.15118122 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.054761607 = weight(abstract_txt:retrieval in 288) [ClassicSimilarity], result of:
            0.054761607 = score(doc=288,freq=2.0), product of:
              0.17821282 = queryWeight, product of:
                3.174128 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.01614999 = queryNorm
              0.3072821 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.46727076 = weight(abstract_txt:stemming in 288) [ClassicSimilarity], result of:
            0.46727076 = score(doc=288,freq=6.0), product of:
              0.40952787 = queryWeight, product of:
                3.4023712 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.01614999 = queryNorm
              1.1409987 = fieldWeight in 288, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.48500994 = weight(abstract_txt:stemmers in 288) [ClassicSimilarity], result of:
            0.48500994 = score(doc=288,freq=2.0), product of:
              0.6054969 = queryWeight, product of:
                4.137099 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.01614999 = queryNorm
              0.80101144 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
        0.2 = coord(5/25)
    
  5. Greengrass, M.: Conflation methods for searching databases of Latin text (1996) 0.21
    0.20589392 = sum of:
      0.20589392 = product of:
        1.0294696 = sum of:
          0.02823234 = weight(abstract_txt:most in 56) [ClassicSimilarity], result of:
            0.02823234 = score(doc=56,freq=1.0), product of:
              0.07638852 = queryWeight, product of:
                1.1997987 = boost
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.01614999 = queryNorm
              0.36958876 = fieldWeight in 56, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.09375 = fieldNorm(doc=56)
          0.11852037 = weight(abstract_txt:stem in 56) [ClassicSimilarity], result of:
            0.11852037 = score(doc=56,freq=1.0), product of:
              0.15777889 = queryWeight, product of:
                1.2192808 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.01614999 = queryNorm
              0.7511802 = fieldWeight in 56, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.09375 = fieldNorm(doc=56)
          0.08214241 = weight(abstract_txt:retrieval in 56) [ClassicSimilarity], result of:
            0.08214241 = score(doc=56,freq=2.0), product of:
              0.17821282 = queryWeight, product of:
                3.174128 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.01614999 = queryNorm
              0.46092314 = fieldWeight in 56, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=56)
          0.28614375 = weight(abstract_txt:stemming in 56) [ClassicSimilarity], result of:
            0.28614375 = score(doc=56,freq=1.0), product of:
              0.40952787 = queryWeight, product of:
                3.4023712 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.01614999 = queryNorm
              0.69871616 = fieldWeight in 56, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.09375 = fieldNorm(doc=56)
          0.51443076 = weight(abstract_txt:stemmers in 56) [ClassicSimilarity], result of:
            0.51443076 = score(doc=56,freq=1.0), product of:
              0.6054969 = queryWeight, product of:
                4.137099 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.01614999 = queryNorm
              0.849601 = fieldWeight in 56, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.09375 = fieldNorm(doc=56)
        0.2 = coord(5/25)
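
In the content-based trees, several abstract terms match per document, so the per-term scores are summed and then multiplied by coord, the fraction of the 25 query terms that matched. The sketch below recomputes entry 2 (doc=3585) from the boost, docFreq, freq and fieldNorm values in the listing. It assumes the same tf and idf definitions as ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the helper and constant names are hypothetical.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.01614999   # queryNorm shared by all terms of the content query
QUERY_TERMS = 25          # denominator of coord(n/25)
FIELD_NORM = 0.109375     # fieldNorm(doc=3585)


def term_score(boost, doc_freq, freq, field_norm):
    """queryWeight * fieldWeight for a single matched term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) copied from entry 2 of the content list.
matched = [
    ("also", 1.0332375, 4049, 1.0),
    ("algorithms", 1.7347616, 403, 1.0),
    ("stemming", 3.4023712, 69, 1.0),
    ("stemmers", 4.137099, 13, 3.0),
]

raw_sum = sum(term_score(b, df, f, FIELD_NORM) for _, b, df, f in matched)
coord = len(matched) / QUERY_TERMS   # 4 of 25 query terms matched
total = raw_sum * coord
print(f"sum={raw_sum:.4f} coord={coord:.2f} total={total:.4f}")
```

This makes the effect of coord visible: a document such as entry 3, which matches seven terms, can end up ranked close to one that matches only four terms but with higher per-term weights.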