Document (#14867)

Author
Kraaij, W.
Pohlmann, R.
Title
Evaluation of a Dutch stemming algorithm
Source
New review of document and text management. 1995, no.1, pp. 25-43
Year
1995
Abstract
Stemming algorithms can improve the recall of text retrieval systems. Describes the development of a Dutch version of the Porter stemming algorithm. The stemmer was evaluated using a method drawn from Paice, which is based on a list of groups of morphologically related words. Ideally, all words in a group should be stemmed to the same root. The result of applying the stemmer to these groups is used to calculate the understemming and overstemming indices. These indices, together with the diversity of stem group categories that could be generated from the CELEX database, enabled a careful analysis of the effects of each stemming rule. The test suite is well suited to qualitative comparison of different versions of stemmers.
Theme
Computational linguistics
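
Illustration
The evaluation idea in the abstract can be sketched by counting word pairs. This is a minimal sketch, assuming the usual pair-based reading of Paice's error-counting method: the understemming index is the share of same-group pairs that the stemmer fails to merge, and the overstemming index is the share of different-group pairs that it merges wrongly. The word groups and the lookup-table "stemmer" (toy_stems) are hypothetical English examples, not data from the paper or from CELEX, and the paper's exact variant of the indices may differ.

```python
from itertools import combinations

def stemming_indices(groups, stem):
    """Return (understemming index, overstemming index) for the word groups."""
    # Map every word to the index of the group it belongs to.
    group_of = {word: g for g, words in enumerate(groups) for word in words}
    missed = wanted = wrong = unwanted = 0
    for a, b in combinations(group_of, 2):
        same_stem = stem(a) == stem(b)
        if group_of[a] == group_of[b]:
            wanted += 1               # pair that should share a stem
            missed += not same_stem   # understemming error
        else:
            unwanted += 1             # pair that should stay apart
            wrong += same_stem        # overstemming error
    return missed / wanted, wrong / unwanted

# Hypothetical groups of morphologically related words.
groups = [["run", "runs", "running"], ["rune", "runes"]]

# Hypothetical stemmer output, given as a plain lookup table.
toy_stems = {"run": "run", "runs": "run", "running": "runn",
             "rune": "run", "runes": "run"}

ui, oi = stemming_indices(groups, toy_stems.get)
print("understemming index:", ui)
print("overstemming index:", oi)
```

In this toy case "running" is left apart from its own group, which raises the understemming index, and "rune" is conflated with "run", which raises the overstemming index. Rerunning the same groups against a changed rule set shows which index each rule affects.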

Similar documents (author)

  1. Pohlmann, T.: Vermittlung von Informationskompetenz an Master-Studierende und Doktoranden : Themen und Konzepte (2012) 2.12
    2.1216395 = sum of:
      2.1216395 = product of:
        4.243279 = sum of:
          4.243279 = weight(author_txt:pohlmann in 3447) [ClassicSimilarity], result of:
            4.243279 = score(doc=3447,freq=1.0), product of:
              0.6959363 = queryWeight, product of:
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.071337424 = queryNorm
              6.0972233 = fieldWeight in 3447, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.625 = fieldNorm(doc=3447)
        0.5 = coord(1/2)
    
  2. Hiemstra, D.; Kraaij, W.: A language-modeling approach to TREC (2005) 1.78
    1.779049 = sum of:
      1.779049 = product of:
        3.558098 = sum of:
          3.558098 = weight(author_txt:kraaij in 91) [ClassicSimilarity], result of:
            3.558098 = score(doc=91,freq=1.0), product of:
              0.7181035 = queryWeight, product of:
                1.0158013 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.071337424 = queryNorm
              4.954854 = fieldWeight in 91, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.5 = fieldNorm(doc=91)
        0.5 = coord(1/2)
    
  3. Friedrich, H.; Pohlmann, J.M.: Aufbau und Betrieb von Navigationssystemen als zentrale Aufgabe moderner Fachinformationssysteme (1996) 1.70
    1.6973116 = sum of:
      1.6973116 = product of:
        3.3946233 = sum of:
          3.3946233 = weight(author_txt:pohlmann in 5333) [ClassicSimilarity], result of:
            3.3946233 = score(doc=5333,freq=1.0), product of:
              0.6959363 = queryWeight, product of:
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.071337424 = queryNorm
              4.8777785 = fieldWeight in 5333, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.5 = fieldNorm(doc=5333)
        0.5 = coord(1/2)
    
  4. Pohlmann, J.M.; König, E.: Landwirtschaft vernetzt : das deutsche Agrarinformationsnetz DAINet (1996) 1.70
    1.6973116 = sum of:
      1.6973116 = product of:
        3.3946233 = sum of:
          3.3946233 = weight(author_txt:pohlmann in 6001) [ClassicSimilarity], result of:
            3.3946233 = score(doc=6001,freq=1.0), product of:
              0.6959363 = queryWeight, product of:
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.071337424 = queryNorm
              4.8777785 = fieldWeight in 6001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.5 = fieldNorm(doc=6001)
        0.5 = coord(1/2)
    
  5. Diebel, C.; Pohlmann, C.: Retrokonversion des alten alphabetischen Kataloges der Deutschen Bücherei Leipzig (2001) 1.70
    1.6973116 = sum of:
      1.6973116 = product of:
        3.3946233 = sum of:
          3.3946233 = weight(author_txt:pohlmann in 6673) [ClassicSimilarity], result of:
            3.3946233 = score(doc=6673,freq=1.0), product of:
              0.6959363 = queryWeight, product of:
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.071337424 = queryNorm
              4.8777785 = fieldWeight in 6673, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.5 = fieldNorm(doc=6673)
        0.5 = coord(1/2)
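
Note on the author scores
The explain trees above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)). The function names are illustrative, and queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency as used by ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # "queryWeight" line
    field_weight = math.sqrt(freq) * idf * field_norm  # "fieldWeight" line
    return query_weight * field_weight

# Values copied from result 1 (author_txt:pohlmann in doc 3447).
raw = term_score(freq=1.0, doc_freq=6, max_docs=44421,
                 query_norm=0.071337424, field_norm=0.625)
score = raw * 0.5  # coord(1/2): one of two author terms matched
print(score)
```

The result agrees with the listed 2.1216395 to within rounding. Results 3 to 5 match the same author term but have a fieldNorm of 0.5 instead of 0.625, which is why they score lower.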
    

Similar documents (content)

  1. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.35
    0.35418487 = sum of:
      0.35418487 = product of:
        1.264946 = sum of:
          0.042127535 = weight(abstract_txt:rule in 3686) [ClassicSimilarity], result of:
            0.042127535 = score(doc=3686,freq=1.0), product of:
              0.10181784 = queryWeight, product of:
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.015380192 = queryNorm
              0.41375396 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.07469614 = weight(abstract_txt:stem in 3686) [ClassicSimilarity], result of:
            0.07469614 = score(doc=3686,freq=1.0), product of:
              0.14915757 = queryWeight, product of:
                1.2103492 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.015380192 = queryNorm
              0.5007868 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.02644244 = weight(abstract_txt:method in 3686) [ClassicSimilarity], result of:
            0.02644244 = score(doc=3686,freq=1.0), product of:
              0.09404251 = queryWeight, product of:
                1.359143 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.015380192 = queryNorm
              0.2811754 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.0772771 = weight(abstract_txt:words in 3686) [ClassicSimilarity], result of:
            0.0772771 = score(doc=3686,freq=3.0), product of:
              0.13328563 = queryWeight, product of:
                1.6180604 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.015380192 = queryNorm
              0.5797857 = fieldWeight in 3686, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.18086578 = weight(abstract_txt:algorithm in 3686) [ClassicSimilarity], result of:
            0.18086578 = score(doc=3686,freq=5.0), product of:
              0.22684778 = queryWeight, product of:
                2.5853298 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.015380192 = queryNorm
              0.79730016 = fieldWeight in 3686, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.22736122 = weight(abstract_txt:stemmer in 3686) [ClassicSimilarity], result of:
            0.22736122 = score(doc=3686,freq=1.0), product of:
              0.3947003 = queryWeight, product of:
                2.784433 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015380192 = queryNorm
              0.5760351 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.6361758 = weight(abstract_txt:stemming in 3686) [ClassicSimilarity], result of:
            0.6361758 = score(doc=3686,freq=7.0), product of:
              0.5162007 = queryWeight, product of:
                4.5032635 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015380192 = queryNorm
              1.2324195 = fieldWeight in 3686, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
        0.28 = coord(7/25)
    
  2. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.32
    0.32432026 = sum of:
      0.32432026 = product of:
        1.0135008 = sum of:
          0.14614744 = weight(abstract_txt:stem in 5395) [ClassicSimilarity], result of:
            0.14614744 = score(doc=5395,freq=5.0), product of:
              0.14915757 = queryWeight, product of:
                1.2103492 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.015380192 = queryNorm
              0.9798191 = fieldWeight in 5395, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.0177413 = weight(abstract_txt:each in 5395) [ClassicSimilarity], result of:
            0.0177413 = score(doc=5395,freq=1.0), product of:
              0.07878462 = queryWeight, product of:
                1.2440097 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.015380192 = queryNorm
              0.22518735 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.022842553 = weight(abstract_txt:evaluation in 5395) [ClassicSimilarity], result of:
            0.022842553 = score(doc=5395,freq=1.0), product of:
              0.09324257 = queryWeight, product of:
                1.3533502 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.015380192 = queryNorm
              0.24497987 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.040074695 = weight(abstract_txt:method in 5395) [ClassicSimilarity], result of:
            0.040074695 = score(doc=5395,freq=3.0), product of:
              0.09404251 = queryWeight, product of:
                1.359143 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.015380192 = queryNorm
              0.42613384 = fieldWeight in 5395, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.10231444 = weight(abstract_txt:morphologically in 5395) [ClassicSimilarity], result of:
            0.10231444 = score(doc=5395,freq=1.0), product of:
              0.201094 = queryWeight, product of:
                1.4053601 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.015380192 = queryNorm
              0.5087891 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.10549124 = weight(abstract_txt:porter in 5395) [ClassicSimilarity], result of:
            0.10549124 = score(doc=5395,freq=1.0), product of:
              0.20523532 = queryWeight, product of:
                1.4197572 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015380192 = queryNorm
              0.5140014 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.28134513 = weight(abstract_txt:stemmer in 5395) [ClassicSimilarity], result of:
            0.28134513 = score(doc=5395,freq=2.0), product of:
              0.3947003 = queryWeight, product of:
                2.784433 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015380192 = queryNorm
              0.712807 = fieldWeight in 5395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.297544 = weight(abstract_txt:stemming in 5395) [ClassicSimilarity], result of:
            0.297544 = score(doc=5395,freq=2.0), product of:
              0.5162007 = queryWeight, product of:
                4.5032635 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015380192 = queryNorm
              0.5764114 = fieldWeight in 5395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
        0.32 = coord(8/25)
    
  3. Frakes, W.B.: Stemming algorithms (1992) 0.30
    0.3026423 = sum of:
      0.3026423 = product of:
        1.8915143 = sum of:
          0.23386158 = weight(abstract_txt:morphologically in 4503) [ClassicSimilarity], result of:
            0.23386158 = score(doc=4503,freq=1.0), product of:
              0.201094 = queryWeight, product of:
                1.4053601 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.015380192 = queryNorm
              1.1629466 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.24112284 = weight(abstract_txt:porter in 4503) [ClassicSimilarity], result of:
            0.24112284 = score(doc=4503,freq=1.0), product of:
              0.20523532 = queryWeight, product of:
                1.4197572 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015380192 = queryNorm
              1.1748604 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.45472243 = weight(abstract_txt:stemmer in 4503) [ClassicSimilarity], result of:
            0.45472243 = score(doc=4503,freq=1.0), product of:
              0.3947003 = queryWeight, product of:
                2.784433 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015380192 = queryNorm
              1.1520702 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
          0.9618074 = weight(abstract_txt:stemming in 4503) [ClassicSimilarity], result of:
            0.9618074 = score(doc=4503,freq=4.0), product of:
              0.5162007 = queryWeight, product of:
                4.5032635 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015380192 = queryNorm
              1.8632431 = fieldWeight in 4503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.125 = fieldNorm(doc=4503)
        0.16 = coord(4/25)
    
  4. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.28
    0.284737 = sum of:
      0.284737 = product of:
        1.7796062 = sum of:
          0.327574 = weight(abstract_txt:stemmers in 3585) [ClassicSimilarity], result of:
            0.327574 = score(doc=3585,freq=3.0), product of:
              0.19080381 = queryWeight, product of:
                1.368931 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.015380192 = queryNorm
              1.7168107 = fieldWeight in 3585, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.14154986 = weight(abstract_txt:algorithm in 3585) [ClassicSimilarity], result of:
            0.14154986 = score(doc=3585,freq=1.0), product of:
              0.22684778 = queryWeight, product of:
                2.5853298 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.015380192 = queryNorm
              0.62398607 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.88969153 = weight(abstract_txt:stemmer in 3585) [ClassicSimilarity], result of:
            0.88969153 = score(doc=3585,freq=5.0), product of:
              0.3947003 = queryWeight, product of:
                2.784433 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015380192 = queryNorm
              2.254094 = fieldWeight in 3585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.42079076 = weight(abstract_txt:stemming in 3585) [ClassicSimilarity], result of:
            0.42079076 = score(doc=3585,freq=1.0), product of:
              0.5162007 = queryWeight, product of:
                4.5032635 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015380192 = queryNorm
              0.81516886 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
        0.16 = coord(4/25)
    
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.28
    0.28139406 = sum of:
      0.28139406 = product of:
        1.1724753 = sum of:
          0.09337018 = weight(abstract_txt:stem in 3950) [ClassicSimilarity], result of:
            0.09337018 = score(doc=3950,freq=1.0), product of:
              0.14915757 = queryWeight, product of:
                1.2103492 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.015380192 = queryNorm
              0.6259835 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.032632217 = weight(abstract_txt:evaluation in 3950) [ClassicSimilarity], result of:
            0.032632217 = score(doc=3950,freq=1.0), product of:
              0.09324257 = queryWeight, product of:
                1.3533502 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.015380192 = queryNorm
              0.34997123 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.19104505 = weight(abstract_txt:stemmers in 3950) [ClassicSimilarity], result of:
            0.19104505 = score(doc=3950,freq=2.0), product of:
              0.19080381 = queryWeight, product of:
                1.368931 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.015380192 = queryNorm
              1.0012643 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.1461635 = weight(abstract_txt:morphologically in 3950) [ClassicSimilarity], result of:
            0.1461635 = score(doc=3950,freq=1.0), product of:
              0.201094 = queryWeight, product of:
                1.4053601 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.015380192 = queryNorm
              0.7268416 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.2842015 = weight(abstract_txt:stemmer in 3950) [ClassicSimilarity], result of:
            0.2842015 = score(doc=3950,freq=1.0), product of:
              0.3947003 = queryWeight, product of:
                2.784433 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015380192 = queryNorm
              0.72004384 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.42506284 = weight(abstract_txt:stemming in 3950) [ClassicSimilarity], result of:
            0.42506284 = score(doc=3950,freq=2.0), product of:
              0.5162007 = queryWeight, product of:
                4.5032635 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015380192 = queryNorm
              0.82344484 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
        0.24 = coord(6/25)
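
Note on the content scores
For the content-based results, the final score is the sum of the per-term weights multiplied by the coord factor. The following is a minimal sketch, again assuming the standard ClassicSimilarity formulas, that recomputes result 1 (doc 3686) from the freq, docFreq and boost values shown in its tree. The names TERMS and term_score are illustrative.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.015380192
FIELD_NORM = 0.0625  # fieldNorm(doc=3686)

# (term, freq in doc 3686, docFreq, boost), copied from the explain tree.
# "rule" has no boost line in the tree, so a boost of 1.0 is assumed.
TERMS = [
    ("rule", 1, 160, 1.0),
    ("stem", 1, 39, 1.2103492),
    ("method", 1, 1342, 1.359143),
    ("words", 3, 569, 1.6180604),
    ("algorithm", 5, 401, 2.5853298),
    ("stemmer", 1, 11, 2.784433),
    ("stemming", 7, 69, 4.5032635),
]

def term_score(freq, doc_freq, boost):
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

total = sum(term_score(f, df, b) for _, f, df, b in TERMS)
coord = len(TERMS) / 25  # 7 of the 25 query terms matched
final = total * coord
print(total, final)
```

The printed values should agree with the listed 1.264946 and 0.35418487 to within rounding. The coord factor explains why a document that matches more query terms can outrank one with a larger raw sum, as with results 2 and 3.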