Document (#33002)

Bacchin, M.
Ferro, N.
Melucci, M.
A probabilistic model for stemmer generation
Information processing and management. 41(2005) no.1, pp.121-137
In this paper we present a language-independent probabilistic model that can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems; however, designing and implementing stemmers requires considerable effort because documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation of it. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as those based on linguistic knowledge.

Similar documents (author)

  1. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 1.96
    1.9551578 = sum of:
      1.9551578 = product of:
        3.9103155 = sum of:
          3.9103155 = weight(author_txt:melucci in 1150) [ClassicSimilarity], result of:
            3.9103155 = score(doc=1150,freq=1.0), product of:
              0.6728154 = queryWeight, product of:
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07235358 = queryNorm
              5.81187 = fieldWeight in 1150, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.625 = fieldNorm(doc=1150)
        0.5 = coord(1/2)
  2. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 1.96
    1.9551578 = sum of:
      1.9551578 = product of:
        3.9103155 = sum of:
          3.9103155 = weight(author_txt:melucci in 2226) [ClassicSimilarity], result of:
            3.9103155 = score(doc=2226,freq=1.0), product of:
              0.6728154 = queryWeight, product of:
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07235358 = queryNorm
              5.81187 = fieldWeight in 2226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.625 = fieldNorm(doc=2226)
        0.5 = coord(1/2)
  3. Melucci, M.: Contextual search : a computational framework (2012) 1.96
    1.9551578 = sum of:
      1.9551578 = product of:
        3.9103155 = sum of:
          3.9103155 = weight(author_txt:melucci in 4913) [ClassicSimilarity], result of:
            3.9103155 = score(doc=4913,freq=1.0), product of:
              0.6728154 = queryWeight, product of:
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07235358 = queryNorm
              5.81187 = fieldWeight in 4913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.625 = fieldNorm(doc=4913)
        0.5 = coord(1/2)
  4. Ferro, N.; Silvello, G.: NESTOR: a formal model for digital archives (2013) 1.80
    1.8034687 = sum of:
      1.8034687 = product of:
        3.6069374 = sum of:
          3.6069374 = weight(author_txt:ferro in 2707) [ClassicSimilarity], result of:
            3.6069374 = score(doc=2707,freq=1.0), product of:
              0.7398104 = queryWeight, product of:
                1.0486058 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.07235358 = queryNorm
              4.8754888 = fieldWeight in 2707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.5 = fieldNorm(doc=2707)
        0.5 = coord(1/2)
  5. Ferro, N.; Silvello, G.: Toward an anatomy of IR system component performances (2018) 1.80
    1.8034687 = sum of:
      1.8034687 = product of:
        3.6069374 = sum of:
          3.6069374 = weight(author_txt:ferro in 4035) [ClassicSimilarity], result of:
            3.6069374 = score(doc=4035,freq=1.0), product of:
              0.7398104 = queryWeight, product of:
                1.0486058 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.07235358 = queryNorm
              4.8754888 = fieldWeight in 4035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.5 = fieldNorm(doc=4035)
        0.5 = coord(1/2)
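
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas that the explain output itself suggests: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = boost * idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names are hypothetical, and queryNorm, fieldNorm, boost, and coord are copied from the listing rather than recomputed.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float,
               boost: float = 1.0) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # query-side weight
    field_weight = math.sqrt(freq) * idf * field_norm  # document-side weight
    return query_weight * field_weight


# Values copied from the "melucci" entries (doc 1150, 2226, 4913).
melucci = term_score(freq=1.0, doc_freq=10, max_docs=44218,
                     query_norm=0.07235358, field_norm=0.625)

# Values copied from the "ferro" entries (doc 2707, 4035), which carry a boost.
ferro = term_score(freq=1.0, doc_freq=6, max_docs=44218,
                   query_norm=0.07235358, field_norm=0.5,
                   boost=1.0486058)

# coord(1/2): only one of the two author query clauses matched.
coord = 1 / 2
melucci_final = melucci * coord
ferro_final = ferro * coord

print(f"melucci: term={melucci:.4f} final={melucci_final:.4f}")
print(f"ferro:   term={ferro:.4f} final={ferro_final:.4f}")
```

The results should agree with the listed values (about 1.96 and 1.80) up to small rounding differences, since Lucene computes these quantities in single precision.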

Similar documents (content)

  1. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.33
    0.328059 = sum of:
      0.328059 = product of:
        1.1716392 = sum of:
          0.033487804 = weight(abstract_txt:requires in 268) [ClassicSimilarity], result of:
            0.033487804 = score(doc=268,freq=1.0), product of:
              0.061935976 = queryWeight, product of:
                5.767298 = idf(docFreq=375, maxDocs=44218)
                0.010739166 = queryNorm
              0.5406842 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.767298 = idf(docFreq=375, maxDocs=44218)
                0.09375 = fieldNorm(doc=268)
          0.033721533 = weight(abstract_txt:written in 268) [ClassicSimilarity], result of:
            0.033721533 = score(doc=268,freq=1.0), product of:
              0.062223833 = queryWeight, product of:
                1.0023211 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.010739166 = queryNorm
              0.5419392 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.09375 = fieldNorm(doc=268)
          0.048788857 = weight(abstract_txt:linguistic in 268) [ClassicSimilarity], result of:
            0.048788857 = score(doc=268,freq=2.0), product of:
              0.06317651 = queryWeight, product of:
                1.0099651 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.010739166 = queryNorm
              0.7722626 = fieldWeight in 268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.09375 = fieldNorm(doc=268)
          0.035166074 = weight(abstract_txt:amount in 268) [ClassicSimilarity], result of:
            0.035166074 = score(doc=268,freq=1.0), product of:
              0.06398838 = queryWeight, product of:
                1.0164337 = boost
                5.8620763 = idf(docFreq=341, maxDocs=44218)
                0.010739166 = queryNorm
              0.54956967 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8620763 = idf(docFreq=341, maxDocs=44218)
                0.09375 = fieldNorm(doc=268)
          0.034190714 = weight(abstract_txt:proposed in 268) [ClassicSimilarity], result of:
            0.034190714 = score(doc=268,freq=1.0), product of:
              0.07912262 = queryWeight, product of:
                1.5984308 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.010739166 = queryNorm
              0.43212312 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.09375 = fieldNorm(doc=268)
          0.06895154 = weight(abstract_txt:languages in 268) [ClassicSimilarity], result of:
            0.06895154 = score(doc=268,freq=2.0), product of:
              0.100241564 = queryWeight, product of:
                1.7991502 = boost
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.010739166 = queryNorm
              0.68785375 = fieldWeight in 268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.09375 = fieldNorm(doc=268)
          0.9173327 = weight(abstract_txt:stemmers in 268) [ClassicSimilarity], result of:
            0.9173327 = score(doc=268,freq=2.0), product of:
              0.76386476 = queryWeight, product of:
                7.85275 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.010739166 = queryNorm
              1.2009099 = fieldWeight in 268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.09375 = fieldNorm(doc=268)
        0.28 = coord(7/25)
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.29
    0.28555468 = sum of:
      0.28555468 = product of:
        1.0198381 = sum of:
          0.022866817 = weight(abstract_txt:independent in 3301) [ClassicSimilarity], result of:
            0.022866817 = score(doc=3301,freq=1.0), product of:
              0.06293369 = queryWeight, product of:
                1.0080222 = boost
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.010739166 = queryNorm
              0.3633478 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.00970311 = weight(abstract_txt:paper in 3301) [ClassicSimilarity], result of:
            0.00970311 = score(doc=3301,freq=1.0), product of:
              0.044774424 = queryWeight, product of:
                1.2024264 = boost
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.010739166 = queryNorm
              0.216711 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.013814649 = weight(abstract_txt:retrieval in 3301) [ClassicSimilarity], result of:
            0.013814649 = score(doc=3301,freq=2.0), product of:
              0.04497515 = queryWeight, product of:
                1.2051187 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.010739166 = queryNorm
              0.3071618 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.12866257 = weight(abstract_txt:stemmer in 3301) [ClassicSimilarity], result of:
            0.12866257 = score(doc=3301,freq=2.0), product of:
              0.15801714 = queryWeight, product of:
                1.5972784 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010739166 = queryNorm
              0.81423175 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.052125655 = weight(abstract_txt:model in 3301) [ClassicSimilarity], result of:
            0.052125655 = score(doc=3301,freq=2.0), product of:
              0.14794277 = queryWeight, product of:
                3.4558938 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.010739166 = queryNorm
              0.35233662 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.18111011 = weight(abstract_txt:probabilistic in 3301) [ClassicSimilarity], result of:
            0.18111011 = score(doc=3301,freq=1.0), product of:
              0.4275936 = queryWeight, product of:
                5.875287 = boost
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.010739166 = queryNorm
              0.42355666 = fieldWeight in 3301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
          0.61155516 = weight(abstract_txt:stemmers in 3301) [ClassicSimilarity], result of:
            0.61155516 = score(doc=3301,freq=2.0), product of:
              0.76386476 = queryWeight, product of:
                7.85275 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.010739166 = queryNorm
              0.8006066 = fieldWeight in 3301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0625 = fieldNorm(doc=3301)
        0.28 = coord(7/25)
  3. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.28
    0.2783 = sum of:
      0.2783 = product of:
        1.7393749 = sum of:
          0.055637695 = weight(abstract_txt:written in 2585) [ClassicSimilarity], result of:
            0.055637695 = score(doc=2585,freq=2.0), product of:
              0.062223833 = queryWeight, product of:
                1.0023211 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.010739166 = queryNorm
              0.8941541 = fieldWeight in 2585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.016980443 = weight(abstract_txt:paper in 2585) [ClassicSimilarity], result of:
            0.016980443 = score(doc=2585,freq=1.0), product of:
              0.044774424 = queryWeight, product of:
                1.2024264 = boost
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.010739166 = queryNorm
              0.37924424 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.35600844 = weight(abstract_txt:stemmer in 2585) [ClassicSimilarity], result of:
            0.35600844 = score(doc=2585,freq=5.0), product of:
              0.15801714 = queryWeight, product of:
                1.5972784 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010739166 = queryNorm
              2.2529736 = fieldWeight in 2585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          1.3107483 = weight(abstract_txt:stemmers in 2585) [ClassicSimilarity], result of:
            1.3107483 = score(doc=2585,freq=3.0), product of:
              0.76386476 = queryWeight, product of:
                7.85275 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.010739166 = queryNorm
              1.715943 = fieldWeight in 2585, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
        0.16 = coord(4/25)
  4. Xu, J.; Weischedel, R.: Empirical studies on the impact of lexical resources on CLIR performance (2005) 0.21
    0.20753026 = sum of:
      0.20753026 = product of:
        0.86470944 = sum of:
          0.012128888 = weight(abstract_txt:paper in 1020) [ClassicSimilarity], result of:
            0.012128888 = score(doc=1020,freq=1.0), product of:
              0.044774424 = queryWeight, product of:
                1.2024264 = boost
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.010739166 = queryNorm
              0.27088875 = fieldWeight in 1020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.078125 = fieldNorm(doc=1020)
          0.01221054 = weight(abstract_txt:retrieval in 1020) [ClassicSimilarity], result of:
            0.01221054 = score(doc=1020,freq=1.0), product of:
              0.04497515 = queryWeight, product of:
                1.2051187 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.010739166 = queryNorm
              0.27149525 = fieldWeight in 1020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=1020)
          0.02736589 = weight(abstract_txt:several in 1020) [ClassicSimilarity], result of:
            0.02736589 = score(doc=1020,freq=1.0), product of:
              0.07702335 = queryWeight, product of:
                1.5770836 = boost
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.010739166 = queryNorm
              0.35529342 = fieldWeight in 1020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.078125 = fieldNorm(doc=1020)
          0.046073005 = weight(abstract_txt:model in 1020) [ClassicSimilarity], result of:
            0.046073005 = score(doc=1020,freq=1.0), product of:
              0.14794277 = queryWeight, product of:
                3.4558938 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.010739166 = queryNorm
              0.31142452 = fieldWeight in 1020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.078125 = fieldNorm(doc=1020)
          0.22638763 = weight(abstract_txt:probabilistic in 1020) [ClassicSimilarity], result of:
            0.22638763 = score(doc=1020,freq=1.0), product of:
              0.4275936 = queryWeight, product of:
                5.875287 = boost
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.010739166 = queryNorm
              0.5294458 = fieldWeight in 1020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.078125 = fieldNorm(doc=1020)
          0.5405435 = weight(abstract_txt:stemmers in 1020) [ClassicSimilarity], result of:
            0.5405435 = score(doc=1020,freq=1.0), product of:
              0.76386476 = queryWeight, product of:
                7.85275 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.010739166 = queryNorm
              0.707643 = fieldWeight in 1020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=1020)
        0.24 = coord(6/25)
  5. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.17
    0.16657539 = sum of:
      0.16657539 = product of:
        1.0410962 = sum of:
          0.03691364 = weight(abstract_txt:ones in 3187) [ClassicSimilarity], result of:
            0.03691364 = score(doc=3187,freq=1.0), product of:
              0.07463295 = queryWeight, product of:
                1.0977256 = boost
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.010739166 = queryNorm
              0.49460244 = fieldWeight in 3187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.078125 = fieldNorm(doc=3187)
          0.027303599 = weight(abstract_txt:retrieval in 3187) [ClassicSimilarity], result of:
            0.027303599 = score(doc=3187,freq=5.0), product of:
              0.04497515 = queryWeight, product of:
                1.2051187 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.010739166 = queryNorm
              0.6070819 = fieldWeight in 3187, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=3187)
          0.040630084 = weight(abstract_txt:languages in 3187) [ClassicSimilarity], result of:
            0.040630084 = score(doc=3187,freq=1.0), product of:
              0.100241564 = queryWeight, product of:
                1.7991502 = boost
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.010739166 = queryNorm
              0.40532172 = fieldWeight in 3187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.078125 = fieldNorm(doc=3187)
          0.93624884 = weight(abstract_txt:stemmers in 3187) [ClassicSimilarity], result of:
            0.93624884 = score(doc=3187,freq=3.0), product of:
              0.76386476 = queryWeight, product of:
                7.85275 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.010739166 = queryNorm
              1.2256736 = fieldWeight in 3187, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=3187)
        0.16 = coord(4/25)
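
For the content-based matches, the document score is the sum of the per-term scores multiplied by the coordination factor (matched query terms over total query terms). The sketch below recomputes entry 3 (Fox and Fox, doc 2585) under the same assumed TF-IDF formulas as in the explain trees; the helper names are hypothetical, and the boosts, document frequencies, queryNorm, and fieldNorm are copied from the listing.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.010739166
FIELD_NORM = 0.109375  # fieldNorm(doc=2585), copied from the listing


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (boost, docFreq, termFreq) for the four query terms matched in doc 2585.
matched_terms = {
    "written":  (1.0023211, 370, 2.0),
    "paper":    (1.2024264, 3749, 1.0),
    "stemmer":  (1.5972784, 11, 5.0),
    "stemmers": (7.85275, 13, 3.0),
}

per_term = {term: term_score(*params) for term, params in matched_terms.items()}
raw_sum = sum(per_term.values())

# coord(4/25): 4 of the 25 query terms occur in this abstract.
coord = len(matched_terms) / 25
total = raw_sum * coord

for term, score in per_term.items():
    print(f"{term:10s} {score:.6f}")
print(f"sum={raw_sum:.4f} coord={coord:.2f} total={total:.4f}")
```

The rare, heavily boosted term "stemmers" contributes most of the sum, which is why all five content matches rank on it despite matching only a few of the 25 query terms.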