Document (#33002)

Author
Bacchin, M.
Ferro, N.
Melucci, M.
Title
A probabilistic model for stemmer generation
Source
Information processing and management. 41(2005) no.1, pp. 121-137
Year
2005
Abstract
In this paper we present a language-independent probabilistic model that can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems; however, designing and implementing stemmers requires considerable effort, because documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at developing stemmers for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation of it. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as those based on linguistic knowledge.
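The abstract does not give the algorithm itself. As a rough illustration only, the sketch below assumes one reading of "mutual reinforcement between stems and derivations": every word is split into a candidate stem (prefix) and derivation (non-empty suffix), the two sets are scored with a HITS-style iteration (a stem is good if it combines with good derivations and vice versa), and the probabilistic step is approximated by normalizing the scores into distributions and choosing the split that maximizes P(stem)·P(derivation) under an independence assumption. This is not presented as the authors' method; the function names, the `min_stem` and `tolerance` parameters, and the toy vocabulary are hypothetical.

```python
import math
from collections import defaultdict


def build_split_graph(vocabulary, min_stem=3):
    """Link each candidate stem (prefix) to each candidate derivation
    (non-empty suffix) whenever stem + derivation is a vocabulary word."""
    stem_to_derivs = defaultdict(set)
    deriv_to_stems = defaultdict(set)
    for word in vocabulary:
        for i in range(min_stem, len(word)):
            stem, deriv = word[:i], word[i:]
            stem_to_derivs[stem].add(deriv)
            deriv_to_stems[deriv].add(stem)
    return stem_to_derivs, deriv_to_stems


def _l2_normalize(scores):
    """Scale a score dictionary to unit Euclidean length, as in HITS."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def mutual_reinforcement(stem_to_derivs, deriv_to_stems, iterations=200):
    """HITS-style power iteration: a stem scores high if it combines with
    high-scoring derivations, and a derivation scores high if it follows
    high-scoring stems."""
    stem_score = {s: 1.0 for s in stem_to_derivs}
    deriv_score = {d: 1.0 for d in deriv_to_stems}
    for _ in range(iterations):
        deriv_score = _l2_normalize(
            {d: sum(stem_score[s] for s in stems) for d, stems in deriv_to_stems.items()}
        )
        stem_score = _l2_normalize(
            {s: sum(deriv_score[d] for d in derivs) for s, derivs in stem_to_derivs.items()}
        )
    return stem_score, deriv_score


def to_probabilities(scores):
    """Turn non-negative scores into a probability distribution."""
    total = sum(scores.values()) or 1.0
    return {k: v / total for k, v in scores.items()}


def split_word(word, p_stem, p_deriv, min_stem=3, tolerance=1e-9):
    """Choose the split maximizing P(stem) * P(derivation), treating the two
    as independent; leave the word whole if no split scores above `tolerance`."""
    best, best_score = (word, ""), tolerance
    for i in range(min_stem, len(word)):
        stem, deriv = word[:i], word[i:]
        score = p_stem.get(stem, 0.0) * p_deriv.get(deriv, 0.0)
        if score > best_score:
            best, best_score = (stem, deriv), score
    return best


# Hypothetical toy vocabulary, for illustration only.
vocabulary = ["connect", "connected", "connecting", "connection", "connects",
              "walk", "walked", "walking", "walks"]
stem_to_derivs, deriv_to_stems = build_split_graph(vocabulary)
stem_score, deriv_score = mutual_reinforcement(stem_to_derivs, deriv_to_stems)
p_stem, p_deriv = to_probabilities(stem_score), to_probabilities(deriv_score)
for w in vocabulary:
    print(w, "->", split_word(w, p_stem, p_deriv))
```

On this toy list the sketch splits "walking" into ("walk", "ing") and "connection" into ("connect", "ion"), and leaves "walk" and "connect" whole. Because the power iteration concentrates on the most strongly connected group of stems and derivations, rare splits end up with effectively zero score; the `tolerance` check is a simple assumption of this sketch for leaving such words unsplit, not part of the published model.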
Theme
Computational linguistics

Similar documents (author)

  1. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 1.96
  2. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 1.96
  3. Melucci, M.: Contextual search : a computational framework (2012) 1.96
  4. Ferro, N.; Silvello, G.: NESTOR: a formal model for digital archives (2013) 1.80
  5. Ferro, N.; Silvello, G.: Toward an anatomy of IR system component performances (2018) 1.80

Similar documents (content)

  1. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.33
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.29
  3. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.28
  4. Xu, J.; Weischedel, R.: Empirical studies on the impact of lexical resources on CLIR performance (2005) 0.21
  5. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.17