Document (#32269)

Author
Melucci, M.
Orio, N.
Title
Design, implementation, and evaluation of a methodology for automatic stemmer generation
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.5, pp.673-686
Year
2007
Abstract
The authors describe a statistical approach based on hidden Markov models (HMMs) for generating stemmers automatically. The proposed approach makes it easy to add new languages to the system, even when only minimal linguistic knowledge is available. This is a key advantage, especially for digital libraries, which are often developed for a specific institution or government, because the program can manage large numbers of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules.
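The abstract does not spell out how an HMM yields a stemmer. One common formulation, which is an assumption here and not a description of the paper's exact model, is a left-to-right HMM whose states are split into a stem part and a suffix part. The most probable (Viterbi) state path over a word's letters then marks where the stem ends. This sketch uses a single stem state and a single suffix state with hand-picked, hypothetical probabilities. In a real system these parameters would be estimated from a word list (for example with Baum-Welch) rather than set by hand.

```python
import math
import string

# Hypothetical two-state, left-to-right HMM: a word is emitted first by a
# "stem" state and then (optionally) by a "suffix" state. All numbers below
# are illustrative assumptions, not parameters taken from the paper.
STATES = ("stem", "suffix")
START = {"stem": 1.0, "suffix": 0.0}          # every word begins with a stem
TRANS = {
    "stem": {"stem": 0.8, "suffix": 0.2},
    "suffix": {"stem": 0.0, "suffix": 1.0},   # no return from suffix to stem
}


def _suffix_emissions():
    # Letters common in English inflectional suffixes get most of the mass;
    # the remaining 0.05 is spread evenly over the other letters.
    common = {"s": 0.20, "i": 0.15, "n": 0.15, "e": 0.15,
              "g": 0.10, "d": 0.10, "l": 0.05, "y": 0.05}
    rest = [c for c in string.ascii_lowercase if c not in common]
    probs = {c: 0.05 / len(rest) for c in rest}
    probs.update(common)
    return probs


EMIT = {
    "stem": {c: 1 / 26 for c in string.ascii_lowercase},  # uniform over letters
    "suffix": _suffix_emissions(),
}


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(word):
    """Return the most probable state sequence for a non-empty lowercase word."""
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {s: (_log(START[s]) + _log(EMIT[s][word[0]]), [s]) for s in STATES}
    for ch in word[1:]:
        new_best = {}
        for s in STATES:
            # Best predecessor state p for a transition p -> s.
            score, path = max(
                (best[p][0] + _log(TRANS[p][s]), best[p][1]) for p in STATES
            )
            new_best[s] = (score + _log(EMIT[s][ch]), path + [s])
        best = new_best
    return max(best.values())[1]


def stem(word):
    """Return the prefix of a non-empty lowercase ASCII word emitted by the stem state."""
    path = viterbi(word)
    # The model is left-to-right, so all "stem" states precede all "suffix" states.
    return word[: path.count("stem")]


for w in ("walking", "walked", "cats", "dog"):
    print(w, "->", stem(w))
```

With these toy parameters, `walking` and `walked` both reduce to `walk`, and `cats` reduces to `cat`. A word such as `dog` is left intact because its letters give no strong evidence for a suffix. The sketch only handles lowercase ASCII letters.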
Theme
Computational linguistics

Similar documents (author)

  1. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 2150) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 2150, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=2150)
    
  2. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 3226) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 3226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=3226)
    
  3. Melucci, M.: Contextual search : a computational framework (2012) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 913) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 913, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=913)
    
  4. Agosti, M.; Melucci, M.: Information retrieval techniques for the automatic construction of hypertext (2000) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:melucci in 5671) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 5671, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=5671)
    
  5. Melucci, M.; Orio, N.: Combining melody processing and information retrieval techniques : methodology, evaluation, and system implementation (2004) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:melucci in 4087) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 4087, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=4087)
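The author-based scores above come from Lucene's ClassicSimilarity, a tf-idf model. Each explanation multiplies tf = sqrt(freq), an idf value, and the stored fieldNorm. The idf values shown are consistent with idf = 1 + ln(maxDocs / (docFreq + 1)). The following sketch recomputes those numbers. The function names are illustrative, and other Lucene versions compute idf slightly differently (for example from docCount instead of maxDocs).

```python
import math


def classic_idf(doc_freq, max_docs):
    # idf = 1 + ln(maxDocs / (docFreq + 1)); matches the idf values in these explanations
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # tf = sqrt(freq)
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm; fieldNorm is read from the index, not recomputed here
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# "melucci" occurs in 10 of the 44,421 indexed author fields.
print(round(classic_idf(10, 44421), 4))             # 9.3036
print(round(field_weight(1, 10, 44421, 0.625), 4))  # 5.8147
print(round(field_weight(1, 10, 44421, 0.5), 4))    # 4.6518
```

Only fieldNorm separates the 5.81 entries from the 4.65 entries. It encodes field length, so the co-authored records (a longer author field) presumably receive the smaller norm of 0.5.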
    

Similar documents (content)

  1. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.49
    0.49037108 = sum of:
      0.49037108 = product of:
        2.043213 = sum of:
          0.0685018 = weight(abstract_txt:generation in 3585) [ClassicSimilarity], result of:
            0.0685018 = score(doc=3585,freq=1.0), product of:
              0.11159212 = queryWeight, product of:
                1.0118655 = boost
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.0196499 = queryNorm
              0.61385876 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.612423 = idf(docFreq=440, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.07148178 = weight(abstract_txt:program in 3585) [ClassicSimilarity], result of:
            0.07148178 = score(doc=3585,freq=1.0), product of:
              0.11480544 = queryWeight, product of:
                1.0263306 = boost
                5.6926546 = idf(docFreq=406, maxDocs=44421)
                0.0196499 = queryNorm
              0.6226341 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6926546 = idf(docFreq=406, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.105809435 = weight(abstract_txt:written in 3585) [ClassicSimilarity], result of:
            0.105809435 = score(doc=3585,freq=2.0), product of:
              0.11835096 = queryWeight, product of:
                1.0420581 = boost
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.0196499 = queryNorm
              0.89403105 = fieldWeight in 3585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.120071 = weight(abstract_txt:generating in 3585) [ClassicSimilarity], result of:
            0.120071 = score(doc=3585,freq=1.0), product of:
              0.16222744 = queryWeight, product of:
                1.2200234 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.0196499 = queryNorm
              0.7401399 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.67833245 = weight(abstract_txt:stemmer in 3585) [ClassicSimilarity], result of:
            0.67833245 = score(doc=3585,freq=5.0), product of:
              0.30093354 = queryWeight, product of:
                1.6616569 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0196499 = queryNorm
              2.254094 = fieldWeight in 3585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.99901646 = weight(abstract_txt:stemmers in 3585) [ClassicSimilarity], result of:
            0.99901646 = score(doc=3585,freq=3.0), product of:
              0.5819025 = queryWeight, product of:
                3.2677298 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0196499 = queryNorm
              1.7168107 = fieldWeight in 3585, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
        0.24 = coord(6/25)
    
  2. Bacchin, M.; Ferro, N.; Melucci, M.: A probabilistic model for stemmer generation (2005) 0.44
    0.43869966 = sum of:
      0.43869966 = product of:
        1.3709365 = sum of:
          0.052006066 = weight(abstract_txt:effort in 2001) [ClassicSimilarity], result of:
            0.052006066 = score(doc=2001,freq=1.0), product of:
              0.11622162 = queryWeight, product of:
                1.0326413 = boost
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.0196499 = queryNorm
              0.44747326 = fieldWeight in 2001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.727658 = idf(docFreq=392, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
          0.052639674 = weight(abstract_txt:requires in 2001) [ClassicSimilarity], result of:
            0.052639674 = score(doc=2001,freq=1.0), product of:
              0.11716369 = queryWeight, product of:
                1.036818 = boost
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.0196499 = queryNorm
              0.44928318 = fieldWeight in 2001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
          0.05344183 = weight(abstract_txt:written in 2001) [ClassicSimilarity], result of:
            0.05344183 = score(doc=2001,freq=1.0), product of:
              0.11835096 = queryWeight, product of:
                1.0420581 = boost
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.0196499 = queryNorm
              0.45155382 = fieldWeight in 2001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
          0.05563573 = weight(abstract_txt:amount in 2001) [ClassicSimilarity], result of:
            0.05563573 = score(doc=2001,freq=1.0), product of:
              0.12156823 = queryWeight, product of:
                1.0561268 = boost
                5.857923 = idf(docFreq=344, maxDocs=44421)
                0.0196499 = queryNorm
              0.45765024 = fieldWeight in 2001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.857923 = idf(docFreq=344, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
          0.017852273 = weight(abstract_txt:based in 2001) [ClassicSimilarity], result of:
            0.017852273 = score(doc=2001,freq=1.0), product of:
              0.07178878 = queryWeight, product of:
                1.1477553 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0196499 = queryNorm
              0.24867775 = fieldWeight in 2001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
          0.10942085 = weight(abstract_txt:languages in 2001) [ClassicSimilarity], result of:
            0.10942085 = score(doc=2001,freq=2.0), product of:
              0.1908319 = queryWeight, product of:
                1.8713133 = boost
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.0196499 = queryNorm
              0.5733887 = fieldWeight in 2001, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
          0.10870807 = weight(abstract_txt:linguistic in 2001) [ClassicSimilarity], result of:
            0.10870807 = score(doc=2001,freq=1.0), product of:
              0.23938784 = queryWeight, product of:
                2.0959072 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.0196499 = queryNorm
              0.45410857 = fieldWeight in 2001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
          0.921232 = weight(abstract_txt:stemmers in 2001) [ClassicSimilarity], result of:
            0.921232 = score(doc=2001,freq=5.0), product of:
              0.5819025 = queryWeight, product of:
                3.2677298 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0196499 = queryNorm
              1.583138 = fieldWeight in 2001, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=2001)
        0.32 = coord(8/25)
    
  3. Dunning, T.: Statistical identification of language (1994) 0.14
    0.1392871 = sum of:
      0.1392871 = product of:
        0.49745393 = sum of:
          0.047228653 = weight(abstract_txt:statistical in 4627) [ClassicSimilarity], result of:
            0.047228653 = score(doc=4627,freq=1.0), product of:
              0.10899033 = queryWeight, product of:
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.0196499 = queryNorm
              0.43332887 = fieldWeight in 4627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.078125 = fieldNorm(doc=4627)
          0.11417009 = weight(abstract_txt:program in 4627) [ClassicSimilarity], result of:
            0.11417009 = score(doc=4627,freq=5.0), product of:
              0.11480544 = queryWeight, product of:
                1.0263306 = boost
                5.6926546 = idf(docFreq=406, maxDocs=44421)
                0.0196499 = queryNorm
              0.9944659 = fieldWeight in 4627, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.6926546 = idf(docFreq=406, maxDocs=44421)
                0.078125 = fieldNorm(doc=4627)
          0.05344183 = weight(abstract_txt:written in 4627) [ClassicSimilarity], result of:
            0.05344183 = score(doc=4627,freq=1.0), product of:
              0.11835096 = queryWeight, product of:
                1.0420581 = boost
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.0196499 = queryNorm
              0.45155382 = fieldWeight in 4627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.078125 = fieldNorm(doc=4627)
          0.078680806 = weight(abstract_txt:amount in 4627) [ClassicSimilarity], result of:
            0.078680806 = score(doc=4627,freq=2.0), product of:
              0.12156823 = queryWeight, product of:
                1.0561268 = boost
                5.857923 = idf(docFreq=344, maxDocs=44421)
                0.0196499 = queryNorm
              0.6472152 = fieldWeight in 4627, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.857923 = idf(docFreq=344, maxDocs=44421)
                0.078125 = fieldNorm(doc=4627)
          0.017852273 = weight(abstract_txt:based in 4627) [ClassicSimilarity], result of:
            0.017852273 = score(doc=4627,freq=1.0), product of:
              0.07178878 = queryWeight, product of:
                1.1477553 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0196499 = queryNorm
              0.24867775 = fieldWeight in 4627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=4627)
          0.07737223 = weight(abstract_txt:languages in 4627) [ClassicSimilarity], result of:
            0.07737223 = score(doc=4627,freq=1.0), product of:
              0.1908319 = queryWeight, product of:
                1.8713133 = boost
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.0196499 = queryNorm
              0.40544704 = fieldWeight in 4627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.078125 = fieldNorm(doc=4627)
          0.10870807 = weight(abstract_txt:linguistic in 4627) [ClassicSimilarity], result of:
            0.10870807 = score(doc=4627,freq=1.0), product of:
              0.23938784 = queryWeight, product of:
                2.0959072 = boost
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.0196499 = queryNorm
              0.45410857 = fieldWeight in 4627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8125896 = idf(docFreq=360, maxDocs=44421)
                0.078125 = fieldNorm(doc=4627)
        0.28 = coord(7/25)
    
  4. Aldebei, K.; He, X.; Jia, W.; Yeh, W.: SUDMAD: Sequential and unsupervised decomposition of a multi-author document based on a hidden Markov model (2018) 0.14
    0.13890873 = sum of:
      0.13890873 = product of:
        0.4340898 = sum of:
          0.03306006 = weight(abstract_txt:statistical in 37) [ClassicSimilarity], result of:
            0.03306006 = score(doc=37,freq=1.0), product of:
              0.10899033 = queryWeight, product of:
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.0196499 = queryNorm
              0.3033302 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
          0.052904718 = weight(abstract_txt:written in 37) [ClassicSimilarity], result of:
            0.052904718 = score(doc=37,freq=2.0), product of:
              0.11835096 = queryWeight, product of:
                1.0420581 = boost
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.0196499 = queryNorm
              0.44701552 = fieldWeight in 37, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.779889 = idf(docFreq=372, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
          0.04014829 = weight(abstract_txt:great in 37) [ClassicSimilarity], result of:
            0.04014829 = score(doc=37,freq=1.0), product of:
              0.12405956 = queryWeight, product of:
                1.0668937 = boost
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.0196499 = queryNorm
              0.3236211 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9176426 = idf(docFreq=324, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
          0.012496591 = weight(abstract_txt:based in 37) [ClassicSimilarity], result of:
            0.012496591 = score(doc=37,freq=1.0), product of:
              0.07178878 = queryWeight, product of:
                1.1477553 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0196499 = queryNorm
              0.17407443 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
          0.097093545 = weight(abstract_txt:hidden in 37) [ClassicSimilarity], result of:
            0.097093545 = score(doc=37,freq=2.0), product of:
              0.17740634 = queryWeight, product of:
                1.2758235 = boost
                7.0764947 = idf(docFreq=101, maxDocs=44421)
                0.0196499 = queryNorm
              0.5472947 = fieldWeight in 37, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0764947 = idf(docFreq=101, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
          0.040578324 = weight(abstract_txt:approach in 37) [ClassicSimilarity], result of:
            0.040578324 = score(doc=37,freq=4.0), product of:
              0.09916802 = queryWeight, product of:
                1.3489841 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0196499 = queryNorm
              0.40918761 = fieldWeight in 37, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
          0.10364771 = weight(abstract_txt:markov in 37) [ClassicSimilarity], result of:
            0.10364771 = score(doc=37,freq=1.0), product of:
              0.23346692 = queryWeight, product of:
                1.4635875 = boost
                8.117949 = idf(docFreq=35, maxDocs=44421)
                0.0196499 = queryNorm
              0.4439503 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.117949 = idf(docFreq=35, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
          0.05416056 = weight(abstract_txt:languages in 37) [ClassicSimilarity], result of:
            0.05416056 = score(doc=37,freq=1.0), product of:
              0.1908319 = queryWeight, product of:
                1.8713133 = boost
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.0196499 = queryNorm
              0.28381294 = fieldWeight in 37, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.0546875 = fieldNorm(doc=37)
        0.32 = coord(8/25)
    
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.14
    0.13870971 = sum of:
      0.13870971 = product of:
        0.86693573 = sum of:
          0.017852273 = weight(abstract_txt:based in 3950) [ClassicSimilarity], result of:
            0.017852273 = score(doc=3950,freq=1.0), product of:
              0.07178878 = queryWeight, product of:
                1.1477553 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0196499 = queryNorm
              0.24867775 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.049759936 = weight(abstract_txt:evaluation in 3950) [ClassicSimilarity], result of:
            0.049759936 = score(doc=3950,freq=1.0), product of:
              0.14218293 = queryWeight, product of:
                1.6152686 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0196499 = queryNorm
              0.34997123 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.21668534 = weight(abstract_txt:stemmer in 3950) [ClassicSimilarity], result of:
            0.21668534 = score(doc=3950,freq=1.0), product of:
              0.30093354 = queryWeight, product of:
                1.6616569 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0196499 = queryNorm
              0.72004384 = fieldWeight in 3950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
          0.5826382 = weight(abstract_txt:stemmers in 3950) [ClassicSimilarity], result of:
            0.5826382 = score(doc=3950,freq=2.0), product of:
              0.5819025 = queryWeight, product of:
                3.2677298 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0196499 = queryNorm
              1.0012643 = fieldWeight in 3950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=3950)
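In the content-based list, the query has many terms. Each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = sqrt(freq) × idf × fieldNorm. The sum is then scaled by the coordination factor coord = matched terms / query terms (here out of 25). This sketch plugs in the six term values from the Fox & Fox explanation. The names are illustrative, and the coord factor only exists in older Lucene releases.

```python
import math


def term_score(boost, idf, query_norm, freq, field_norm):
    """One term's contribution: queryWeight * fieldWeight (ClassicSimilarity)."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, query_norm, field_norm, max_overlap):
    """Sum the matching terms' scores and scale by coord = overlap / max_overlap."""
    total = sum(term_score(b, idf, query_norm, f, field_norm) for b, idf, f in terms)
    return total * len(terms) / max_overlap


# (boost, idf, freq) for the six terms matched in doc 3585 (Fox & Fox),
# copied from that entry's explanation.
FOX_TERMS = [
    (1.0118655, 5.612423, 1.0),   # generation
    (1.0263306, 5.6926546, 1.0),  # program
    (1.0420581, 5.779889, 2.0),   # written
    (1.2200234, 6.7669935, 1.0),  # generating
    (1.6616569, 9.216561, 5.0),   # stemmer
    (3.2677298, 9.06241, 3.0),    # stemmers
]

score = document_score(FOX_TERMS, query_norm=0.0196499, field_norm=0.109375, max_overlap=25)
print(round(score, 4))  # 0.4904
```

The rare terms `stemmer` and `stemmers` (idf above 9, with a high query boost) account for most of the 0.49. This explains why the two stemmer-generation papers rank well above the others, which match only more common words.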
        0.16 = coord(4/25)