Document (#38061)

Author
Toutanova, K.
Manning, C.D.
Title
Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger
Source
Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000)
Year
2000
Pages
pp. 63-70
Abstract
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
Content
Cf.: http://nlp.stanford.edu/software/tagger.shtml.
Theme
Computational linguistics
Object
Stanford POS Tagger

Similar documents (author)

  1. Manning, R.W.: The Anglo-American Cataloguing Rules and their future (1999) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:manning in 809) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 809, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=809)
    
  2. Manning, R.W.: The Anglo American Cataloguing Rules and their future (2000) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:manning in 314) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 314, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=314)
    
  3. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:manning in 2121) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 2121, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=2121)
    
  4. Mallett, J.; Manning, C.: Multimedia and database design : a discussion of database technology and its use in multimedia (1993) 4.61
    4.6082807 = sum of:
      4.6082807 = weight(author_txt:manning in 6276) [ClassicSimilarity], result of:
        4.6082807 = fieldWeight in 6276, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.5 = fieldNorm(doc=6276)
    
  5. Manning, C.D.; Schütze, H.: Foundations of statistical natural language processing (2000) 4.61
    4.6082807 = sum of:
      4.6082807 = weight(author_txt:manning in 2603) [ClassicSimilarity], result of:
        4.6082807 = fieldWeight in 2603, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.5 = fieldNorm(doc=2603)
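
The author scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF definitions that are consistent with the listed numbers: tf is the square root of the term frequency, and idf is 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and not part of any library; the Lucene version behind this catalogue is not stated, so treat the formulas as an assumption that happens to match the output.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency (assumed)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explanation trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: the author term "manning" occurs once,
# in 11 of 44421 documents; fieldNorm is 0.625 or 0.5 depending on the record.
score_short_field = field_weight(1.0, 11, 44421, 0.625)
score_long_field = field_weight(1.0, 11, 44421, 0.5)
print(round(score_short_field, 4), round(score_long_field, 4))
```

Because tf and idf are identical for all five records, the ranking in this list is decided by fieldNorm alone: records with a single author (0.625) outrank records with two authors (0.5).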
    

Similar documents (content)

  1. Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003) 0.35
    0.34956446 = sum of:
      0.34956446 = product of:
        1.2484446 = sum of:
          0.085066706 = weight(abstract_txt:unknown in 2059) [ClassicSimilarity], result of:
            0.085066706 = score(doc=2059,freq=1.0), product of:
              0.12599574 = queryWeight, product of:
                1.1046023 = boost
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.015838623 = queryNorm
              0.6751554 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.17370182 = weight(abstract_txt:penn in 2059) [ClassicSimilarity], result of:
            0.17370182 = score(doc=2059,freq=1.0), product of:
              0.20279272 = queryWeight, product of:
                1.4013745 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015838623 = queryNorm
              0.8565486 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.043790475 = weight(abstract_txt:part in 2059) [ClassicSimilarity], result of:
            0.043790475 = score(doc=2059,freq=1.0), product of:
              0.101963766 = queryWeight, product of:
                1.40529 = boost
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.015838623 = queryNorm
              0.42947093 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.06997981 = weight(abstract_txt:words in 2059) [ClassicSimilarity], result of:
            0.06997981 = score(doc=2059,freq=1.0), product of:
              0.13937171 = queryWeight, product of:
                1.6429727 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.015838623 = queryNorm
              0.50210917 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.09045746 = weight(abstract_txt:features in 2059) [ClassicSimilarity], result of:
            0.09045746 = score(doc=2059,freq=2.0), product of:
              0.15025981 = queryWeight, product of:
                2.0893445 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.015838623 = queryNorm
              0.60200703 = fieldWeight in 2059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.14789732 = weight(abstract_txt:speech in 2059) [ClassicSimilarity], result of:
            0.14789732 = score(doc=2059,freq=1.0), product of:
              0.22952658 = queryWeight, product of:
                2.1084316 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.015838623 = queryNorm
              0.64435816 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.6375509 = weight(abstract_txt:tagger in 2059) [ClassicSimilarity], result of:
            0.6375509 = score(doc=2059,freq=2.0), product of:
              0.55236113 = queryWeight, product of:
                4.005901 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.015838623 = queryNorm
              1.1542283 = fieldWeight in 2059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
        0.28 = coord(7/25)
    
  2. L'Homme, D.; L'Homme, M.-C.; Lemay, C.: Benchmarking the performance of two Part-of-Speech (POS) taggers for terminological purposes (2002) 0.24
    0.24492574 = sum of:
      0.24492574 = product of:
        0.87473476 = sum of:
          0.014362916 = weight(abstract_txt:used in 2855) [ClassicSimilarity], result of:
            0.014362916 = score(doc=2855,freq=1.0), product of:
              0.054761503 = queryWeight, product of:
                1.029866 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.015838623 = queryNorm
              0.26228127 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.015978701 = weight(abstract_txt:results in 2855) [ClassicSimilarity], result of:
            0.015978701 = score(doc=2855,freq=1.0), product of:
              0.058795113 = queryWeight, product of:
                1.0671209 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015838623 = queryNorm
              0.2717692 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.07088892 = weight(abstract_txt:unknown in 2855) [ClassicSimilarity], result of:
            0.07088892 = score(doc=2855,freq=1.0), product of:
              0.12599574 = queryWeight, product of:
                1.1046023 = boost
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.015838623 = queryNorm
              0.5626295 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.03649206 = weight(abstract_txt:part in 2855) [ClassicSimilarity], result of:
            0.03649206 = score(doc=2855,freq=1.0), product of:
              0.101963766 = queryWeight, product of:
                1.40529 = boost
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.015838623 = queryNorm
              0.35789245 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.082472 = weight(abstract_txt:words in 2855) [ClassicSimilarity], result of:
            0.082472 = score(doc=2855,freq=2.0), product of:
              0.13937171 = queryWeight, product of:
                1.6429727 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.015838623 = queryNorm
              0.5917413 = fieldWeight in 2855, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.12324777 = weight(abstract_txt:speech in 2855) [ClassicSimilarity], result of:
            0.12324777 = score(doc=2855,freq=1.0), product of:
              0.22952658 = queryWeight, product of:
                2.1084316 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.015838623 = queryNorm
              0.53696513 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.5312924 = weight(abstract_txt:tagger in 2855) [ClassicSimilarity], result of:
            0.5312924 = score(doc=2855,freq=2.0), product of:
              0.55236113 = queryWeight, product of:
                4.005901 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.015838623 = queryNorm
              0.9618569 = fieldWeight in 2855, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
        0.28 = coord(7/25)
    
  3. Rishel, T.; Perkins, L.A.; Yenduri, S.; Zand, F.: Determining the context of text using augmented latent semantic indexing (2007) 0.16
    0.15961093 = sum of:
      0.15961093 = product of:
        0.79805464 = sum of:
          0.0248773 = weight(abstract_txt:used in 2316) [ClassicSimilarity], result of:
            0.0248773 = score(doc=2316,freq=3.0), product of:
              0.054761503 = queryWeight, product of:
                1.029866 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.015838623 = queryNorm
              0.4542845 = fieldWeight in 2316, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.015978701 = weight(abstract_txt:results in 2316) [ClassicSimilarity], result of:
            0.015978701 = score(doc=2316,freq=1.0), product of:
              0.058795113 = queryWeight, product of:
                1.0671209 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015838623 = queryNorm
              0.2717692 = fieldWeight in 2316, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.051607568 = weight(abstract_txt:part in 2316) [ClassicSimilarity], result of:
            0.051607568 = score(doc=2316,freq=2.0), product of:
              0.101963766 = queryWeight, product of:
                1.40529 = boost
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.015838623 = queryNorm
              0.50613636 = fieldWeight in 2316, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.17429867 = weight(abstract_txt:speech in 2316) [ClassicSimilarity], result of:
            0.17429867 = score(doc=2316,freq=2.0), product of:
              0.22952658 = queryWeight, product of:
                2.1084316 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.015838623 = queryNorm
              0.7593834 = fieldWeight in 2316, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
          0.5312924 = weight(abstract_txt:tagger in 2316) [ClassicSimilarity], result of:
            0.5312924 = score(doc=2316,freq=2.0), product of:
              0.55236113 = queryWeight, product of:
                4.005901 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.015838623 = queryNorm
              0.9618569 = fieldWeight in 2316, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=2316)
        0.2 = coord(5/25)
    
  4. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 0.14
    0.13969319 = sum of:
      0.13969319 = product of:
        0.582055 = sum of:
          0.011490333 = weight(abstract_txt:used in 2121) [ClassicSimilarity], result of:
            0.011490333 = score(doc=2121,freq=1.0), product of:
              0.054761503 = queryWeight, product of:
                1.029866 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.015838623 = queryNorm
              0.20982501 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.041286055 = weight(abstract_txt:part in 2121) [ClassicSimilarity], result of:
            0.041286055 = score(doc=2121,freq=2.0), product of:
              0.101963766 = queryWeight, product of:
                1.40529 = boost
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.015838623 = queryNorm
              0.40490907 = fieldWeight in 2121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.046653207 = weight(abstract_txt:words in 2121) [ClassicSimilarity], result of:
            0.046653207 = score(doc=2121,freq=1.0), product of:
              0.13937171 = queryWeight, product of:
                1.6429727 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.015838623 = queryNorm
              0.33473945 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.04264206 = weight(abstract_txt:features in 2121) [ClassicSimilarity], result of:
            0.04264206 = score(doc=2121,freq=1.0), product of:
              0.15025981 = queryWeight, product of:
                2.0893445 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.015838623 = queryNorm
              0.28378886 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.13943893 = weight(abstract_txt:speech in 2121) [ClassicSimilarity], result of:
            0.13943893 = score(doc=2121,freq=2.0), product of:
              0.22952658 = queryWeight, product of:
                2.1084316 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.015838623 = queryNorm
              0.6075067 = fieldWeight in 2121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.30054435 = weight(abstract_txt:tagger in 2121) [ClassicSimilarity], result of:
            0.30054435 = score(doc=2121,freq=1.0), product of:
              0.55236113 = queryWeight, product of:
                4.005901 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.015838623 = queryNorm
              0.54410845 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
        0.24 = coord(6/25)
    
  5. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.11
    0.11425605 = sum of:
      0.11425605 = product of:
        0.4080573 = sum of:
          0.016249785 = weight(abstract_txt:used in 3686) [ClassicSimilarity], result of:
            0.016249785 = score(doc=3686,freq=2.0), product of:
              0.054761503 = queryWeight, product of:
                1.029866 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.015838623 = queryNorm
              0.29673737 = fieldWeight in 3686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.018077835 = weight(abstract_txt:results in 3686) [ClassicSimilarity], result of:
            0.018077835 = score(doc=3686,freq=2.0), product of:
              0.058795113 = queryWeight, product of:
                1.0671209 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015838623 = queryNorm
              0.30747172 = fieldWeight in 3686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.057791952 = weight(abstract_txt:maximum in 3686) [ClassicSimilarity], result of:
            0.057791952 = score(doc=3686,freq=1.0), product of:
              0.12759154 = queryWeight, product of:
                1.1115755 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.015838623 = queryNorm
              0.45294502 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.076688744 = weight(abstract_txt:entropy in 3686) [ClassicSimilarity], result of:
            0.076688744 = score(doc=3686,freq=1.0), product of:
              0.15407471 = queryWeight, product of:
                1.2215006 = boost
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.015838623 = queryNorm
              0.49773738 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.115801215 = weight(abstract_txt:unseen in 3686) [ClassicSimilarity], result of:
            0.115801215 = score(doc=3686,freq=1.0), product of:
              0.20279272 = queryWeight, product of:
                1.4013745 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015838623 = queryNorm
              0.5710324 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.08080573 = weight(abstract_txt:words in 3686) [ClassicSimilarity], result of:
            0.08080573 = score(doc=3686,freq=3.0), product of:
              0.13937171 = queryWeight, product of:
                1.6429727 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.015838623 = queryNorm
              0.5797857 = fieldWeight in 3686, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
          0.04264206 = weight(abstract_txt:features in 3686) [ClassicSimilarity], result of:
            0.04264206 = score(doc=3686,freq=1.0), product of:
              0.15025981 = queryWeight, product of:
                2.0893445 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.015838623 = queryNorm
              0.28378886 = fieldWeight in 3686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0625 = fieldNorm(doc=3686)
        0.28 = coord(7/25)
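
The content scores combine more factors than the author scores. As a rough check, the sketch below recomputes the total for the first entry in this list (doc=2059), assuming that each term score is queryWeight * fieldWeight, that queryWeight = boost * idf * queryNorm, and that the sum is scaled by coord = matched terms / query terms. The boosts, document frequencies, queryNorm, and fieldNorm are copied from the listing; the function and constant names are illustrative only.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.015838623   # from the listing
FIELD_NORM = 0.09375       # fieldNorm(doc=2059)
QUERY_TERM_COUNT = 25      # denominator of coord(7/25)

# (term, boost, docFreq, termFreq in doc 2059), copied from the listing
TERMS = [
    ("unknown", 1.1046023, 89, 1.0),
    ("penn", 1.4013745, 12, 1.0),
    ("part", 1.40529, 1236, 1.0),
    ("words", 1.6429727, 569, 1.0),
    ("features", 2.0893445, 1287, 2.0),
    ("speech", 2.1084316, 124, 1.0),
    ("tagger", 4.005901, 19, 2.0),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms) -> float:
    """Sum the per-term scores, then scale by the coordination factor."""
    matched_sum = sum(term_score(boost, df, freq) for _, boost, df, freq in terms)
    coord = len(terms) / QUERY_TERM_COUNT
    return matched_sum * coord


total = document_score(TERMS)
print(round(total, 4))
```

The coord factor explains why a document matching only five of the 25 query terms (entry 3, coord 0.2) is penalised relative to one matching seven (coord 0.28), even when its per-term sum is comparable.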