Document (#37125)

Author
Schmid, H.
Title
Improvements in Part-of-Speech tagging with an application to German
Source
ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tree-tagger2.pdf
Year
1995
Abstract
This paper presents a couple of extensions to a basic Markov Model tagger (called TreeTagger) which improve its accuracy when trained on small corpora. The basic tagger was originally developed for English (Schmid, 1994). Together, the extensions reduced error rates on a German test corpus by more than a third.
Content
Contribution to: Proceedings of the ACL SIGDAT-Workshop. Dublin, Ireland, 1995. For the TreeTagger software, see: http://www.ims.uni-stuttgart.de/~schmid/.
Theme
Computational linguistics
Object
TreeTagger

Similar documents (author)

  1. Schmid, F.: Weitere Betrachtungen zum alphabetischen Sachkatalog (1925) 5.38
    5.3815155 = sum of:
      5.3815155 = weight(author_txt:schmid in 618) [ClassicSimilarity], result of:
        5.3815155 = fieldWeight in 618, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.610425 = idf(docFreq=21, maxDocs=44421)
          0.625 = fieldNorm(doc=618)
    
  2. Schmid, F.: ¬Der alphabetische Sachkatalog mit besonderer Beziehung auf die Landesbibliothek in Stuttgart (1924) 5.38
    5.3815155 = sum of:
      5.3815155 = weight(author_txt:schmid in 619) [ClassicSimilarity], result of:
        5.3815155 = fieldWeight in 619, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.610425 = idf(docFreq=21, maxDocs=44421)
          0.625 = fieldNorm(doc=619)
    
  3. Schmid, F.: Mein letztes Wort zum alphabetischen Sachkatalog (1927) 5.38
    5.3815155 = sum of:
      5.3815155 = weight(author_txt:schmid in 620) [ClassicSimilarity], result of:
        5.3815155 = fieldWeight in 620, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.610425 = idf(docFreq=21, maxDocs=44421)
          0.625 = fieldNorm(doc=620)
    
  4. Schmid, B.: ¬Der Information Highway als Infrastruktur der Informationsgesellschaft (1996) 5.38
    5.3815155 = sum of:
      5.3815155 = weight(author_txt:schmid in 5034) [ClassicSimilarity], result of:
        5.3815155 = fieldWeight in 5034, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.610425 = idf(docFreq=21, maxDocs=44421)
          0.625 = fieldNorm(doc=5034)
    
  5. Schmid, B.: Elektronische Märkte (1993) 5.38
    5.3815155 = sum of:
      5.3815155 = weight(author_txt:schmid in 370) [ClassicSimilarity], result of:
        5.3815155 = fieldWeight in 370, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.610425 = idf(docFreq=21, maxDocs=44421)
          0.625 = fieldNorm(doc=370)
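
The five author matches above share the same score because each explanation tree multiplies the same three factors: tf, idf and fieldNorm. The following is a minimal sketch of that arithmetic, not the search engine's own code. The function names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption chosen because it reproduces the idf values listed here; tf is taken as the square root of the term frequency, which agrees with the tf lines in the listing.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf as the square root of the raw term frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:schmid explanations above.
author_idf = classic_idf(doc_freq=21, max_docs=44421)
author_score = field_weight(freq=1.0, doc_freq=21, max_docs=44421, field_norm=0.625)

print(f"idf   = {author_idf:.6f}")
print(f"score = {author_score:.6f}")
```

Since the query has a single term and every one of these records has termFreq=1.0 and fieldNorm 0.625, the five scores are identical and all round to 5.38.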
    

Similar documents (content)

  1. L'Homme, D.; L'Homme, M.-C.; Lemay, C.: Benchmarking the performance of two Part-of-Speech (POS) taggers for terminological purposes (2002) 0.27
    0.26517949 = sum of:
      0.26517949 = product of:
        0.9470696 = sum of:
          0.04479049 = weight(abstract_txt:english in 2855) [ClassicSimilarity], result of:
            0.04479049 = score(doc=2855,freq=1.0), product of:
              0.102844775 = queryWeight, product of:
                1.1047499 = boost
                5.5745983 = idf(docFreq=457, maxDocs=44421)
                0.016699547 = queryNorm
              0.4355155 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5745983 = idf(docFreq=457, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.054607004 = weight(abstract_txt:accuracy in 2855) [ClassicSimilarity], result of:
            0.054607004 = score(doc=2855,freq=1.0), product of:
              0.11737004 = queryWeight, product of:
                1.1801888 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.016699547 = queryNorm
              0.46525505 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.06475152 = weight(abstract_txt:tagging in 2855) [ClassicSimilarity], result of:
            0.06475152 = score(doc=2855,freq=1.0), product of:
              0.13148968 = queryWeight, product of:
                1.2491618 = boost
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.016699547 = queryNorm
              0.49244568 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.083948575 = weight(abstract_txt:speech in 2855) [ClassicSimilarity], result of:
            0.083948575 = score(doc=2855,freq=1.0), product of:
              0.15633897 = queryWeight, product of:
                1.3620921 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.016699547 = queryNorm
              0.53696513 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.12596111 = weight(abstract_txt:corpora in 2855) [ClassicSimilarity], result of:
            0.12596111 = score(doc=2855,freq=2.0), product of:
              0.162632 = queryWeight, product of:
                1.3892355 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.016699547 = queryNorm
              0.77451617 = fieldWeight in 2855, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.09050066 = weight(abstract_txt:trained in 2855) [ClassicSimilarity], result of:
            0.09050066 = score(doc=2855,freq=1.0), product of:
              0.16437137 = queryWeight, product of:
                1.3966447 = boost
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.016699547 = queryNorm
              0.5505865 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
          0.4825102 = weight(abstract_txt:tagger in 2855) [ClassicSimilarity], result of:
            0.4825102 = score(doc=2855,freq=2.0), product of:
              0.5016445 = queryWeight, product of:
                3.4505305 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016699547 = queryNorm
              0.9618569 = fieldWeight in 2855, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=2855)
        0.28 = coord(7/25)
    
  2. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 0.24
    0.24100742 = sum of:
      0.24100742 = product of:
        0.7531482 = sum of:
          0.03143819 = weight(abstract_txt:small in 2121) [ClassicSimilarity], result of:
            0.03143819 = score(doc=2121,freq=1.0), product of:
              0.09425478 = queryWeight, product of:
                1.0576075 = boost
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.016699547 = queryNorm
              0.3335448 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.07566568 = weight(abstract_txt:accuracy in 2121) [ClassicSimilarity], result of:
            0.07566568 = score(doc=2121,freq=3.0), product of:
              0.11737004 = queryWeight, product of:
                1.1801888 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.016699547 = queryNorm
              0.64467627 = fieldWeight in 2121, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.049300827 = weight(abstract_txt:improvements in 2121) [ClassicSimilarity], result of:
            0.049300827 = score(doc=2121,freq=1.0), product of:
              0.12722364 = queryWeight, product of:
                1.2287309 = boost
                6.2002096 = idf(docFreq=244, maxDocs=44421)
                0.016699547 = queryNorm
              0.3875131 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2002096 = idf(docFreq=244, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.08972234 = weight(abstract_txt:tagging in 2121) [ClassicSimilarity], result of:
            0.08972234 = score(doc=2121,freq=3.0), product of:
              0.13148968 = queryWeight, product of:
                1.2491618 = boost
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.016699547 = queryNorm
              0.6823527 = fieldWeight in 2121, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.06669463 = weight(abstract_txt:error in 2121) [ClassicSimilarity], result of:
            0.06669463 = score(doc=2121,freq=1.0), product of:
              0.15561768 = queryWeight, product of:
                1.3589464 = boost
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.016699547 = queryNorm
              0.42858005 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.09497698 = weight(abstract_txt:speech in 2121) [ClassicSimilarity], result of:
            0.09497698 = score(doc=2121,freq=2.0), product of:
              0.15633897 = queryWeight, product of:
                1.3620921 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.016699547 = queryNorm
              0.6075067 = fieldWeight in 2121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.072400525 = weight(abstract_txt:trained in 2121) [ClassicSimilarity], result of:
            0.072400525 = score(doc=2121,freq=1.0), product of:
              0.16437137 = queryWeight, product of:
                1.3966447 = boost
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.016699547 = queryNorm
              0.4404692 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
          0.272949 = weight(abstract_txt:tagger in 2121) [ClassicSimilarity], result of:
            0.272949 = score(doc=2121,freq=1.0), product of:
              0.5016445 = queryWeight, product of:
                3.4505305 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016699547 = queryNorm
              0.54410845 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=2121)
        0.32 = coord(8/25)
    
  3. Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003) 0.23
    0.23232605 = sum of:
      0.23232605 = product of:
        0.9680252 = sum of:
          0.045002505 = weight(abstract_txt:together in 2059) [ClassicSimilarity], result of:
            0.045002505 = score(doc=2059,freq=1.0), product of:
              0.09136124 = queryWeight, product of:
                1.0412472 = boost
                5.254162 = idf(docFreq=630, maxDocs=44421)
                0.016699547 = queryNorm
              0.49257767 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.254162 = idf(docFreq=630, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.0655284 = weight(abstract_txt:accuracy in 2059) [ClassicSimilarity], result of:
            0.0655284 = score(doc=2059,freq=1.0), product of:
              0.11737004 = queryWeight, product of:
                1.1801888 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.016699547 = queryNorm
              0.55830604 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.07770183 = weight(abstract_txt:tagging in 2059) [ClassicSimilarity], result of:
            0.07770183 = score(doc=2059,freq=1.0), product of:
              0.13148968 = queryWeight, product of:
                1.2491618 = boost
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.016699547 = queryNorm
              0.5909348 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.10004195 = weight(abstract_txt:error in 2059) [ClassicSimilarity], result of:
            0.10004195 = score(doc=2059,freq=1.0), product of:
              0.15561768 = queryWeight, product of:
                1.3589464 = boost
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.016699547 = queryNorm
              0.64287007 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.100738294 = weight(abstract_txt:speech in 2059) [ClassicSimilarity], result of:
            0.100738294 = score(doc=2059,freq=1.0), product of:
              0.15633897 = queryWeight, product of:
                1.3620921 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.016699547 = queryNorm
              0.64435816 = fieldWeight in 2059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
          0.5790123 = weight(abstract_txt:tagger in 2059) [ClassicSimilarity], result of:
            0.5790123 = score(doc=2059,freq=2.0), product of:
              0.5016445 = queryWeight, product of:
                3.4505305 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016699547 = queryNorm
              1.1542283 = fieldWeight in 2059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.09375 = fieldNorm(doc=2059)
        0.24 = coord(6/25)
    
  4. Bergler, S.: Generative lexicon principles for machine translation : a case for meta-lexical structure (1994/95) 0.16
    0.15749256 = sum of:
      0.15749256 = product of:
        0.7874628 = sum of:
          0.076011986 = weight(abstract_txt:english in 4140) [ClassicSimilarity], result of:
            0.076011986 = score(doc=4140,freq=2.0), product of:
              0.102844775 = queryWeight, product of:
                1.1047499 = boost
                5.5745983 = idf(docFreq=457, maxDocs=44421)
                0.016699547 = queryNorm
              0.73909426 = fieldWeight in 4140, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5745983 = idf(docFreq=457, maxDocs=44421)
                0.09375 = fieldNorm(doc=4140)
          0.17448385 = weight(abstract_txt:speech in 4140) [ClassicSimilarity], result of:
            0.17448385 = score(doc=4140,freq=3.0), product of:
              0.15633897 = queryWeight, product of:
                1.3620921 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.016699547 = queryNorm
              1.1160611 = fieldWeight in 4140, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.09375 = fieldNorm(doc=4140)
          0.07630791 = weight(abstract_txt:basic in 4140) [ClassicSimilarity], result of:
            0.07630791 = score(doc=4140,freq=1.0), product of:
              0.16367935 = queryWeight, product of:
                1.9709917 = boost
                4.972839 = idf(docFreq=835, maxDocs=44421)
                0.016699547 = queryNorm
              0.46620363 = fieldWeight in 4140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.972839 = idf(docFreq=835, maxDocs=44421)
                0.09375 = fieldNorm(doc=4140)
          0.21607548 = weight(abstract_txt:german in 4140) [ClassicSimilarity], result of:
            0.21607548 = score(doc=4140,freq=2.0), product of:
              0.26002064 = queryWeight, product of:
                2.4842298 = boost
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.016699547 = queryNorm
              0.83099353 = fieldWeight in 4140, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2677455 = idf(docFreq=228, maxDocs=44421)
                0.09375 = fieldNorm(doc=4140)
          0.24458356 = weight(abstract_txt:extensions in 4140) [ClassicSimilarity], result of:
            0.24458356 = score(doc=4140,freq=1.0), product of:
              0.35582164 = queryWeight, product of:
                2.9060564 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.016699547 = queryNorm
              0.68737686 = fieldWeight in 4140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.09375 = fieldNorm(doc=4140)
        0.2 = coord(5/25)
    
  5. Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000) 0.13
    0.13167691 = sum of:
      0.13167691 = product of:
        0.82298076 = sum of:
          0.0655284 = weight(abstract_txt:accuracy in 2060) [ClassicSimilarity], result of:
            0.0655284 = score(doc=2060,freq=1.0), product of:
              0.11737004 = queryWeight, product of:
                1.1801888 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.016699547 = queryNorm
              0.55830604 = fieldWeight in 2060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.09375 = fieldNorm(doc=2060)
          0.07770183 = weight(abstract_txt:tagging in 2060) [ClassicSimilarity], result of:
            0.07770183 = score(doc=2060,freq=1.0), product of:
              0.13148968 = queryWeight, product of:
                1.2491618 = boost
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.016699547 = queryNorm
              0.5909348 = fieldWeight in 2060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3033047 = idf(docFreq=220, maxDocs=44421)
                0.09375 = fieldNorm(doc=2060)
          0.100738294 = weight(abstract_txt:speech in 2060) [ClassicSimilarity], result of:
            0.100738294 = score(doc=2060,freq=1.0), product of:
              0.15633897 = queryWeight, product of:
                1.3620921 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.016699547 = queryNorm
              0.64435816 = fieldWeight in 2060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.09375 = fieldNorm(doc=2060)
          0.5790123 = weight(abstract_txt:tagger in 2060) [ClassicSimilarity], result of:
            0.5790123 = score(doc=2060,freq=2.0), product of:
              0.5016445 = queryWeight, product of:
                3.4505305 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016699547 = queryNorm
              1.1542283 = fieldWeight in 2060, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.09375 = fieldNorm(doc=2060)
        0.16 = coord(4/25)
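
The content-based scores combine more factors than the author matches: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms divided by the 25 query terms). The sketch below recomputes entry 5 (doc=2060) from the numbers in its explanation tree. It is an illustration under assumptions: the function and constant names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is assumed because it reproduces the listed idf values.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.016699547  # queryNorm from the listing
QUERY_TERM_COUNT = 25     # denominator of coord(n/25)


def classic_idf(doc_freq: int) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """One term's contribution: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm: float) -> float:
    """Sum the term contributions, then scale by coord."""
    total = sum(term_score(boost, doc_freq, freq, field_norm)
                for boost, doc_freq, freq in matches)
    coord = len(matches) / QUERY_TERM_COUNT
    return total * coord


# (boost, docFreq, termFreq) for the four terms matched in doc 2060.
doc_2060_matches = [
    (1.1801888, 312, 1.0),  # accuracy
    (1.2491618, 220, 1.0),  # tagging
    (1.3620921, 124, 1.0),  # speech
    (3.4505305, 19, 2.0),   # tagger
]

doc_2060_score = document_score(doc_2060_matches, field_norm=0.09375)
print(f"score(doc=2060) = {doc_2060_score:.5f}")
```

The result agrees with the listed 0.13167691 up to floating-point rounding. The same function also shows why "tagger" dominates every ranking here: it has both the highest boost and the highest idf of the matched terms, and idf enters the product twice, once in queryWeight and once in fieldWeight.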