Document (#37644)

Author
Dias, G.
Title
Multiword unit hybrid extraction
Source
http://acl.ldc.upenn.edu/W/W03/W03-1806.pdf
Year
n.d.
Abstract
This paper describes an original hybrid system that extracts multiword unit candidates from part-of-speech tagged corpora. While classical hybrid systems manually define local part-of-speech patterns that lead to the identification of well-known multiword units (mainly compound nouns), our solution automatically identifies relevant syntactical patterns from the corpus. Word statistics are then combined with the endogenously acquired linguistic information in order to extract the most relevant sequences of words. As a result, (1) human intervention is avoided, providing total flexibility of use of the system, and (2) different multiword units such as phrasal verbs, adverbial locutions and prepositional locutions may be identified. The system has been tested on the Brown Corpus, leading to encouraging results.
Theme
Computational linguistics

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.19
    0.1854538 = sum of:
      0.1854538 = product of:
        0.927269 = sum of:
          0.03614747 = weight(abstract_txt:compound in 2536) [ClassicSimilarity], result of:
            0.03614747 = score(doc=2536,freq=1.0), product of:
              0.12343645 = queryWeight, product of:
                1.0694218 = boost
                7.496775 = idf(docFreq=66, maxDocs=44421)
                0.0153964255 = queryNorm
              0.29284278 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.496775 = idf(docFreq=66, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.041864555 = weight(abstract_txt:candidates in 2536) [ClassicSimilarity], result of:
            0.041864555 = score(doc=2536,freq=1.0), product of:
              0.1361306 = queryWeight, product of:
                1.1230658 = boost
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.0153964255 = queryNorm
              0.30753228 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.872826 = idf(docFreq=45, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.038793884 = weight(abstract_txt:corpus in 2536) [ClassicSimilarity], result of:
            0.038793884 = score(doc=2536,freq=1.0), product of:
              0.16302103 = queryWeight, product of:
                1.7380575 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0153964255 = queryNorm
              0.23796858 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.07204073 = weight(abstract_txt:unit in 2536) [ClassicSimilarity], result of:
            0.07204073 = score(doc=2536,freq=2.0), product of:
              0.19548354 = queryWeight, product of:
                1.9032569 = boost
                6.6710296 = idf(docFreq=152, maxDocs=44421)
                0.0153964255 = queryNorm
              0.3685258 = fieldWeight in 2536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6710296 = idf(docFreq=152, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.73842233 = weight(abstract_txt:multiword in 2536) [ClassicSimilarity], result of:
            0.73842233 = score(doc=2536,freq=11.0), product of:
              0.6583908 = queryWeight, product of:
                4.9396844 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0153964255 = queryNorm
              1.1215563 = fieldWeight in 2536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
        0.2 = coord(5/25)
    
  2. Warner, J.: Analogies between linguistics and information theory (2007) 0.16
    0.16243546 = sum of:
      0.16243546 = product of:
        0.8121773 = sum of:
          0.11237222 = weight(abstract_txt:sequences in 1138) [ClassicSimilarity], result of:
            0.11237222 = score(doc=1138,freq=4.0), product of:
              0.12107769 = queryWeight, product of:
                1.0591546 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0153964255 = queryNorm
              0.92810017 = fieldWeight in 1138, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.04014877 = weight(abstract_txt:patterns in 1138) [ClassicSimilarity], result of:
            0.04014877 = score(doc=1138,freq=1.0), product of:
              0.121927865 = queryWeight, product of:
                1.5031205 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.0153964255 = queryNorm
              0.32928297 = fieldWeight in 1138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.074368596 = weight(abstract_txt:units in 1138) [ClassicSimilarity], result of:
            0.074368596 = score(doc=1138,freq=1.0), product of:
              0.18389978 = queryWeight, product of:
                1.8460052 = boost
                6.470359 = idf(docFreq=186, maxDocs=44421)
                0.0153964255 = queryNorm
              0.40439743 = fieldWeight in 1138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.470359 = idf(docFreq=186, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.08150478 = weight(abstract_txt:unit in 1138) [ClassicSimilarity], result of:
            0.08150478 = score(doc=1138,freq=1.0), product of:
              0.19548354 = queryWeight, product of:
                1.9032569 = boost
                6.6710296 = idf(docFreq=152, maxDocs=44421)
                0.0153964255 = queryNorm
              0.41693935 = fieldWeight in 1138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6710296 = idf(docFreq=152, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
          0.5037829 = weight(abstract_txt:multiword in 1138) [ClassicSimilarity], result of:
            0.5037829 = score(doc=1138,freq=2.0), product of:
              0.6583908 = queryWeight, product of:
                4.9396844 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0153964255 = queryNorm
              0.7651731 = fieldWeight in 1138, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=1138)
        0.2 = coord(5/25)
    
  3. Nissim, M.; Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method (2013) 0.13
    0.13380913 = sum of:
      0.13380913 = product of:
        0.6690457 = sum of:
          0.026393194 = weight(abstract_txt:part in 1990) [ClassicSimilarity], result of:
            0.026393194 = score(doc=1990,freq=1.0), product of:
              0.0921827 = queryWeight, product of:
                1.3069743 = boost
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.0153964255 = queryNorm
              0.28631395 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.089775376 = weight(abstract_txt:patterns in 1990) [ClassicSimilarity], result of:
            0.089775376 = score(doc=1990,freq=5.0), product of:
              0.121927865 = queryWeight, product of:
                1.5031205 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.0153964255 = queryNorm
              0.7362991 = fieldWeight in 1990, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.10750876 = weight(abstract_txt:corpus in 1990) [ClassicSimilarity], result of:
            0.10750876 = score(doc=1990,freq=3.0), product of:
              0.16302103 = queryWeight, product of:
                1.7380575 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0153964255 = queryNorm
              0.6594779 = fieldWeight in 1990, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.08914 = weight(abstract_txt:speech in 1990) [ClassicSimilarity], result of:
            0.08914 = score(doc=1990,freq=1.0), product of:
              0.20750882 = queryWeight, product of:
                1.9609233 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0153964255 = queryNorm
              0.4295721 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.35622832 = weight(abstract_txt:multiword in 1990) [ClassicSimilarity], result of:
            0.35622832 = score(doc=1990,freq=1.0), product of:
              0.6583908 = queryWeight, product of:
                4.9396844 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0153964255 = queryNorm
              0.5410591 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
        0.2 = coord(5/25)
    
  4. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.11
    0.111903355 = sum of:
      0.111903355 = product of:
        0.93252796 = sum of:
          0.07093198 = weight(abstract_txt:corpora in 1466) [ClassicSimilarity], result of:
            0.07093198 = score(doc=1466,freq=1.0), product of:
              0.10793079 = queryWeight, product of:
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.0153964255 = queryNorm
              0.6571987 = fieldWeight in 1466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.09375 = fieldNorm(doc=1466)
          0.105921544 = weight(abstract_txt:nouns in 1466) [ClassicSimilarity], result of:
            0.105921544 = score(doc=1466,freq=1.0), product of:
              0.14100684 = queryWeight, product of:
                1.1430031 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.0153964255 = queryNorm
              0.7511802 = fieldWeight in 1466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.09375 = fieldNorm(doc=1466)
          0.7556744 = weight(abstract_txt:multiword in 1466) [ClassicSimilarity], result of:
            0.7556744 = score(doc=1466,freq=2.0), product of:
              0.6583908 = queryWeight, product of:
                4.9396844 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0153964255 = queryNorm
              1.1477597 = fieldWeight in 1466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=1466)
        0.12 = coord(3/25)
    
  5. Ramisch, C.: Multiword expressions acquisition : a generic and open framework (2015) 0.11
    0.10852024 = sum of:
      0.10852024 = product of:
        0.9043354 = sum of:
          0.045714352 = weight(abstract_txt:part in 2649) [ClassicSimilarity], result of:
            0.045714352 = score(doc=2649,freq=3.0), product of:
              0.0921827 = queryWeight, product of:
                1.3069743 = boost
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.0153964255 = queryNorm
              0.4959103 = fieldWeight in 2649, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.581023 = idf(docFreq=1236, maxDocs=44421)
                0.0625 = fieldNorm(doc=2649)
          0.062070213 = weight(abstract_txt:corpus in 2649) [ClassicSimilarity], result of:
            0.062070213 = score(doc=2649,freq=1.0), product of:
              0.16302103 = queryWeight, product of:
                1.7380575 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0153964255 = queryNorm
              0.38074973 = fieldWeight in 2649, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0625 = fieldNorm(doc=2649)
          0.7965508 = weight(abstract_txt:multiword in 2649) [ClassicSimilarity], result of:
            0.7965508 = score(doc=2649,freq=5.0), product of:
              0.6583908 = queryWeight, product of:
                4.9396844 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0153964255 = queryNorm
              1.209845 = fieldWeight in 2649, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=2649)
        0.12 = coord(3/25)
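
The score trees above follow the Lucene ClassicSimilarity (TF-IDF) breakdown: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is boost * idf * queryNorm, fieldWeight is tf * idf * fieldNorm, and the summed term scores are multiplied by coord (matched clauses / total clauses). As a minimal sketch, the following Python recomputes the score of entry 1 (doc 2536) from the values listed in its tree. The function and variable names are illustrative assumptions, not part of the catalog system, and boost, queryNorm and fieldNorm are taken from the listing as given rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


# Values copied from the explain tree of entry 1 (doc 2536).
MAX_DOCS = 44421
QUERY_NORM = 0.0153964255
FIELD_NORM = 0.0390625

# (term, termFreq, docFreq, boost)
TERMS_DOC_2536 = [
    ("compound", 1.0, 66, 1.0694218),
    ("candidates", 1.0, 45, 1.1230658),
    ("corpus", 1.0, 272, 1.7380575),
    ("unit", 2.0, 152, 1.9032569),
    ("multiword", 11.0, 20, 4.9396844),
]


def document_score(terms, matched, total_clauses):
    """Sum of the term scores, scaled by coord(matched/total_clauses)."""
    raw = sum(
        term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
        for _, freq, doc_freq, boost in terms
    )
    return raw * (matched / total_clauses)


score_2536 = document_score(TERMS_DOC_2536, matched=5, total_clauses=25)
print(round(score_2536, 4))
```

Lucene performs these steps in 32-bit floating point, so the recomputed value agrees with the listed 0.1854538 only to a few decimal places. Where a tree shows no boost line (as for "corpora" in entry 4), the boost is 1.0.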