Document (#38162)

Author
Kiela, D.
Clark, S.
Title
Detecting compositionality of multi-word expressions using nearest neighbours in vector space models
Source
http://www.cl.cam.ac.uk/~dk427/papers/emnlp2013.pdf
Year
2013
Abstract
We present a novel unsupervised approach to detecting the compositionality of multi-word expressions. We compute the compositionality of a phrase through substituting the constituent words with their "neighbours" in a semantic vector space and averaging over the distance between the original phrase and the substituted neighbour phrases. Several methods of obtaining neighbours are presented. The results are compared to existing supervised results and achieve state-of-the-art performance on a verb-object dataset of human compositionality ratings.
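The substitute-and-average step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy vectors, additive composition of phrase vectors, cosine distance, and all function names are assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def compose(words, vectors):
    """ASSUMPTION: a phrase vector is the element-wise sum of its word vectors."""
    return [sum(dims) for dims in zip(*(vectors[w] for w in words))]


def nearest_neighbours(word, vectors, k):
    """Return the k words closest to `word` by cosine distance."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine_distance(vectors[word], vectors[w]))
    return others[:k]


def substitution_score(phrase, vectors, k=2):
    """Average distance between the phrase and its neighbour-substituted variants."""
    original = compose(phrase, vectors)
    distances = []
    for i, word in enumerate(phrase):
        for neighbour in nearest_neighbours(word, vectors, k):
            # Replace one constituent at a time with one of its neighbours.
            substituted = phrase[:i] + [neighbour] + phrase[i + 1:]
            distances.append(cosine_distance(original, compose(substituted, vectors)))
    return sum(distances) / len(distances)


# Hypothetical 3-dimensional vectors; the values are illustrative only.
TOY_VECTORS = {
    "eat": [0.9, 0.1, 0.1],
    "devour": [0.8, 0.2, 0.1],
    "consume": [0.7, 0.3, 0.2],
    "apple": [0.1, 0.9, 0.1],
    "pear": [0.2, 0.8, 0.1],
    "plum": [0.1, 0.8, 0.3],
}

score = substitution_score(["eat", "apple"], TOY_VECTORS, k=2)
print(f"mean substitution distance: {score:.4f}")
```

How the resulting distance maps onto a compositionality rating, and which neighbour-selection method works best, are questions for the paper itself; the sketch only shows the mechanics.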
Theme
Computational linguistics

Similar documents (author)

  1. Clark, K.: CD-ROM retrieval software : the year in review (1992) 5.13
    5.1281 = sum of:
      5.1281 = weight(author_txt:clark in 2337) [ClassicSimilarity], result of:
        5.1281 = fieldWeight in 2337, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.20496 = idf(docFreq=32, maxDocs=44421)
          0.625 = fieldNorm(doc=2337)
    
  2. Clark, K.: CD-ROM retrieval software : the year 1992 in review (1993) 5.13
    5.1281 = sum of:
      5.1281 = weight(author_txt:clark in 2353) [ClassicSimilarity], result of:
        5.1281 = fieldWeight in 2353, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.20496 = idf(docFreq=32, maxDocs=44421)
          0.625 = fieldNorm(doc=2353)
    
  3. Clark, A.J.: Education and training for librarianship and information work : annual bibliography, 1990 (1991) 5.13
    5.1281 = sum of:
      5.1281 = weight(author_txt:clark in 2691) [ClassicSimilarity], result of:
        5.1281 = fieldWeight in 2691, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.20496 = idf(docFreq=32, maxDocs=44421)
          0.625 = fieldNorm(doc=2691)
    
  4. Clark, D.: Mad cows, metathesauri, and meaning (1999) 5.13
    5.1281 = sum of:
      5.1281 = weight(author_txt:clark in 2727) [ClassicSimilarity], result of:
        5.1281 = fieldWeight in 2727, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.20496 = idf(docFreq=32, maxDocs=44421)
          0.625 = fieldNorm(doc=2727)
    
  5. Clark, K.: To cancel or not to cancel (print indexes) (1992) 5.13
    5.1281 = sum of:
      5.1281 = weight(author_txt:clark in 3684) [ClassicSimilarity], result of:
        5.1281 = fieldWeight in 3684, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.20496 = idf(docFreq=32, maxDocs=44421)
          0.625 = fieldNorm(doc=3684)
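
The author-match scores above all follow the same arithmetic: fieldWeight = tf × idf × fieldNorm. A minimal sketch that recomputes the figure for the first entry, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). This idf form reproduces the values in the listing, but other Lucene versions use a slightly different numerator. The function names are illustrative and not Lucene API.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """ASSUMPTION: idf = 1 + ln(maxDocs / (docFreq + 1)), matching the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the first author match (author_txt:clark in 2337).
weight = field_weight(freq=1.0, doc_freq=32, max_docs=44421, field_norm=0.625)
print(f"fieldWeight: {weight:.4f}")
```

Because every author match has freq=1.0, docFreq=32 and fieldNorm=0.625, all five entries receive the same score.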
    

Similar documents (content)

  1. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.12
    0.12072407 = sum of:
      0.12072407 = product of:
        1.0060339 = sum of:
          0.07055113 = weight(abstract_txt:unsupervised in 3919) [ClassicSimilarity], result of:
            0.07055113 = score(doc=3919,freq=1.0), product of:
              0.098708324 = queryWeight, product of:
                1.1792575 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.010979087 = queryNorm
              0.71474344 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.0993052 = weight(abstract_txt:expressions in 3919) [ClassicSimilarity], result of:
            0.0993052 = score(doc=3919,freq=1.0), product of:
              0.15619811 = queryWeight, product of:
                2.0978994 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010979087 = queryNorm
              0.63576436 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.8361776 = weight(abstract_txt:compositionality in 3919) [ClassicSimilarity], result of:
            0.8361776 = score(doc=3919,freq=2.0), product of:
              0.64648753 = queryWeight, product of:
                6.0358973 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010979087 = queryNorm
              1.2934164 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
        0.12 = coord(3/25)
    
  2. Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.12
    0.11979652 = sum of:
      0.11979652 = product of:
        0.7487283 = sum of:
          0.06034833 = weight(abstract_txt:word in 3918) [ClassicSimilarity], result of:
            0.06034833 = score(doc=3918,freq=2.0), product of:
              0.10044203 = queryWeight, product of:
                1.6823041 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.010979087 = queryNorm
              0.60082746 = fieldWeight in 3918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
          0.07862531 = weight(abstract_txt:multi in 3918) [ClassicSimilarity], result of:
            0.07862531 = score(doc=3918,freq=2.0), product of:
              0.11981565 = queryWeight, product of:
                1.8373992 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.010979087 = queryNorm
              0.656219 = fieldWeight in 3918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
          0.11703229 = weight(abstract_txt:expressions in 3918) [ClassicSimilarity], result of:
            0.11703229 = score(doc=3918,freq=2.0), product of:
              0.15619811 = queryWeight, product of:
                2.0978994 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010979087 = queryNorm
              0.7492555 = fieldWeight in 3918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
          0.49272236 = weight(abstract_txt:compositionality in 3918) [ClassicSimilarity], result of:
            0.49272236 = score(doc=3918,freq=1.0), product of:
              0.64648753 = queryWeight, product of:
                6.0358973 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010979087 = queryNorm
              0.7621529 = fieldWeight in 3918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
        0.16 = coord(4/25)
    
  3. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.10
    0.1025471 = sum of:
      0.1025471 = product of:
        1.2818388 = sum of:
          0.0993052 = weight(abstract_txt:expressions in 3920) [ClassicSimilarity], result of:
            0.0993052 = score(doc=3920,freq=1.0), product of:
              0.15619811 = queryWeight, product of:
                2.0978994 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010979087 = queryNorm
              0.63576436 = fieldWeight in 3920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.09375 = fieldNorm(doc=3920)
          1.1825336 = weight(abstract_txt:compositionality in 3920) [ClassicSimilarity], result of:
            1.1825336 = score(doc=3920,freq=4.0), product of:
              0.64648753 = queryWeight, product of:
                6.0358973 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010979087 = queryNorm
              1.8291669 = fieldWeight in 3920, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=3920)
        0.08 = coord(2/25)
    
  4. Mohan, K.C.: Boolean and nearest neighbour text searching in a multi-strategy retrieval system (1996) 0.09
    0.08844973 = sum of:
      0.08844973 = product of:
        0.44224864 = sum of:
          0.015637763 = weight(abstract_txt:results in 324) [ClassicSimilarity], result of:
            0.015637763 = score(doc=324,freq=1.0), product of:
              0.04110043 = queryWeight, product of:
                1.0761429 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.010979087 = queryNorm
              0.38047686 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.10040755 = weight(abstract_txt:nearest in 324) [ClassicSimilarity], result of:
            0.10040755 = score(doc=324,freq=1.0), product of:
              0.11269315 = queryWeight, product of:
                1.260029 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.010979087 = queryNorm
              0.8909818 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.14541888 = weight(abstract_txt:neighbour in 324) [ClassicSimilarity], result of:
            0.14541888 = score(doc=324,freq=1.0), product of:
              0.14425598 = queryWeight, product of:
                1.4256033 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.010979087 = queryNorm
              1.0080614 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.07783508 = weight(abstract_txt:multi in 324) [ClassicSimilarity], result of:
            0.07783508 = score(doc=324,freq=1.0), product of:
              0.11981565 = queryWeight, product of:
                1.8373992 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.010979087 = queryNorm
              0.6496237 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
          0.102949366 = weight(abstract_txt:vector in 324) [ClassicSimilarity], result of:
            0.102949366 = score(doc=324,freq=1.0), product of:
              0.1443707 = queryWeight, product of:
                2.016909 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.010979087 = queryNorm
              0.7130904 = fieldWeight in 324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.109375 = fieldNorm(doc=324)
        0.2 = coord(5/25)
    
  5. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.07
    0.06747061 = sum of:
      0.06747061 = product of:
        0.28112754 = sum of:
          0.017925313 = weight(abstract_txt:distance in 2536) [ClassicSimilarity], result of:
            0.017925313 = score(doc=2536,freq=1.0), product of:
              0.07098008 = queryWeight, product of:
                6.4650254 = idf(docFreq=187, maxDocs=44421)
                0.010979087 = queryNorm
              0.25254005 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4650254 = idf(docFreq=187, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.0055849156 = weight(abstract_txt:results in 2536) [ClassicSimilarity], result of:
            0.0055849156 = score(doc=2536,freq=1.0), product of:
              0.04110043 = queryWeight, product of:
                1.0761429 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.010979087 = queryNorm
              0.1358846 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.039063618 = weight(abstract_txt:supervised in 2536) [ClassicSimilarity], result of:
            0.039063618 = score(doc=2536,freq=2.0), product of:
              0.0946957 = queryWeight, product of:
                1.1550397 = boost
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.010979087 = queryNorm
              0.4125173 = fieldWeight in 2536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.08018504 = weight(abstract_txt:verb in 2536) [ClassicSimilarity], result of:
            0.08018504 = score(doc=2536,freq=5.0), product of:
              0.11269315 = queryWeight, product of:
                1.260029 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.010979087 = queryNorm
              0.71153426 = fieldWeight in 2536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.021336356 = weight(abstract_txt:word in 2536) [ClassicSimilarity], result of:
            0.021336356 = score(doc=2536,freq=1.0), product of:
              0.10044203 = queryWeight, product of:
                1.6823041 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.010979087 = queryNorm
              0.21242458 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.11703229 = weight(abstract_txt:expressions in 2536) [ClassicSimilarity], result of:
            0.11703229 = score(doc=2536,freq=8.0), product of:
              0.15619811 = queryWeight, product of:
                2.0978994 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010979087 = queryNorm
              0.7492555 = fieldWeight in 2536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
        0.24 = coord(6/25)
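
Each content-match score above combines per-term products with a coordination factor: score = coord × Σ (queryWeight × fieldWeight), where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. A minimal sketch that recomputes entry 3 (doc=3920) from the numbers in its explain tree; the function and variable names are illustrative and not Lucene API.

```python
import math

QUERY_NORM = 0.010979087  # queryNorm as shown in the listing


def term_score(boost, idf, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched, total_query_terms):
    """Sum the term scores and scale by coord = matched / total query terms."""
    coord = matched / total_query_terms
    return coord * sum(term_score(**t) for t in terms)


# Values copied from entry 3 (Snajder; Almic, doc=3920).
terms_3920 = [
    {"boost": 2.0978994, "idf": 6.7814865, "freq": 1.0, "field_norm": 0.09375},  # expressions
    {"boost": 6.0358973, "idf": 9.755557, "freq": 4.0, "field_norm": 0.09375},   # compositionality
]

score_3920 = document_score(terms_3920, matched=2, total_query_terms=25)
print(f"score for doc 3920: {score_3920:.7f}")
```

The coord factor explains why entry 3 ranks below entries 1 and 2 despite having the largest raw sum: only 2 of the 25 query terms match, so its sum is scaled by 0.08.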