Document (#39921)

Author
Snajder, J.
Almic, P.
Title
Modeling semantic compositionality of Croatian multiword expressions
Source
Informatica. 39(2015) no.3, pp.301-309
Year
2015
Abstract
A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Determining the semantic compositionality of MWEs is important for many natural language processing tasks. We address the task of modeling semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics framework. We build and evaluate models based on Latent Semantic Analysis and the recently proposed neural network-based Skip-gram model, and experiment with different composition functions. We show that the compositionality scores predicted by the Skip-gram additive models correlate well with human judgments (correlation coefficient of 0.50). When framed as a classification task, the model achieves an accuracy of 0.64.
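The composition-based idea in the abstract can be illustrated with a minimal sketch. It assumes, as is common in this line of work but not stated in the abstract, that the compositionality score is the cosine similarity between the vector observed for the whole expression and the sum of its component word vectors. The function names and the three-dimensional toy vectors are hypothetical and are not taken from the paper or its dataset.

```python
import math


def add_vectors(u, v):
    """Additive composition: element-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compositionality_score(mwe_vector, word_vectors):
    """Compare the MWE's own vector with the sum of its component vectors.

    Assumption: a higher cosine means the expression behaves more
    compositionally. This is an illustrative scoring rule only.
    """
    composed = word_vectors[0]
    for vec in word_vectors[1:]:
        composed = add_vectors(composed, vec)
    return cosine(mwe_vector, composed)


# Hypothetical toy vectors for the two component words of an expression.
component_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# An expression whose vector lies along the sum of its parts ...
compositional_score = compositionality_score([1.0, 1.0, 0.0], component_vectors)
# ... and one whose vector is orthogonal to both parts (idiom-like).
idiomatic_score = compositionality_score([0.0, 0.0, 1.0], component_vectors)
```

In this toy setting the first expression scores close to 1 and the second scores 0. With real LSA or Skip-gram vectors the scores would fall between these extremes and could then be compared against the human judgments.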
Content
See: http://takelab.fer.hr/data/cromwesc/. The dataset is available as TakeLab-CroMWEsc.tar.gz. The archive contains a single file listing 200 Croatian multiword expressions annotated with semantic compositionality scores. Twenty expressions (marked with "*") were annotated by 24 annotators; the remaining 180 were annotated by 6 annotators. Besides the median, we provide the mode, mean, and standard deviation for each expression. Consult the above-mentioned paper for details.
Theme
Computational linguistics

Similar documents (content)

  1. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.44
    0.43798107 = sum of:
      0.43798107 = product of:
        1.8249211 = sum of:
          0.11485003 = weight(abstract_txt:distributional in 3919) [ClassicSimilarity], result of:
            0.11485003 = score(doc=3919,freq=1.0), product of:
              0.13034177 = queryWeight, product of:
                1.3658812 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.010153001 = queryNorm
              0.88114524 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.013383336 = weight(abstract_txt:based in 3919) [ClassicSimilarity], result of:
            0.013383336 = score(doc=3919,freq=1.0), product of:
              0.044848323 = queryWeight, product of:
                1.3877296 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.010153001 = queryNorm
              0.2984133 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.0862794 = weight(abstract_txt:expressions in 3919) [ClassicSimilarity], result of:
            0.0862794 = score(doc=3919,freq=1.0), product of:
              0.13570972 = queryWeight, product of:
                1.9710226 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010153001 = queryNorm
              0.63576436 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.17948431 = weight(abstract_txt:multiword in 3919) [ClassicSimilarity], result of:
            0.17948431 = score(doc=3919,freq=1.0), product of:
              0.22115181 = queryWeight, product of:
                2.5161202 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.010153001 = queryNorm
              0.81158864 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.52280325 = weight(abstract_txt:mwes in 3919) [ClassicSimilarity], result of:
            0.52280325 = score(doc=3919,freq=2.0), product of:
              0.40981275 = queryWeight, product of:
                4.1949277 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010153001 = queryNorm
              1.2757125 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.9081209 = weight(abstract_txt:compositionality in 3919) [ClassicSimilarity], result of:
            0.9081209 = score(doc=3919,freq=2.0), product of:
              0.70211023 = queryWeight, product of:
                7.0885725 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010153001 = queryNorm
              1.2934164 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
        0.24 = coord(6/25)
    
  2. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.19
    0.19378972 = sum of:
      0.19378972 = product of:
        0.6921061 = sum of:
          0.0072825053 = weight(abstract_txt:model in 2536) [ClassicSimilarity], result of:
            0.0072825053 = score(doc=2536,freq=1.0), product of:
              0.046809524 = queryWeight, product of:
                1.157586 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.010153001 = queryNorm
              0.15557742 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.007827665 = weight(abstract_txt:many in 2536) [ClassicSimilarity], result of:
            0.007827665 = score(doc=2536,freq=1.0), product of:
              0.049117375 = queryWeight, product of:
                1.1857789 = boost
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.010153001 = queryNorm
              0.1593665 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.015772412 = weight(abstract_txt:based in 2536) [ClassicSimilarity], result of:
            0.015772412 = score(doc=2536,freq=8.0), product of:
              0.044848323 = queryWeight, product of:
                1.3877296 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.010153001 = queryNorm
              0.35168344 = fieldWeight in 2536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.10168125 = weight(abstract_txt:expressions in 2536) [ClassicSimilarity], result of:
            0.10168125 = score(doc=2536,freq=8.0), product of:
              0.13570972 = queryWeight, product of:
                1.9710226 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010153001 = queryNorm
              0.7492555 = fieldWeight in 2536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.24803421 = weight(abstract_txt:multiword in 2536) [ClassicSimilarity], result of:
            0.24803421 = score(doc=2536,freq=11.0), product of:
              0.22115181 = queryWeight, product of:
                2.5161202 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.010153001 = queryNorm
              1.1215563 = fieldWeight in 2536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.044716164 = weight(abstract_txt:semantic in 2536) [ClassicSimilarity], result of:
            0.044716164 = score(doc=2536,freq=3.0), product of:
              0.14770538 = queryWeight, product of:
                3.2512794 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.010153001 = queryNorm
              0.3027389 = fieldWeight in 2536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.2667919 = weight(abstract_txt:mwes in 2536) [ClassicSimilarity], result of:
            0.2667919 = score(doc=2536,freq=3.0), product of:
              0.40981275 = queryWeight, product of:
                4.1949277 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010153001 = queryNorm
              0.6510093 = fieldWeight in 2536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
        0.28 = coord(7/25)
    
  3. Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.18
    0.17985596 = sum of:
      0.17985596 = product of:
        1.1240997 = sum of:
          0.10168125 = weight(abstract_txt:expressions in 3918) [ClassicSimilarity], result of:
            0.10168125 = score(doc=3918,freq=2.0), product of:
              0.13570972 = queryWeight, product of:
                1.9710226 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010153001 = queryNorm
              0.7492555 = fieldWeight in 3918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
          0.05163378 = weight(abstract_txt:semantic in 3918) [ClassicSimilarity], result of:
            0.05163378 = score(doc=3918,freq=1.0), product of:
              0.14770538 = queryWeight, product of:
                3.2512794 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.010153001 = queryNorm
              0.34957278 = fieldWeight in 3918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
          0.4356694 = weight(abstract_txt:mwes in 3918) [ClassicSimilarity], result of:
            0.4356694 = score(doc=3918,freq=2.0), product of:
              0.40981275 = queryWeight, product of:
                4.1949277 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010153001 = queryNorm
              1.0630938 = fieldWeight in 3918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
          0.53511536 = weight(abstract_txt:compositionality in 3918) [ClassicSimilarity], result of:
            0.53511536 = score(doc=3918,freq=1.0), product of:
              0.70211023 = queryWeight, product of:
                7.0885725 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010153001 = queryNorm
              0.7621529 = fieldWeight in 3918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.078125 = fieldNorm(doc=3918)
        0.16 = coord(4/25)
    
  4. Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.18
    0.17646387 = sum of:
      0.17646387 = product of:
        1.4705323 = sum of:
          0.1006593 = weight(abstract_txt:expressions in 2161) [ClassicSimilarity], result of:
            0.1006593 = score(doc=2161,freq=1.0), product of:
              0.13570972 = queryWeight, product of:
                1.9710226 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010153001 = queryNorm
              0.7417251 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
          0.07228729 = weight(abstract_txt:semantic in 2161) [ClassicSimilarity], result of:
            0.07228729 = score(doc=2161,freq=1.0), product of:
              0.14770538 = queryWeight, product of:
                3.2512794 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.010153001 = queryNorm
              0.4894019 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
          1.2975857 = weight(abstract_txt:compositionality in 2161) [ClassicSimilarity], result of:
            1.2975857 = score(doc=2161,freq=3.0), product of:
              0.70211023 = queryWeight, product of:
                7.0885725 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.010153001 = queryNorm
              1.8481225 = fieldWeight in 2161, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
        0.12 = coord(3/25)
    
  5. Nissim, M.; Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method (2013) 0.14
    0.14449781 = sum of:
      0.14449781 = product of:
        0.72248906 = sum of:
          0.012617929 = weight(abstract_txt:based in 1990) [ClassicSimilarity], result of:
            0.012617929 = score(doc=1990,freq=2.0), product of:
              0.044848323 = queryWeight, product of:
                1.3877296 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.010153001 = queryNorm
              0.28134674 = fieldWeight in 1990, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.039791666 = weight(abstract_txt:modeling in 1990) [ClassicSimilarity], result of:
            0.039791666 = score(doc=1990,freq=1.0), product of:
              0.106152065 = queryWeight, product of:
                1.7432126 = boost
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.010153001 = queryNorm
              0.3748553 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.0575196 = weight(abstract_txt:expressions in 1990) [ClassicSimilarity], result of:
            0.0575196 = score(doc=1990,freq=1.0), product of:
              0.13570972 = queryWeight, product of:
                1.9710226 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.010153001 = queryNorm
              0.4238429 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.1196562 = weight(abstract_txt:multiword in 1990) [ClassicSimilarity], result of:
            0.1196562 = score(doc=1990,freq=1.0), product of:
              0.22115181 = queryWeight, product of:
                2.5161202 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.010153001 = queryNorm
              0.5410591 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
          0.49290365 = weight(abstract_txt:mwes in 1990) [ClassicSimilarity], result of:
            0.49290365 = score(doc=1990,freq=4.0), product of:
              0.40981275 = queryWeight, product of:
                4.1949277 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.010153001 = queryNorm
              1.2027533 = fieldWeight in 1990, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0625 = fieldNorm(doc=1990)
        0.2 = coord(5/25)
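
The score trees above follow one arithmetic pattern: each term score is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and the document score is the sum of the term scores multiplied by coord. The sketch below recomputes the score of entry 1 (doc 3919) from the values listed there. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions inferred from the numbers shown; they reproduce the listed tf and idf values but may not match every Lucene version exactly. All function and variable names are hypothetical.

```python
import math

MAX_DOCS = 44421          # maxDocs as shown in every idf line
QUERY_NORM = 0.010153001  # queryNorm as shown in every queryWeight block


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed formula, see lead-in)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum of term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return total * matched / query_terms


# (boost, docFreq, freq) for the six matching terms of entry 1, doc 3919.
ENTRY_1_TERMS = [
    (1.3658812, 9, 1.0),     # distributional
    (1.3877296, 5005, 1.0),  # based
    (1.9710226, 136, 1.0),   # expressions
    (2.5161202, 20, 1.0),    # multiword
    (4.1949277, 7, 2.0),     # mwes
    (7.0885725, 6, 2.0),     # compositionality
]

# fieldNorm = 0.09375 and coord = 6/25 are taken from the same tree.
entry_1_score = document_score(ENTRY_1_TERMS, 0.09375, 6, 25)
print(entry_1_score)
```

The result should land close to the 0.43798107 reported for entry 1; small differences in the last digits are expected because the listing uses single-precision floats. The same call with the terms, fieldNorm, and coord of another entry can be used to check its score.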