Document (#37991)

Nissim, M.
Zaninello, A.
Modeling the internal variability of multiword expressions through a pattern-based method
ACM Transactions on Speech and Language Processing. 10(2013) no.2, Article 7, p.1-26
Special issue on multiword expressions: from theory to practice and use
The internal variability of multiword expressions (MWEs) is crucial to their identification and extraction in running text. We present a corpus-supported computational study of Italian MWEs, aimed at defining an automatic method for modeling internal variation that exploits frequency and part-of-speech (POS) information. We do so by deriving an XML-encoded lexicon of MWEs from a manually compiled dictionary, which is then projected onto a large corpus. Since a search for fixed forms suffers from low recall, while an unconstrained flexible search for lemmas yields a loss in precision, we suggest a procedure aimed at maximizing precision in the identification of MWEs within a flexible search. Our method builds on the idea that internal variability can be modeled via the novel introduction of variation patterns, which work over POS patterns and can be used as working tools for controlling precision. We also compare the performance of variation patterns to that of association measures, and explore the possibility of using variation patterns in MWE extraction in addition to identification. Finally, we suggest that corpus-derived, pattern-related information can be included in the original MWE lexicon by means of an enriched coding and the creation of an XML-based repository of patterns.
Cf. for the special issue:

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.31
    0.31107363 = sum of:
      0.31107363 = product of:
        0.97210515 = sum of:
          0.016992275 = weight(abstract_txt:based in 2536) [ClassicSimilarity], result of:
            0.016992275 = score(doc=2536,freq=8.0), product of:
              0.04831696 = queryWeight, product of:
                1.1457918 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013247898 = queryNorm
              0.35168344 = fieldWeight in 2536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.05106751 = weight(abstract_txt:extraction in 2536) [ClassicSimilarity], result of:
            0.05106751 = score(doc=2536,freq=3.0), product of:
              0.121895455 = queryWeight, product of:
                1.4859494 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.013247898 = queryNorm
              0.41894513 = fieldWeight in 2536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.0379265 = weight(abstract_txt:method in 2536) [ClassicSimilarity], result of:
            0.0379265 = score(doc=2536,freq=5.0), product of:
              0.09651624 = queryWeight, product of:
                1.6194074 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.013247898 = queryNorm
              0.3929546 = fieldWeight in 2536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.10954542 = weight(abstract_txt:expressions in 2536) [ClassicSimilarity], result of:
            0.10954542 = score(doc=2536,freq=8.0), product of:
              0.1462057 = queryWeight, product of:
                1.6273929 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.013247898 = queryNorm
              0.7492555 = fieldWeight in 2536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.26721752 = weight(abstract_txt:multiword in 2536) [ClassicSimilarity], result of:
            0.26721752 = score(doc=2536,freq=11.0), product of:
              0.23825599 = queryWeight, product of:
                2.0774577 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.013247898 = queryNorm
              1.1215563 = fieldWeight in 2536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.064005464 = weight(abstract_txt:identification in 2536) [ClassicSimilarity], result of:
            0.064005464 = score(doc=2536,freq=3.0), product of:
              0.16220573 = queryWeight, product of:
                2.09937 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.013247898 = queryNorm
              0.39459434 = fieldWeight in 2536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.042115755 = weight(abstract_txt:corpus in 2536) [ClassicSimilarity], result of:
            0.042115755 = score(doc=2536,freq=1.0), product of:
              0.17698032 = queryWeight, product of:
                2.1928978 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.013247898 = queryNorm
              0.23796858 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.38323468 = weight(abstract_txt:mwes in 2536) [ClassicSimilarity], result of:
            0.38323468 = score(doc=2536,freq=3.0), product of:
              0.5886777 = queryWeight, product of:
                4.618108 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.013247898 = queryNorm
              0.6510093 = fieldWeight in 2536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
        0.32 = coord(8/25)
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.28
    0.28321192 = sum of:
      0.28321192 = product of:
        1.1800497 = sum of:
          0.014418423 = weight(abstract_txt:based in 3919) [ClassicSimilarity], result of:
            0.014418423 = score(doc=3919,freq=1.0), product of:
              0.04831696 = queryWeight, product of:
                1.1457918 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013247898 = queryNorm
              0.2984133 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.07076121 = weight(abstract_txt:extraction in 3919) [ClassicSimilarity], result of:
            0.07076121 = score(doc=3919,freq=1.0), product of:
              0.121895455 = queryWeight, product of:
                1.4859494 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.013247898 = queryNorm
              0.5805074 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.05756837 = weight(abstract_txt:method in 3919) [ClassicSimilarity], result of:
            0.05756837 = score(doc=3919,freq=2.0), product of:
              0.09651624 = queryWeight, product of:
                1.6194074 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.013247898 = queryNorm
              0.5964631 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.09295237 = weight(abstract_txt:expressions in 3919) [ClassicSimilarity], result of:
            0.09295237 = score(doc=3919,freq=1.0), product of:
              0.1462057 = queryWeight, product of:
                1.6273929 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.013247898 = queryNorm
              0.63576436 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.19336586 = weight(abstract_txt:multiword in 3919) [ClassicSimilarity], result of:
            0.19336586 = score(doc=3919,freq=1.0), product of:
              0.23825599 = queryWeight, product of:
                2.0774577 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.013247898 = queryNorm
              0.81158864 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.7509835 = weight(abstract_txt:mwes in 3919) [ClassicSimilarity], result of:
            0.7509835 = score(doc=3919,freq=2.0), product of:
              0.5886777 = queryWeight, product of:
                4.618108 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.013247898 = queryNorm
              1.2757125 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
        0.24 = coord(6/25)
  3. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.26
    0.25907174 = sum of:
      0.25907174 = product of:
        1.2953587 = sum of:
          0.02497344 = weight(abstract_txt:based in 3920) [ClassicSimilarity], result of:
            0.02497344 = score(doc=3920,freq=3.0), product of:
              0.04831696 = queryWeight, product of:
                1.1457918 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013247898 = queryNorm
              0.516867 = fieldWeight in 3920, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.09375 = fieldNorm(doc=3920)
          0.06430381 = weight(abstract_txt:modeling in 3920) [ClassicSimilarity], result of:
            0.06430381 = score(doc=3920,freq=1.0), product of:
              0.11436201 = queryWeight, product of:
                1.4392995 = boost
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.013247898 = queryNorm
              0.562283 = fieldWeight in 3920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.09375 = fieldNorm(doc=3920)
          0.09295237 = weight(abstract_txt:expressions in 3920) [ClassicSimilarity], result of:
            0.09295237 = score(doc=3920,freq=1.0), product of:
              0.1462057 = queryWeight, product of:
                1.6273929 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.013247898 = queryNorm
              0.63576436 = fieldWeight in 3920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.09375 = fieldNorm(doc=3920)
          0.19336586 = weight(abstract_txt:multiword in 3920) [ClassicSimilarity], result of:
            0.19336586 = score(doc=3920,freq=1.0), product of:
              0.23825599 = queryWeight, product of:
                2.0774577 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.013247898 = queryNorm
              0.81158864 = fieldWeight in 3920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=3920)
          0.91976315 = weight(abstract_txt:mwes in 3920) [ClassicSimilarity], result of:
            0.91976315 = score(doc=3920,freq=3.0), product of:
              0.5886777 = queryWeight, product of:
                4.618108 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.013247898 = queryNorm
              1.5624223 = fieldWeight in 3920, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.09375 = fieldNorm(doc=3920)
        0.2 = coord(5/25)
  4. Ramisch, C.; Schreiner, P.; Idiart, M.; Villavicencio, A.: An evaluation of methods for the extraction of multiword expressions (20xx) 0.15
    0.14522225 = sum of:
      0.14522225 = product of:
        1.2101854 = sum of:
          0.10844443 = weight(abstract_txt:expressions in 1962) [ClassicSimilarity], result of:
            0.10844443 = score(doc=1962,freq=1.0), product of:
              0.1462057 = queryWeight, product of:
                1.6273929 = boost
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.013247898 = queryNorm
              0.7417251 = fieldWeight in 1962, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7814865 = idf(docFreq=136, maxDocs=44421)
                0.109375 = fieldNorm(doc=1962)
          0.22559349 = weight(abstract_txt:multiword in 1962) [ClassicSimilarity], result of:
            0.22559349 = score(doc=1962,freq=1.0), product of:
              0.23825599 = queryWeight, product of:
                2.0774577 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.013247898 = queryNorm
              0.9468534 = fieldWeight in 1962, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.109375 = fieldNorm(doc=1962)
          0.87614745 = weight(abstract_txt:mwes in 1962) [ClassicSimilarity], result of:
            0.87614745 = score(doc=1962,freq=2.0), product of:
              0.5886777 = queryWeight, product of:
                4.618108 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.013247898 = queryNorm
              1.4883313 = fieldWeight in 1962, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.109375 = fieldNorm(doc=1962)
        0.12 = coord(3/25)
  5. Ferret, O.; Grau, B.; Hurault-Plantet, M.; Illouz, G.; Jacquemin, C.; Monceaux, L.; Robba, I.; Vilnat, A.: How NLP can improve question answering (2002) 0.12
    0.11768416 = sum of:
      0.11768416 = product of:
        0.5884208 = sum of:
          0.05896768 = weight(abstract_txt:extraction in 2850) [ClassicSimilarity], result of:
            0.05896768 = score(doc=2850,freq=1.0), product of:
              0.121895455 = queryWeight, product of:
                1.4859494 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.013247898 = queryNorm
              0.48375618 = fieldWeight in 2850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.078125 = fieldNorm(doc=2850)
          0.1611382 = weight(abstract_txt:multiword in 2850) [ClassicSimilarity], result of:
            0.1611382 = score(doc=2850,freq=1.0), product of:
              0.23825599 = queryWeight, product of:
                2.0774577 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.013247898 = queryNorm
              0.67632383 = fieldWeight in 2850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.078125 = fieldNorm(doc=2850)
          0.073907144 = weight(abstract_txt:identification in 2850) [ClassicSimilarity], result of:
            0.073907144 = score(doc=2850,freq=1.0), product of:
              0.16220573 = queryWeight, product of:
                2.09937 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.013247898 = queryNorm
              0.45563832 = fieldWeight in 2850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.078125 = fieldNorm(doc=2850)
          0.12841839 = weight(abstract_txt:patterns in 2850) [ClassicSimilarity], result of:
            0.12841839 = score(doc=2850,freq=2.0), product of:
              0.22061396 = queryWeight, product of:
                3.1607983 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.013247898 = queryNorm
              0.5820955 = fieldWeight in 2850, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.078125 = fieldNorm(doc=2850)
          0.1659894 = weight(abstract_txt:variation in 2850) [ClassicSimilarity], result of:
            0.1659894 = score(doc=2850,freq=1.0), product of:
              0.30617875 = queryWeight, product of:
                3.3305259 = boost
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.013247898 = queryNorm
              0.5421323 = fieldWeight in 2850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.939294 = idf(docFreq=116, maxDocs=44421)
                0.078125 = fieldNorm(doc=2850)
        0.2 = coord(5/25)
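
The score trees above all follow the same arithmetic, so a small sketch may help in reading them. It assumes the classic Lucene TF-IDF scheme that the trees themselves display: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is boost * idf * queryNorm, fieldWeight is tf * idf * fieldNorm, and the per-term scores are summed and multiplied by the coord factor. The function names (`idf`, `term_score`, `document_score`) are hypothetical and chosen for illustration only; the input numbers are copied from the first entry (doc 2536). Values such as fieldNorm and queryNorm are taken as given rather than recomputed.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as shown in the trees: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                           # tf(freq) = sqrt(freq)
    query_weight = boost * term_idf * query_norm   # query-side weight
    field_weight = tf * term_idf * field_norm      # document-side weight
    return query_weight * field_weight


def document_score(term_scores: list[float], matched: int, total: int) -> float:
    """Sum of the per-term scores, scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# Term "based" in doc 2536; the tree reports 0.016992275 for this term.
based = term_score(freq=8.0, doc_freq=5005, max_docs=44421,
                   boost=1.1457918, query_norm=0.013247898,
                   field_norm=0.0390625)

# Per-term scores of doc 2536 as listed in its tree (8 of 25 query terms matched).
doc_2536_terms = [0.016992275, 0.05106751, 0.0379265, 0.10954542,
                  0.26721752, 0.064005464, 0.042115755, 0.38323468]
total_2536 = document_score(doc_2536_terms, matched=8, total=25)

print(f"based: {based:.6f}")
print(f"doc 2536: {total_2536:.6f}")
```

Small differences in the last digits against the trees are expected, since the trees were presumably computed in single precision while this sketch uses double precision.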