Document (#36237)

Author
Yu, L.-C.
Wu, C.-H.
Chang, R.-Y.
Liu, C.-H.
Hovy, E.H.
Title
Annotation and verification of sense pools in OntoNotes
Source
Information processing and management. 46(2010) no.4, pp. 436-447
Year
2010
Abstract
The paper describes OntoNotes, a multilingual (English, Chinese and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes involves word senses that are grouped into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous senses of words, it does not indicate whether two words in a pool are interchangeable in practical use. Therefore, this paper devises an unsupervised algorithm that incorporates Google n-grams and a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features are used to measure the degree of context mismatch for a substitution. The statistical test is then applied to determine whether the substitution is adequate based on the degree of mismatch. The proposed method is compared with a supervised method, namely Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method achieves performance comparable to that of the supervised method.
Theme
Knowledge representation
Multilingual problems
Object
OntoNotes

Similar documents (author)

  1. Chang, R.: DBase, relational data models, and MARC records (1992) 4.79
    4.78651 = sum of:
      4.78651 = weight(author_txt:chang in 5056) [ClassicSimilarity], result of:
        4.78651 = fieldWeight in 5056, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.6584163 = idf(docFreq=56, maxDocs=44421)
          0.625 = fieldNorm(doc=5056)
    
  2. Chang, R.: ¬The development of indexing technology (1993) 4.79
    4.78651 = sum of:
      4.78651 = weight(author_txt:chang in 7023) [ClassicSimilarity], result of:
        4.78651 = fieldWeight in 7023, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.6584163 = idf(docFreq=56, maxDocs=44421)
          0.625 = fieldNorm(doc=7023)
    
  3. Chang, R.: Keyword searching and indexing (1993) 4.79
    4.78651 = sum of:
      4.78651 = weight(author_txt:chang in 7222) [ClassicSimilarity], result of:
        4.78651 = fieldWeight in 7222, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.6584163 = idf(docFreq=56, maxDocs=44421)
          0.625 = fieldNorm(doc=7222)
    
  4. Chang, R.H.: To classify or not to classify? : a new look at an old problem (1989) 4.79
    4.78651 = sum of:
      4.78651 = weight(author_txt:chang in 2578) [ClassicSimilarity], result of:
        4.78651 = fieldWeight in 2578, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.6584163 = idf(docFreq=56, maxDocs=44421)
          0.625 = fieldNorm(doc=2578)
    
  5. Chang, S.H.: ¬The current state of Web search engines (1999) 4.79
    4.78651 = sum of:
      4.78651 = weight(author_txt:chang in 1509) [ClassicSimilarity], result of:
        4.78651 = fieldWeight in 1509, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.6584163 = idf(docFreq=56, maxDocs=44421)
          0.625 = fieldNorm(doc=1509)
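
The author scores above all follow the same pattern: a single-term match whose weight is tf x idf x fieldNorm. As a rough sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), the value can be recomputed from the numbers in the listing. The function names are illustrative and not part of any library.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor, assumed to be sqrt(freq) as in ClassicSimilarity."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed to be 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, mirroring the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:chang entries in the listing.
author_score = field_weight(freq=1.0, doc_freq=56, max_docs=44421, field_norm=0.625)
print(f"author_txt:chang fieldWeight = {author_score:.5f}")
```

Because the query has one term and every matching record has the same termFreq and fieldNorm, all five entries receive an identical score; the ranking among them carries no information beyond the shared surname token.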
    

Similar documents (content)

  1. Krovetz, R.; Croft, W.B.: Lexical ambiguity and information retrieval (1992) 0.20
    0.19738916 = sum of:
      0.19738916 = product of:
        0.8224548 = sum of:
          0.042930253 = weight(abstract_txt:determine in 4027) [ClassicSimilarity], result of:
            0.042930253 = score(doc=4027,freq=1.0), product of:
              0.08744191 = queryWeight, product of:
                1.1703749 = boost
                5.2368793 = idf(docFreq=641, maxDocs=44421)
                0.014266652 = queryNorm
              0.49095744 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2368793 = idf(docFreq=641, maxDocs=44421)
                0.09375 = fieldNorm(doc=4027)
          0.05096088 = weight(abstract_txt:whether in 4027) [ClassicSimilarity], result of:
            0.05096088 = score(doc=4027,freq=1.0), product of:
              0.11221881 = queryWeight, product of:
                1.6238415 = boost
                4.8439536 = idf(docFreq=950, maxDocs=44421)
                0.014266652 = queryNorm
              0.45412064 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8439536 = idf(docFreq=950, maxDocs=44421)
                0.09375 = fieldNorm(doc=4027)
          0.08856071 = weight(abstract_txt:sense in 4027) [ClassicSimilarity], result of:
            0.08856071 = score(doc=4027,freq=1.0), product of:
              0.16220658 = queryWeight, product of:
                1.9522932 = boost
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.014266652 = queryNorm
              0.54597485 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.09375 = fieldNorm(doc=4027)
          0.09184518 = weight(abstract_txt:words in 4027) [ClassicSimilarity], result of:
            0.09184518 = score(doc=4027,freq=1.0), product of:
              0.18291874 = queryWeight, product of:
                2.393918 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.014266652 = queryNorm
              0.50210917 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.09375 = fieldNorm(doc=4027)
          0.16652161 = weight(abstract_txt:word in 4027) [ClassicSimilarity], result of:
            0.16652161 = score(doc=4027,freq=3.0), product of:
              0.18857928 = queryWeight, product of:
                2.4306765 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.014266652 = queryNorm
              0.8830324 = fieldWeight in 4027, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.09375 = fieldNorm(doc=4027)
          0.38163617 = weight(abstract_txt:senses in 4027) [ClassicSimilarity], result of:
            0.38163617 = score(doc=4027,freq=1.0), product of:
              0.4727741 = queryWeight, product of:
                3.8486373 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.014266652 = queryNorm
              0.8072274 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.09375 = fieldNorm(doc=4027)
        0.24 = coord(6/25)
    
  2. Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.17
    0.16800047 = sum of:
      0.16800047 = product of:
        0.700002 = sum of:
          0.031241748 = weight(abstract_txt:semantic in 2161) [ClassicSimilarity], result of:
            0.031241748 = score(doc=2161,freq=1.0), product of:
              0.06383659 = queryWeight, product of:
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.014266652 = queryNorm
              0.4894019 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
          0.1496977 = weight(abstract_txt:substituted in 2161) [ClassicSimilarity], result of:
            0.1496977 = score(doc=2161,freq=1.0), product of:
              0.14400566 = queryWeight, product of:
                1.0620377 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.014266652 = queryNorm
              1.0395266 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
          0.14520915 = weight(abstract_txt:supervised in 2161) [ClassicSimilarity], result of:
            0.14520915 = score(doc=2161,freq=1.0), product of:
              0.1777906 = queryWeight, product of:
                1.6688586 = boost
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.014266652 = queryNorm
              0.8167426 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
          0.15453583 = weight(abstract_txt:unsupervised in 2161) [ClassicSimilarity], result of:
            0.15453583 = score(doc=2161,freq=1.0), product of:
              0.18532425 = queryWeight, product of:
                1.7038498 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.014266652 = queryNorm
              0.8338673 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
          0.10715271 = weight(abstract_txt:words in 2161) [ClassicSimilarity], result of:
            0.10715271 = score(doc=2161,freq=1.0), product of:
              0.18291874 = queryWeight, product of:
                2.393918 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.014266652 = queryNorm
              0.58579403 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
          0.11216485 = weight(abstract_txt:word in 2161) [ClassicSimilarity], result of:
            0.11216485 = score(doc=2161,freq=1.0), product of:
              0.18857928 = queryWeight, product of:
                2.4306765 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.014266652 = queryNorm
              0.59478885 = fieldWeight in 2161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.109375 = fieldNorm(doc=2161)
        0.24 = coord(6/25)
    
  3. Green, R.: WordNet (2009) 0.16
    0.16078904 = sum of:
      0.16078904 = product of:
        0.8039452 = sum of:
          0.037870716 = weight(abstract_txt:semantic in 696) [ClassicSimilarity], result of:
            0.037870716 = score(doc=696,freq=2.0), product of:
              0.06383659 = queryWeight, product of:
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.014266652 = queryNorm
              0.5932447 = fieldWeight in 696, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
          0.15991332 = weight(abstract_txt:synonymous in 696) [ClassicSimilarity], result of:
            0.15991332 = score(doc=696,freq=1.0), product of:
              0.21011984 = queryWeight, product of:
                1.8142565 = boost
                8.117949 = idf(docFreq=35, maxDocs=44421)
                0.014266652 = queryNorm
              0.7610577 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.117949 = idf(docFreq=35, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
          0.08856071 = weight(abstract_txt:sense in 696) [ClassicSimilarity], result of:
            0.08856071 = score(doc=696,freq=1.0), product of:
              0.16220658 = queryWeight, product of:
                1.9522932 = boost
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.014266652 = queryNorm
              0.54597485 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
          0.13596432 = weight(abstract_txt:word in 696) [ClassicSimilarity], result of:
            0.13596432 = score(doc=696,freq=2.0), product of:
              0.18857928 = queryWeight, product of:
                2.4306765 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.014266652 = queryNorm
              0.7209929 = fieldWeight in 696, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
          0.38163617 = weight(abstract_txt:senses in 696) [ClassicSimilarity], result of:
            0.38163617 = score(doc=696,freq=1.0), product of:
              0.4727741 = queryWeight, product of:
                3.8486373 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.014266652 = queryNorm
              0.8072274 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.09375 = fieldNorm(doc=696)
        0.2 = coord(5/25)
    
  4. Cribbin, T.: Discovering latent topical structure by second-order similarity analysis (2011) 0.14
    0.14311998 = sum of:
      0.14311998 = product of:
        0.44724992 = sum of:
          0.025247145 = weight(abstract_txt:semantic in 470) [ClassicSimilarity], result of:
            0.025247145 = score(doc=470,freq=2.0), product of:
              0.06383659 = queryWeight, product of:
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.014266652 = queryNorm
              0.39549646 = fieldWeight in 470, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
          0.011864728 = weight(abstract_txt:systems in 470) [ClassicSimilarity], result of:
            0.011864728 = score(doc=470,freq=1.0), product of:
              0.055651102 = queryWeight, product of:
                1.1435304 = boost
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.014266652 = queryNorm
              0.21319844 = fieldWeight in 470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
          0.02862017 = weight(abstract_txt:determine in 470) [ClassicSimilarity], result of:
            0.02862017 = score(doc=470,freq=1.0), product of:
              0.08744191 = queryWeight, product of:
                1.1703749 = boost
                5.2368793 = idf(docFreq=641, maxDocs=44421)
                0.014266652 = queryNorm
              0.32730496 = fieldWeight in 470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2368793 = idf(docFreq=641, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
          0.10660888 = weight(abstract_txt:synonymous in 470) [ClassicSimilarity], result of:
            0.10660888 = score(doc=470,freq=1.0), product of:
              0.21011984 = queryWeight, product of:
                1.8142565 = boost
                8.117949 = idf(docFreq=35, maxDocs=44421)
                0.014266652 = queryNorm
              0.5073718 = fieldWeight in 470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.117949 = idf(docFreq=35, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
          0.11834925 = weight(abstract_txt:mismatch in 470) [ClassicSimilarity], result of:
            0.11834925 = score(doc=470,freq=1.0), product of:
              0.22527611 = queryWeight, product of:
                1.8785499 = boost
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.014266652 = queryNorm
              0.52535194 = fieldWeight in 470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
          0.059040476 = weight(abstract_txt:sense in 470) [ClassicSimilarity], result of:
            0.059040476 = score(doc=470,freq=1.0), product of:
              0.16220658 = queryWeight, product of:
                1.9522932 = boost
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.014266652 = queryNorm
              0.36398324 = fieldWeight in 470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.823732 = idf(docFreq=356, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
          0.03628912 = weight(abstract_txt:method in 470) [ClassicSimilarity], result of:
            0.03628912 = score(doc=470,freq=1.0), product of:
              0.1290622 = queryWeight, product of:
                2.0108502 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.014266652 = queryNorm
              0.2811754 = fieldWeight in 470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
          0.06123012 = weight(abstract_txt:words in 470) [ClassicSimilarity], result of:
            0.06123012 = score(doc=470,freq=1.0), product of:
              0.18291874 = queryWeight, product of:
                2.393918 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.014266652 = queryNorm
              0.33473945 = fieldWeight in 470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=470)
        0.32 = coord(8/25)
    
  5. Garcés, P.J.; Olivas, J.A.; Romero, F.P.: Concept-matching IR systems versus word-matching information retrieval systems : considering fuzzy interrelations for indexing Web pages (2006) 0.13
    0.12545645 = sum of:
      0.12545645 = product of:
        0.52273524 = sum of:
          0.025247145 = weight(abstract_txt:semantic in 288) [ClassicSimilarity], result of:
            0.025247145 = score(doc=288,freq=2.0), product of:
              0.06383659 = queryWeight, product of:
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.014266652 = queryNorm
              0.39549646 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.019498887 = weight(abstract_txt:proposed in 288) [ClassicSimilarity], result of:
            0.019498887 = score(doc=288,freq=1.0), product of:
              0.06770354 = queryWeight, product of:
                1.0298426 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.014266652 = queryNorm
              0.28800395 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.051320564 = weight(abstract_txt:method in 288) [ClassicSimilarity], result of:
            0.051320564 = score(doc=288,freq=2.0), product of:
              0.1290622 = queryWeight, product of:
                2.0108502 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.014266652 = queryNorm
              0.39764208 = fieldWeight in 288, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.06123012 = weight(abstract_txt:words in 288) [ClassicSimilarity], result of:
            0.06123012 = score(doc=288,freq=1.0), product of:
              0.18291874 = queryWeight, product of:
                2.393918 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.014266652 = queryNorm
              0.33473945 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.1110144 = weight(abstract_txt:word in 288) [ClassicSimilarity], result of:
            0.1110144 = score(doc=288,freq=3.0), product of:
              0.18857928 = queryWeight, product of:
                2.4306765 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.014266652 = queryNorm
              0.58868825 = fieldWeight in 288, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
          0.25442412 = weight(abstract_txt:senses in 288) [ClassicSimilarity], result of:
            0.25442412 = score(doc=288,freq=1.0), product of:
              0.4727741 = queryWeight, product of:
                3.8486373 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.014266652 = queryNorm
              0.53815156 = fieldWeight in 288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.0625 = fieldNorm(doc=288)
        0.24 = coord(6/25)
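
The content scores combine several matching abstract terms. As a sketch under the same ClassicSimilarity assumptions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), each term contributes queryWeight x fieldWeight, and the sum is scaled by coord (matched query terms over total query terms). The snippet below recomputes entry 1 (doc 4027) from the values in its explain tree; the helper names are illustrative.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.014266652  # queryNorm from the listing


def idf(doc_freq: int) -> float:
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) copied from entry 1, doc 4027.
terms_4027 = [
    ("determine", 1.0, 641, 1.1703749),
    ("whether", 1.0, 950, 1.6238415),
    ("sense", 1.0, 356, 1.9522932),
    ("words", 1.0, 569, 2.393918),
    ("word", 3.0, 524, 2.4306765),
    ("senses", 1.0, 21, 3.8486373),
]
FIELD_NORM_4027 = 0.09375

raw_sum = sum(term_score(f, df, b, FIELD_NORM_4027) for _, f, df, b in terms_4027)
coord = 6 / 25  # 6 of the 25 query terms matched this record
total_4027 = raw_sum * coord
print(f"sum = {raw_sum:.5f}, coord = {coord:.2f}, total = {total_4027:.5f}")
```

The coord factor explains why entry 4 (8 of 25 terms matched) can rank below entries with only 5 or 6 matches: its smaller fieldNorm and lower-weight terms outweigh the higher coordination bonus.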