Document (#42816)

Phan, M.C.
Sun, A.
Collective named entity recognition in user comments via parameterized label propagation
Journal of the Association for Information Science and Technology. 71(2020) no.5, pp.568-577
Named entity recognition (NER) has traditionally focused on extracting mentions in a local region, within a sentence or short paragraph. When dealing with user-generated text, the diverse and informal writing style makes traditional approaches much less effective. On the other hand, in many types of text on social media, such as user comments, tweets, or question-answer posts, contextual connections between documents do exist. Examples include posts in a thread discussing the same topic and tweets that share a hashtag about the same entity. Our idea in this work is to use the related contexts across documents to perform mention recognition in a collective manner. Intuitively, within a mention coreference graph, the labels of mentions are expected to propagate from more confident cases to less confident ones. To this end, we propose a novel semisupervised inference algorithm named parameterized label propagation. In our model, the propagation weights between mentions are learned by an attention-like mechanism, given their local contexts and the initial labels as input. We study the performance of our approach on the Yahoo! News data set, where comments and articles within a thread share similar context. The results show that our model significantly outperforms all noncollective NER baselines.
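
The propagation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' parameterized model: the edge weights below are fixed by hand, whereas the paper learns them with an attention-like mechanism. The function name `propagate_labels`, the mixing factor `alpha`, the update rule, and the toy graph are all assumptions made for illustration.

```python
def propagate_labels(initial, edges, alpha=0.5, iterations=20):
    """Plain label propagation over a mention graph (illustrative only).

    initial: one label distribution per mention, e.g. [P(entity), P(other)].
    edges:   dict mapping a mention index to a list of (neighbor, weight).
    alpha:   assumed share of the initial label kept at every step.
    """
    current = [row[:] for row in initial]
    for _ in range(iterations):
        updated = []
        for i, row in enumerate(initial):
            neighbors = edges.get(i, [])
            total_weight = sum(w for _, w in neighbors)
            if total_weight == 0:
                # An isolated mention keeps its initial label.
                updated.append(row[:])
                continue
            # Weighted average of the neighbors' current labels.
            mixed = [0.0] * len(row)
            for j, w in neighbors:
                for k, value in enumerate(current[j]):
                    mixed[k] += (w / total_weight) * value
            updated.append(
                [alpha * r + (1 - alpha) * m for r, m in zip(row, mixed)]
            )
        current = updated
    return current


# Toy graph: mentions 0 and 2 are confident, mention 1 is undecided.
initial = [[0.95, 0.05], [0.5, 0.5], [0.9, 0.1]]
# Hand-set weights: confident mentions influence mention 1 strongly,
# while mention 1 influences them only weakly.
edges = {0: [(1, 0.2)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 0.2)]}

result = propagate_labels(initial, edges)
for index, distribution in enumerate(result):
    print(index, [round(p, 3) for p in distribution])
```

In this toy setting the undecided mention is pulled toward the label of its confident neighbors, which is the behavior the abstract describes informally.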

Similar documents (content)

  1. Gao, N.; Dredze, M.; Oard, D.W.: Person entity linking in email with NIL detection (2017) 0.14
    0.13721122 = sum of:
      0.13721122 = product of:
        0.6860561 = sum of:
          0.0974243 = weight(abstract_txt:posts in 4830) [ClassicSimilarity], result of:
            0.0974243 = score(doc=4830,freq=1.0), product of:
              0.20709075 = queryWeight, product of:
                1.6611652 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.016562326 = queryNorm
              0.47044253 = fieldWeight in 4830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.0625 = fieldNorm(doc=4830)
          0.10480219 = weight(abstract_txt:mention in 4830) [ClassicSimilarity], result of:
            0.10480219 = score(doc=4830,freq=1.0), product of:
              0.2174183 = queryWeight, product of:
                1.7020822 = boost
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.016562326 = queryNorm
              0.4820302 = fieldWeight in 4830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.0625 = fieldNorm(doc=4830)
          0.22370279 = weight(abstract_txt:entity in 4830) [ClassicSimilarity], result of:
            0.22370279 = score(doc=4830,freq=7.0), product of:
              0.21568893 = queryWeight, product of:
                2.0763092 = boost
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.016562326 = queryNorm
              1.0371547 = fieldWeight in 4830, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.0625 = fieldNorm(doc=4830)
          0.108276494 = weight(abstract_txt:named in 4830) [ClassicSimilarity], result of:
            0.108276494 = score(doc=4830,freq=1.0), product of:
              0.25435233 = queryWeight, product of:
                2.2547374 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.016562326 = queryNorm
              0.4256949 = fieldWeight in 4830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.0625 = fieldNorm(doc=4830)
          0.15185027 = weight(abstract_txt:mentions in 4830) [ClassicSimilarity], result of:
            0.15185027 = score(doc=4830,freq=1.0), product of:
              0.31868136 = queryWeight, product of:
                2.5238087 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.016562326 = queryNorm
              0.47649562 = fieldWeight in 4830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=4830)
        0.2 = coord(5/25)
  2. Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.12
    0.11510849 = sum of:
      0.11510849 = product of:
        0.57554245 = sum of:
          0.057488274 = weight(abstract_txt:local in 286) [ClassicSimilarity], result of:
            0.057488274 = score(doc=286,freq=2.0), product of:
              0.09965185 = queryWeight, product of:
                1.1523255 = boost
                5.221423 = idf(docFreq=651, maxDocs=44421)
                0.016562326 = queryNorm
              0.5768912 = fieldWeight in 286, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.221423 = idf(docFreq=651, maxDocs=44421)
                0.078125 = fieldNorm(doc=286)
          0.036999885 = weight(abstract_txt:user in 286) [ClassicSimilarity], result of:
            0.036999885 = score(doc=286,freq=3.0), product of:
              0.07428471 = queryWeight, product of:
                1.2185065 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.016562326 = queryNorm
              0.4980821 = fieldWeight in 286, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.078125 = fieldNorm(doc=286)
          0.108171776 = weight(abstract_txt:label in 286) [ClassicSimilarity], result of:
            0.108171776 = score(doc=286,freq=1.0), product of:
              0.1913603 = queryWeight, product of:
                1.5968289 = boost
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.016562326 = queryNorm
              0.56527805 = fieldWeight in 286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.078125 = fieldNorm(doc=286)
          0.12178037 = weight(abstract_txt:posts in 286) [ClassicSimilarity], result of:
            0.12178037 = score(doc=286,freq=1.0), product of:
              0.20709075 = queryWeight, product of:
                1.6611652 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.016562326 = queryNorm
              0.58805317 = fieldWeight in 286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.078125 = fieldNorm(doc=286)
          0.25110218 = weight(abstract_txt:propagation in 286) [ClassicSimilarity], result of:
            0.25110218 = score(doc=286,freq=1.0), product of:
              0.38403717 = queryWeight, product of:
                2.770542 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.016562326 = queryNorm
              0.65384865 = fieldWeight in 286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.078125 = fieldNorm(doc=286)
        0.2 = coord(5/25)
  3. Berg, A.; Nelimarkka, M.: Do you see what I see? : measuring the semantic differences in image-recognition services' outputs (2023) 0.11
    0.10602429 = sum of:
      0.10602429 = product of:
        0.53012145 = sum of:
          0.0418371 = weight(abstract_txt:less in 2072) [ClassicSimilarity], result of:
            0.0418371 = score(doc=2072,freq=1.0), product of:
              0.101582035 = queryWeight, product of:
                1.1634318 = boost
                5.271748 = idf(docFreq=619, maxDocs=44421)
                0.016562326 = queryNorm
              0.4118553 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.271748 = idf(docFreq=619, maxDocs=44421)
                0.078125 = fieldNorm(doc=2072)
          0.021361895 = weight(abstract_txt:user in 2072) [ClassicSimilarity], result of:
            0.021361895 = score(doc=2072,freq=1.0), product of:
              0.07428471 = queryWeight, product of:
                1.2185065 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.016562326 = queryNorm
              0.28756785 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.078125 = fieldNorm(doc=2072)
          0.108171776 = weight(abstract_txt:label in 2072) [ClassicSimilarity], result of:
            0.108171776 = score(doc=2072,freq=1.0), product of:
              0.1913603 = queryWeight, product of:
                1.5968289 = boost
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.016562326 = queryNorm
              0.56527805 = fieldWeight in 2072, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.078125 = fieldNorm(doc=2072)
          0.18917148 = weight(abstract_txt:labels in 2072) [ClassicSimilarity], result of:
            0.18917148 = score(doc=2072,freq=3.0), product of:
              0.19259243 = queryWeight, product of:
                1.6019615 = boost
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.016562326 = queryNorm
              0.9822374 = fieldWeight in 2072, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.078125 = fieldNorm(doc=2072)
          0.16957918 = weight(abstract_txt:recognition in 2072) [ClassicSimilarity], result of:
            0.16957918 = score(doc=2072,freq=3.0), product of:
              0.20496555 = queryWeight, product of:
                2.0240374 = boost
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.016562326 = queryNorm
              0.82735455 = fieldWeight in 2072, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.078125 = fieldNorm(doc=2072)
        0.2 = coord(5/25)
  4. Pereira, D.A.; Ribeiro-Neto, B.; Ziviani, N.; Laender, A.H.F.; Gonçalves, M.A.: A generic Web-based entity resolution framework (2011) 0.10
    0.101872124 = sum of:
      0.101872124 = product of:
        0.5093606 = sum of:
          0.042194117 = weight(abstract_txt:same in 450) [ClassicSimilarity], result of:
            0.042194117 = score(doc=450,freq=3.0), product of:
              0.08219462 = queryWeight, product of:
                1.046536 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.016562326 = queryNorm
              0.51334405 = fieldWeight in 450, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.05030097 = weight(abstract_txt:share in 450) [ClassicSimilarity], result of:
            0.05030097 = score(doc=450,freq=1.0), product of:
              0.13328055 = queryWeight, product of:
                1.3326492 = boost
                6.038507 = idf(docFreq=287, maxDocs=44421)
                0.016562326 = queryNorm
              0.3774067 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.038507 = idf(docFreq=287, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.122382395 = weight(abstract_txt:label in 450) [ClassicSimilarity], result of:
            0.122382395 = score(doc=450,freq=2.0), product of:
              0.1913603 = queryWeight, product of:
                1.5968289 = boost
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.016562326 = queryNorm
              0.6395391 = fieldWeight in 450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.08737457 = weight(abstract_txt:labels in 450) [ClassicSimilarity], result of:
            0.08737457 = score(doc=450,freq=1.0), product of:
              0.19259243 = queryWeight, product of:
                1.6019615 = boost
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.016562326 = queryNorm
              0.45367602 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.20710854 = weight(abstract_txt:entity in 450) [ClassicSimilarity], result of:
            0.20710854 = score(doc=450,freq=6.0), product of:
              0.21568893 = queryWeight, product of:
                2.0763092 = boost
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.016562326 = queryNorm
              0.96021867 = fieldWeight in 450, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
        0.2 = coord(5/25)
  5. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.10
    0.09679284 = sum of:
      0.09679284 = product of:
        0.4839642 = sum of:
          0.092794426 = weight(abstract_txt:hashtag in 95) [ClassicSimilarity], result of:
            0.092794426 = score(doc=95,freq=1.0), product of:
              0.1739329 = queryWeight, product of:
                1.0764859 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.016562326 = queryNorm
              0.53350705 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.1514405 = weight(abstract_txt:label in 95) [ClassicSimilarity], result of:
            0.1514405 = score(doc=95,freq=4.0), product of:
              0.1913603 = queryWeight, product of:
                1.5968289 = boost
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.016562326 = queryNorm
              0.79138935 = fieldWeight in 95, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.2355595 = idf(docFreq=86, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.07645275 = weight(abstract_txt:labels in 95) [ClassicSimilarity], result of:
            0.07645275 = score(doc=95,freq=1.0), product of:
              0.19259243 = queryWeight, product of:
                1.6019615 = boost
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.016562326 = queryNorm
              0.39696652 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.068534605 = weight(abstract_txt:recognition in 95) [ClassicSimilarity], result of:
            0.068534605 = score(doc=95,freq=1.0), product of:
              0.20496555 = queryWeight, product of:
                2.0240374 = boost
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.016562326 = queryNorm
              0.33437136 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.09474193 = weight(abstract_txt:named in 95) [ClassicSimilarity], result of:
            0.09474193 = score(doc=95,freq=1.0), product of:
              0.25435233 = queryWeight, product of:
                2.2547374 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.016562326 = queryNorm
              0.37248304 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
        0.2 = coord(5/25)
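
The score trees above follow Lucene's ClassicSimilarity (TF-IDF) arithmetic. As a rough check, the sketch below recomputes the score of the first similar document from the values listed in its tree. The helper names (`idf`, `term_score`) are assumptions for illustration; the formulas are the standard ClassicSimilarity ones (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). Lucene works in 32-bit floats, so the last digits may differ slightly.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm           # "queryWeight"
    field_weight = math.sqrt(freq) * term_idf * field_norm  # "fieldWeight"
    return query_weight * field_weight


MAX_DOCS = 44421
QUERY_NORM = 0.016562326
FIELD_NORM = 0.0625  # fieldNorm(doc=4830)

# (term, freq, docFreq, boost) copied from the tree for document 4830.
terms = [
    ("posts", 1.0, 64, 1.6611652),
    ("mention", 1.0, 53, 1.7020822),
    ("entity", 7.0, 227, 2.0763092),
    ("named", 1.0, 132, 2.2547374),
    ("mentions", 1.0, 58, 2.5238087),
]

scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for term, freq, doc_freq, boost in terms
}

coord = 5 / 25  # 5 of the 25 query terms matched
total = coord * sum(scores.values())

for term, value in scores.items():
    print(f"{term}: {value:.7f}")
print(f"total: {total:.7f}")
```

The total should come out close to the 0.13721122 reported for the first entry; the same arithmetic applies to the other four entries with their own freq, docFreq, boost, and fieldNorm values.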