Document (#40831)

Author
Gao, N.
Dredze, M.
Oard, D.W.
Title
Person entity linking in email with NIL detection
Source
Journal of the Association for Information Science and Technology. 68(2017) no.10, pp.2412-2424
Year
2017
Abstract
For each specific mention of an entity found in a text, the goal of entity linking is to determine whether the referenced entity is present in an existing knowledge base (KB) and, if so, which KB entity is the correct referent. Entity linking has been well explored for dissemination-oriented sources such as news stories, blogs, and microblog posts, but the limited work to date on "conversational" sources such as email or text chat has not yet attempted to determine when the referent entity is not in the knowledge base (a task known as "NIL detection"). This article presents a supervised machine learning system that links named mentions of people in email messages to a collection-specific knowledge base and that is also capable of NIL detection. The system learns from manually annotated training examples to leverage a rich set of features. Entity linking accuracy for entities present in the knowledge base is substantially and significantly better than the best previously reported results on the Enron email collection, and comparable accuracy is reported for the challenging NIL detection task. These results are also replicated, for the first time, with comparable accuracy on a second email collection from a different source.
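The task defined in the abstract (link a mention to a KB entity, or declare it NIL) can be illustrated with a minimal sketch: generate candidate KB entities for a mention, rank them with a scoring function, and return NIL when no candidate exists or the best score falls below a threshold. This is not the authors' system, which learns its scoring from annotated examples with a rich feature set. The scorer `jaccard_score`, the token-overlap candidate filter, the `nil_threshold` parameter, the `KBEntity` schema, and the example entities are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class KBEntity:
    """A KB entry: an id plus the names it is known by (hypothetical schema)."""
    entity_id: str
    names: Tuple[str, ...]


def jaccard_score(mention: str, entity: KBEntity) -> float:
    """Hypothetical stand-in for a learned scorer: the best token-level
    Jaccard similarity between the mention and any of the entity's names."""
    m = set(mention.lower().split())
    best = 0.0
    for name in entity.names:
        n = set(name.lower().split())
        if m | n:
            best = max(best, len(m & n) / len(m | n))
    return best


def link_mention(
    mention: str,
    kb: Sequence[KBEntity],
    score: Callable[[str, KBEntity], float],
    nil_threshold: float,
) -> Optional[str]:
    """Return the id of the linked KB entity, or None to signal NIL."""
    tokens = set(mention.lower().split())
    # Candidate generation (hypothetical heuristic): keep entities that share
    # at least one name token with the mention.
    candidates = [
        e for e in kb
        if any(tokens & set(name.lower().split()) for name in e.names)
    ]
    if not candidates:
        return None  # nothing plausible in the KB -> NIL
    # Ranking: pick the highest-scoring candidate.
    best_score, best = max(
        ((score(mention, e), e) for e in candidates), key=lambda pair: pair[0]
    )
    # NIL detection: reject even the best candidate if its score is too low.
    return best.entity_id if best_score >= nil_threshold else None


# Hypothetical collection-specific KB.
EXAMPLE_KB = [
    KBEntity("e1", ("Alice Smith", "Ali Smith")),
    KBEntity("e2", ("Bob Jones",)),
]

if __name__ == "__main__":
    print(link_mention("Alice Smith", EXAMPLE_KB, jaccard_score, 0.5))  # e1
    print(link_mention("John Smith", EXAMPLE_KB, jaccard_score, 0.5))   # None
    print(link_mention("Carol White", EXAMPLE_KB, jaccard_score, 0.5))  # None
```

The threshold controls how readily a mention is declared NIL: "John Smith" shares only a surname with `e1` (score 1/3), so it is NIL at `nil_threshold=0.5` but would be linked to `e1` at `nil_threshold=0.3`.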
Content
See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23888/full.
Theme
Internet

Similar documents (author)

  1. Oard, D.W.: Serving users in many languages : cross-language information retrieval for digital libraries (1997) 5.51
    
  2. Oard, D.W.: Multilingual information access (2009) 5.51
    
  3. Oard, D.W.: Alternative approaches for cross-language text retrieval (1997) 5.51
    
  4. Wang, J.; Oard, D.W.: Matching meaning for cross-language information retrieval (2012) 4.41
    
  5. Oard, D.W.; Resnik, P.: Support for interactive document selection in cross-language information retrieval (1999) 4.41
    

Similar documents (content)

  1. Zhao, G.; Wu, J.; Wang, D.; Li, T.: Entity disambiguation to Wikipedia using collective ranking (2016) 0.24
    
  2. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.24
    
  3. Lee, D.J.L.; Stvilia, B.: Developing a data identifier taxonomy (2014) 0.14
    
  4. Tang, X.; Chen, L.; Cui, J.; Wei, B.: Knowledge representation learning with entity descriptions, hierarchical types, and textual relations (2019) 0.12
    
  5. Ku, C.-H.; Leroy, G.: A crime reports analysis system to identify related crimes (2011) 0.12