Document (#43943)

Author
Das, S.
Paik, J.H.
Title
Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach
Source
Journal of the Association for Information Science and Technology. 74(2023) no.4, pp. 461-475
Year
2023
Abstract
Inferring the gender of named entities present in a text has several practical applications in information sciences. Existing approaches toward name gender identification rely exclusively on using the gender distributions from labeled data. In the absence of such labeled data, these methods fail. In this article, we propose a two-stage model that is able to infer the gender of names present in text without requiring explicit name-gender labels. We use coreference resolution as the backbone for our proposed model. To aid coreference resolution where the existing contextual information does not suffice, we use a retrieval-assisted context aggregation framework. We demonstrate that state-of-the-art name gender inference is possible without supervision. Our proposed method matches or outperforms several supervised approaches and commercially used methods on five English language datasets from different domains.
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24735.
Theme
Descriptive cataloguing (Formalerschließung)

Similar documents (content)

  1. Herdagdelen, A.; Baroni, M.: Stereotypical gender actions can be extracted from web text (2011) 0.15
    0.1499743 = sum of:
      0.1499743 = product of:
        0.7498715 = sum of:
          0.022842279 = weight(abstract_txt:text in 752) [ClassicSimilarity], result of:
            0.022842279 = score(doc=752,freq=2.0), product of:
              0.06395408 = queryWeight, product of:
                1.0321187 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015334246 = queryNorm
              0.3571669 = fieldWeight in 752, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=752)
          0.017413631 = weight(abstract_txt:methods in 752) [ClassicSimilarity], result of:
            0.017413631 = score(doc=752,freq=1.0), product of:
              0.06724265 = queryWeight, product of:
                1.0583223 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.015334246 = queryNorm
              0.25896704 = fieldWeight in 752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=752)
          0.020091532 = weight(abstract_txt:present in 752) [ClassicSimilarity], result of:
            0.020091532 = score(doc=752,freq=1.0), product of:
              0.073970854 = queryWeight, product of:
                1.1100073 = boost
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.015334246 = queryNorm
              0.27161416 = fieldWeight in 752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.0625 = fieldNorm(doc=752)
          0.06964664 = weight(abstract_txt:name in 752) [ClassicSimilarity], result of:
            0.06964664 = score(doc=752,freq=1.0), product of:
              0.19394673 = queryWeight, product of:
                2.2013159 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.015334246 = queryNorm
              0.3591019 = fieldWeight in 752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.0625 = fieldNorm(doc=752)
          0.6198774 = weight(abstract_txt:gender in 752) [ClassicSimilarity], result of:
            0.6198774 = score(doc=752,freq=5.0), product of:
              0.6460833 = queryWeight, product of:
                6.13725 = boost
                6.8651857 = idf(docFreq=125, maxDocs=44421)
                0.015334246 = queryNorm
              0.95943886 = fieldWeight in 752, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.8651857 = idf(docFreq=125, maxDocs=44421)
                0.0625 = fieldNorm(doc=752)
        0.2 = coord(5/25)
    
  2. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.12
    0.12337041 = sum of:
      0.12337041 = product of:
        0.34269556 = sum of:
          0.01353205 = weight(abstract_txt:model in 95) [ClassicSimilarity], result of:
            0.01353205 = score(doc=95,freq=1.0), product of:
              0.06212823 = queryWeight, product of:
                1.0172788 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.015334246 = queryNorm
              0.2178084 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.028265879 = weight(abstract_txt:text in 95) [ClassicSimilarity], result of:
            0.028265879 = score(doc=95,freq=4.0), product of:
              0.06395408 = queryWeight, product of:
                1.0321187 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015334246 = queryNorm
              0.44197148 = fieldWeight in 95, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.021548267 = weight(abstract_txt:methods in 95) [ClassicSimilarity], result of:
            0.021548267 = score(doc=95,freq=2.0), product of:
              0.06724265 = queryWeight, product of:
                1.0583223 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.015334246 = queryNorm
              0.32045534 = fieldWeight in 95, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.034909625 = weight(abstract_txt:several in 95) [ClassicSimilarity], result of:
            0.034909625 = score(doc=95,freq=3.0), product of:
              0.081028216 = queryWeight, product of:
                1.1617526 = boost
                4.548416 = idf(docFreq=1277, maxDocs=44421)
                0.015334246 = queryNorm
              0.43083292 = fieldWeight in 95, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.548416 = idf(docFreq=1277, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.020823421 = weight(abstract_txt:approaches in 95) [ClassicSimilarity], result of:
            0.020823421 = score(doc=95,freq=1.0), product of:
              0.08280972 = queryWeight, product of:
                1.1744545 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.015334246 = queryNorm
              0.2514611 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.020958453 = weight(abstract_txt:proposed in 95) [ClassicSimilarity], result of:
            0.020958453 = score(doc=95,freq=1.0), product of:
              0.08316732 = queryWeight, product of:
                1.1769875 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.015334246 = queryNorm
              0.25200346 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.04249077 = weight(abstract_txt:entities in 95) [ClassicSimilarity], result of:
            0.04249077 = score(doc=95,freq=1.0), product of:
              0.1332221 = queryWeight, product of:
                1.489648 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.015334246 = queryNorm
              0.31894684 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.06767983 = weight(abstract_txt:named in 95) [ClassicSimilarity], result of:
            0.06767983 = score(doc=95,freq=1.0), product of:
              0.18169908 = queryWeight, product of:
                1.7396901 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.015334246 = queryNorm
              0.37248304 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.092487276 = weight(abstract_txt:labeled in 95) [ClassicSimilarity], result of:
            0.092487276 = score(doc=95,freq=1.0), product of:
              0.2237525 = queryWeight, product of:
                1.930543 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.015334246 = queryNorm
              0.41334632 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
        0.36 = coord(9/25)
    
  3. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.11
    0.10852353 = sum of:
      0.10852353 = product of:
        0.4521814 = sum of:
          0.014132939 = weight(abstract_txt:text in 3953) [ClassicSimilarity], result of:
            0.014132939 = score(doc=3953,freq=1.0), product of:
              0.06395408 = queryWeight, product of:
                1.0321187 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015334246 = queryNorm
              0.22098574 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.01758009 = weight(abstract_txt:present in 3953) [ClassicSimilarity], result of:
            0.01758009 = score(doc=3953,freq=1.0), product of:
              0.073970854 = queryWeight, product of:
                1.1100073 = boost
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.015334246 = queryNorm
              0.23766239 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.02015508 = weight(abstract_txt:several in 3953) [ClassicSimilarity], result of:
            0.02015508 = score(doc=3953,freq=1.0), product of:
              0.081028216 = queryWeight, product of:
                1.1617526 = boost
                4.548416 = idf(docFreq=1277, maxDocs=44421)
                0.015334246 = queryNorm
              0.24874151 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.548416 = idf(docFreq=1277, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.08498154 = weight(abstract_txt:entities in 3953) [ClassicSimilarity], result of:
            0.08498154 = score(doc=3953,freq=4.0), product of:
              0.1332221 = queryWeight, product of:
                1.489648 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.015334246 = queryNorm
              0.6378937 = fieldWeight in 3953, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.17906399 = weight(abstract_txt:named in 3953) [ClassicSimilarity], result of:
            0.17906399 = score(doc=3953,freq=7.0), product of:
              0.18169908 = queryWeight, product of:
                1.7396901 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.015334246 = queryNorm
              0.9854975 = fieldWeight in 3953, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.1362678 = weight(abstract_txt:name in 3953) [ClassicSimilarity], result of:
            0.1362678 = score(doc=3953,freq=5.0), product of:
              0.19394673 = queryWeight, product of:
                2.2013159 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.015334246 = queryNorm
              0.70260423 = fieldWeight in 3953, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
        0.24 = coord(6/25)
    
  4. Phan, M.C.; Sun, A.: Collective named entity recognition in user comments via parameterized label propagation (2020) 0.10
    0.09981225 = sum of:
      0.09981225 = product of:
        0.41588438 = sum of:
          0.021871096 = weight(abstract_txt:model in 815) [ClassicSimilarity], result of:
            0.021871096 = score(doc=815,freq=2.0), product of:
              0.06212823 = queryWeight, product of:
                1.0172788 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.015334246 = queryNorm
              0.35203153 = fieldWeight in 815, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=815)
          0.022842279 = weight(abstract_txt:text in 815) [ClassicSimilarity], result of:
            0.022842279 = score(doc=815,freq=2.0), product of:
              0.06395408 = queryWeight, product of:
                1.0321187 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015334246 = queryNorm
              0.3571669 = fieldWeight in 815, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=815)
          0.019915922 = weight(abstract_txt:context in 815) [ClassicSimilarity], result of:
            0.019915922 = score(doc=815,freq=1.0), product of:
              0.0735392 = queryWeight, product of:
                1.1067638 = boost
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.015334246 = queryNorm
              0.2708205 = fieldWeight in 815, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0625 = fieldNorm(doc=815)
          0.023798196 = weight(abstract_txt:approaches in 815) [ClassicSimilarity], result of:
            0.023798196 = score(doc=815,freq=1.0), product of:
              0.08280972 = queryWeight, product of:
                1.1744545 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.015334246 = queryNorm
              0.2873841 = fieldWeight in 815, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0625 = fieldNorm(doc=815)
          0.109387115 = weight(abstract_txt:named in 815) [ClassicSimilarity], result of:
            0.109387115 = score(doc=815,freq=2.0), product of:
              0.18169908 = queryWeight, product of:
                1.7396901 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.015334246 = queryNorm
              0.6020235 = fieldWeight in 815, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.0625 = fieldNorm(doc=815)
          0.21806978 = weight(abstract_txt:coreference in 815) [ClassicSimilarity], result of:
            0.21806978 = score(doc=815,freq=1.0), product of:
              0.36261764 = queryWeight, product of:
                2.4576497 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.015334246 = queryNorm
              0.60137665 = fieldWeight in 815, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0625 = fieldNorm(doc=815)
        0.24 = coord(6/25)
    
  5. Pirkola, A.; Jarvelin, K.: The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 0.10
    0.098664306 = sum of:
      0.098664306 = product of:
        0.49332154 = sum of:
          0.027975963 = weight(abstract_txt:text in 4156) [ClassicSimilarity], result of:
            0.027975963 = score(doc=4156,freq=3.0), product of:
              0.06395408 = queryWeight, product of:
                1.0321187 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015334246 = queryNorm
              0.4374383 = fieldWeight in 4156, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=4156)
          0.017413631 = weight(abstract_txt:methods in 4156) [ClassicSimilarity], result of:
            0.017413631 = score(doc=4156,freq=1.0), product of:
              0.06724265 = queryWeight, product of:
                1.0583223 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.015334246 = queryNorm
              0.25896704 = fieldWeight in 4156, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=4156)
          0.019915922 = weight(abstract_txt:context in 4156) [ClassicSimilarity], result of:
            0.019915922 = score(doc=4156,freq=1.0), product of:
              0.0735392 = queryWeight, product of:
                1.1067638 = boost
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.015334246 = queryNorm
              0.2708205 = fieldWeight in 4156, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0625 = fieldNorm(doc=4156)
          0.2574173 = weight(abstract_txt:resolution in 4156) [ClassicSimilarity], result of:
            0.2574173 = score(doc=4156,freq=8.0), product of:
              0.2025103 = queryWeight, product of:
                1.836619 = boost
                7.190608 = idf(docFreq=90, maxDocs=44421)
                0.015334246 = queryNorm
              1.2711319 = fieldWeight in 4156, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                7.190608 = idf(docFreq=90, maxDocs=44421)
                0.0625 = fieldNorm(doc=4156)
          0.17059873 = weight(abstract_txt:name in 4156) [ClassicSimilarity], result of:
            0.17059873 = score(doc=4156,freq=6.0), product of:
              0.19394673 = queryWeight, product of:
                2.2013159 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.015334246 = queryNorm
              0.87961644 = fieldWeight in 4156, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.0625 = fieldNorm(doc=4156)
        0.2 = coord(5/25)
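
The score explanations above follow Lucene's ClassicSimilarity (TF-IDF) scheme: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor. As a minimal sketch, the following recomputes the score of the first similar document (doc 752) from the numbers printed in its explanation. The function and variable names are illustrative and not part of the record, and because Lucene works in 32-bit floats, the result is only expected to agree with the listed 0.1499743 to within rounding.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44421
QUERY_NORM = 0.015334246


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one query term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                      # tf(freq) in the explain tree
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) for doc 752, as listed in its explanation.
DOC_752_TERMS = [
    ("text", 2.0, 2122, 1.0321187),
    ("methods", 1.0, 1915, 1.0583223),
    ("present", 1.0, 1564, 1.1100073),
    ("name", 1.0, 385, 2.2013159),
    ("gender", 5.0, 125, 6.13725),
]
FIELD_NORM_752 = 0.0625   # fieldNorm(doc=752)
COORD_752 = 5 / 25        # 5 of the 25 query terms matched

term_sum = sum(
    term_score(freq, doc_freq, boost, FIELD_NORM_752)
    for _term, freq, doc_freq, boost in DOC_752_TERMS
)
score_752 = COORD_752 * term_sum

print(f"sum of term scores: {term_sum:.7f}")
print(f"final score:        {score_752:.7f}")
```

The same function applies to the other four entries once their own fieldNorm, coord, and per-term values are substituted. In this listing, the rare term "gender" (docFreq=125, boost 6.13725) accounts for most of the score of doc 752.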