Document (#43943)

Author
Das, S.
Paik, J.H.
Title
Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach
Source
Journal of the Association for Information Science and Technology. 74(2023) no.4, p.461-475
Year
2023
Abstract
Inferring the gender of named entities present in a text has several practical applications in information sciences. Existing approaches toward name gender identification rely exclusively on using the gender distributions from labeled data. In the absence of such labeled data, these methods fail. In this article, we propose a two-stage model that is able to infer the gender of names present in text without requiring explicit name-gender labels. We use coreference resolution as the backbone for our proposed model. To aid coreference resolution where the existing contextual information does not suffice, we use a retrieval-assisted context aggregation framework. We demonstrate that state-of-the-art name gender inference is possible without supervision. Our proposed method matches or outperforms several supervised approaches and commercially used methods on five English language datasets from different domains.
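The idea in the abstract (use coreference links to pick up gender cues, and fall back to additional retrieved contexts when the local text has none) can be illustrated with a toy sketch. This is not the authors' implementation: the pronoun lists, the function names `vote` and `infer_gender`, and the assumption that coreference clusters arrive as plain lists of mention strings are all hypothetical choices made for illustration.

```python
from collections import Counter

# Assumption: a small, fixed set of English gendered pronouns is enough for a toy example.
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}


def vote(mentions):
    """Count gendered pronouns among the mentions of one coreference cluster."""
    counts = Counter()
    for mention in mentions:
        token = mention.lower()
        if token in MALE:
            counts["male"] += 1
        elif token in FEMALE:
            counts["female"] += 1
    return counts


def infer_gender(cluster, retrieved_clusters=()):
    """Stage 1: vote within the local cluster.
    Stage 2 (hypothetical fallback): if the local cluster holds no gendered
    pronoun, aggregate votes from clusters found in other retrieved contexts."""
    counts = vote(cluster)
    if not counts:
        for extra in retrieved_clusters:
            counts += vote(extra)
    if not counts or counts["male"] == counts["female"]:
        return "unknown"
    return "male" if counts["male"] > counts["female"] else "female"


print(infer_gender(["Marie Curie", "she", "her"]))
print(infer_gender(["J. Smith"], [["J. Smith", "he"], ["Smith", "his"]]))
```

In this sketch no name-gender labels are consulted; the only signal is pronoun agreement inside coreference clusters, which is the sense in which such an approach can be called unsupervised.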
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24735.
Theme
Descriptive cataloguing

Similar documents (content)

  1. Herdagdelen, A.; Baroni, M.: Stereotypical gender actions can be extracted from web text (2011) 0.15
    0.15074074 = sum of:
      0.15074074 = product of:
        0.7537037 = sum of:
          0.022843614 = weight(abstract_txt:text in 4752) [ClassicSimilarity], result of:
            0.022843614 = score(doc=4752,freq=2.0), product of:
              0.06391061 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015292261 = queryNorm
              0.3574307 = fieldWeight in 4752, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4752)
          0.017417405 = weight(abstract_txt:methods in 4752) [ClassicSimilarity], result of:
            0.017417405 = score(doc=4752,freq=1.0), product of:
              0.067204036 = queryWeight, product of:
                1.0597798 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.015292261 = queryNorm
              0.259172 = fieldWeight in 4752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=4752)
          0.020091414 = weight(abstract_txt:present in 4752) [ClassicSimilarity], result of:
            0.020091414 = score(doc=4752,freq=1.0), product of:
              0.07391741 = queryWeight, product of:
                1.1114535 = boost
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.015292261 = queryNorm
              0.27180895 = fieldWeight in 4752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.0625 = fieldNorm(doc=4752)
          0.06951892 = weight(abstract_txt:name in 4752) [ClassicSimilarity], result of:
            0.06951892 = score(doc=4752,freq=1.0), product of:
              0.19357035 = queryWeight, product of:
                2.20284 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.015292261 = queryNorm
              0.3591403 = fieldWeight in 4752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=4752)
          0.62383235 = weight(abstract_txt:gender in 4752) [ClassicSimilarity], result of:
            0.62383235 = score(doc=4752,freq=5.0), product of:
              0.64836216 = queryWeight, product of:
                6.1582994 = boost
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.015292261 = queryNorm
              0.9621665 = fieldWeight in 4752, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.0625 = fieldNorm(doc=4752)
        0.2 = coord(5/25)
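The score tree above follows the arithmetic of Lucene's ClassicSimilarity (TF-IDF): per term, score = queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the term scores are summed and multiplied by the coordination factor. A minimal Python sketch that recomputes the first entry from the values listed above is given here. The helper names `idf` and `term_score` are illustrative, and small differences in the last digits are to be expected because Lucene computes in single precision.

```python
import math

MAX_DOCS = 44218          # maxDocs from the explain output
QUERY_NORM = 0.015292261  # queryNorm from the explain output
FIELD_NORM = 0.0625       # fieldNorm(doc=4752)


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) as listed for doc 4752
terms = [
    ("text", 2.0, 2106, 1.0334855),
    ("methods", 1.0, 1900, 1.0597798),
    ("present", 1.0, 1552, 1.1114535),
    ("name", 1.0, 383, 2.20284),
    ("gender", 5.0, 122, 6.1582994),
]

raw_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in terms)
coord = len(terms) / 25   # 5 of the 25 query terms matched
total = raw_sum * coord

print(round(raw_sum, 4), round(total, 4))
```

The same function applies to the other entries in the list once their own freq, docFreq, boost, fieldNorm and coord values are substituted.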
    
  2. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.12
    0.123270944 = sum of:
      0.123270944 = product of:
        0.34241927 = sum of:
          0.013538036 = weight(abstract_txt:model in 4095) [ClassicSimilarity], result of:
            0.013538036 = score(doc=4095,freq=1.0), product of:
              0.06210189 = queryWeight, product of:
                1.0187564 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.015292261 = queryNorm
              0.21799716 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.028267529 = weight(abstract_txt:text in 4095) [ClassicSimilarity], result of:
            0.028267529 = score(doc=4095,freq=4.0), product of:
              0.06391061 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015292261 = queryNorm
              0.4422979 = fieldWeight in 4095, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.021552939 = weight(abstract_txt:methods in 4095) [ClassicSimilarity], result of:
            0.021552939 = score(doc=4095,freq=2.0), product of:
              0.067204036 = queryWeight, product of:
                1.0597798 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.015292261 = queryNorm
              0.320709 = fieldWeight in 4095, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.034819245 = weight(abstract_txt:several in 4095) [ClassicSimilarity], result of:
            0.034819245 = score(doc=4095,freq=3.0), product of:
              0.08083018 = queryWeight, product of:
                1.1622638 = boost
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.015292261 = queryNorm
              0.43077034 = fieldWeight in 4095, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.020918958 = weight(abstract_txt:approaches in 4095) [ClassicSimilarity], result of:
            0.020918958 = score(doc=4095,freq=1.0), product of:
              0.08300312 = queryWeight, product of:
                1.1777827 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.015292261 = queryNorm
              0.25202617 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.02093033 = weight(abstract_txt:proposed in 4095) [ClassicSimilarity], result of:
            0.02093033 = score(doc=4095,freq=1.0), product of:
              0.0830332 = queryWeight, product of:
                1.177996 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.015292261 = queryNorm
              0.25207183 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.042422917 = weight(abstract_txt:entities in 4095) [ClassicSimilarity], result of:
            0.042422917 = score(doc=4095,freq=1.0), product of:
              0.13298461 = queryWeight, product of:
                1.4907974 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.015292261 = queryNorm
              0.3190062 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.06784897 = weight(abstract_txt:named in 4095) [ClassicSimilarity], result of:
            0.06784897 = score(doc=4095,freq=1.0), product of:
              0.18187091 = queryWeight, product of:
                1.7434101 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.015292261 = queryNorm
              0.37306118 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
          0.09212036 = weight(abstract_txt:labeled in 4095) [ClassicSimilarity], result of:
            0.09212036 = score(doc=4095,freq=1.0), product of:
              0.22299996 = queryWeight, product of:
                1.9305023 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.015292261 = queryNorm
              0.41309583 = fieldWeight in 4095, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4095)
        0.36 = coord(9/25)
    
  3. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.11
    0.10852604 = sum of:
      0.10852604 = product of:
        0.45219186 = sum of:
          0.014133764 = weight(abstract_txt:text in 2953) [ClassicSimilarity], result of:
            0.014133764 = score(doc=2953,freq=1.0), product of:
              0.06391061 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015292261 = queryNorm
              0.22114895 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.017579988 = weight(abstract_txt:present in 2953) [ClassicSimilarity], result of:
            0.017579988 = score(doc=2953,freq=1.0), product of:
              0.07391741 = queryWeight, product of:
                1.1114535 = boost
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.015292261 = queryNorm
              0.23783283 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.020102901 = weight(abstract_txt:several in 2953) [ClassicSimilarity], result of:
            0.020102901 = score(doc=2953,freq=1.0), product of:
              0.08083018 = queryWeight, product of:
                1.1622638 = boost
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.015292261 = queryNorm
              0.24870539 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5477557 = idf(docFreq=1272, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.08484583 = weight(abstract_txt:entities in 2953) [ClassicSimilarity], result of:
            0.08484583 = score(doc=2953,freq=4.0), product of:
              0.13298461 = queryWeight, product of:
                1.4907974 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.015292261 = queryNorm
              0.6380124 = fieldWeight in 2953, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.1795115 = weight(abstract_txt:named in 2953) [ClassicSimilarity], result of:
            0.1795115 = score(doc=2953,freq=7.0), product of:
              0.18187091 = queryWeight, product of:
                1.7434101 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.015292261 = queryNorm
              0.98702705 = fieldWeight in 2953, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.13601789 = weight(abstract_txt:name in 2953) [ClassicSimilarity], result of:
            0.13601789 = score(doc=2953,freq=5.0), product of:
              0.19357035 = queryWeight, product of:
                2.20284 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.015292261 = queryNorm
              0.7026794 = fieldWeight in 2953, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
        0.24 = coord(6/25)
    
  4. Phan, M.C.; Sun, A.: Collective named entity recognition in user comments via parameterized label propagation (2020) 0.10
    0.09973175 = sum of:
      0.09973175 = product of:
        0.41554898 = sum of:
          0.02188077 = weight(abstract_txt:model in 5815) [ClassicSimilarity], result of:
            0.02188077 = score(doc=5815,freq=2.0), product of:
              0.06210189 = queryWeight, product of:
                1.0187564 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.015292261 = queryNorm
              0.35233662 = fieldWeight in 5815, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=5815)
          0.022843614 = weight(abstract_txt:text in 5815) [ClassicSimilarity], result of:
            0.022843614 = score(doc=5815,freq=2.0), product of:
              0.06391061 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015292261 = queryNorm
              0.3574307 = fieldWeight in 5815, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5815)
          0.019967297 = weight(abstract_txt:context in 5815) [ClassicSimilarity], result of:
            0.019967297 = score(doc=5815,freq=1.0), product of:
              0.073612675 = queryWeight, product of:
                1.1091601 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.015292261 = queryNorm
              0.27124807 = fieldWeight in 5815, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.0625 = fieldNorm(doc=5815)
          0.02390738 = weight(abstract_txt:approaches in 5815) [ClassicSimilarity], result of:
            0.02390738 = score(doc=5815,freq=1.0), product of:
              0.08300312 = queryWeight, product of:
                1.1777827 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.015292261 = queryNorm
              0.2880299 = fieldWeight in 5815, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0625 = fieldNorm(doc=5815)
          0.1096605 = weight(abstract_txt:named in 5815) [ClassicSimilarity], result of:
            0.1096605 = score(doc=5815,freq=2.0), product of:
              0.18187091 = queryWeight, product of:
                1.7434101 = boost
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.015292261 = queryNorm
              0.6029579 = fieldWeight in 5815, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.82169 = idf(docFreq=130, maxDocs=44218)
                0.0625 = fieldNorm(doc=5815)
          0.21728942 = weight(abstract_txt:coreference in 5815) [ClassicSimilarity], result of:
            0.21728942 = score(doc=5815,freq=1.0), product of:
              0.3614921 = queryWeight, product of:
                2.4579177 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.015292261 = queryNorm
              0.6010904 = fieldWeight in 5815, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0625 = fieldNorm(doc=5815)
        0.24 = coord(6/25)
    
  5. Pirkola, A.; Jarvelin, K.: The effect of anaphor and ellipsis resolution on proximity searching in a text database (1995) 0.10
    0.09840407 = sum of:
      0.09840407 = product of:
        0.49202037 = sum of:
          0.027977599 = weight(abstract_txt:text in 4088) [ClassicSimilarity], result of:
            0.027977599 = score(doc=4088,freq=3.0), product of:
              0.06391061 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015292261 = queryNorm
              0.4377614 = fieldWeight in 4088, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
          0.017417405 = weight(abstract_txt:methods in 4088) [ClassicSimilarity], result of:
            0.017417405 = score(doc=4088,freq=1.0), product of:
              0.067204036 = queryWeight, product of:
                1.0597798 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.015292261 = queryNorm
              0.259172 = fieldWeight in 4088, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
          0.019967297 = weight(abstract_txt:context in 4088) [ClassicSimilarity], result of:
            0.019967297 = score(doc=4088,freq=1.0), product of:
              0.073612675 = queryWeight, product of:
                1.1091601 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.015292261 = queryNorm
              0.27124807 = fieldWeight in 4088, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
          0.25637218 = weight(abstract_txt:resolution in 4088) [ClassicSimilarity], result of:
            0.25637218 = score(doc=4088,freq=8.0), product of:
              0.20181665 = queryWeight, product of:
                1.8365233 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.015292261 = queryNorm
              1.2703222 = fieldWeight in 4088, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
          0.17028588 = weight(abstract_txt:name in 4088) [ClassicSimilarity], result of:
            0.17028588 = score(doc=4088,freq=6.0), product of:
              0.19357035 = queryWeight, product of:
                2.20284 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.015292261 = queryNorm
              0.87971056 = fieldWeight in 4088, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=4088)
        0.2 = coord(5/25)