Document (#42224)

Author
Karaulova, M.
Gök, A.
Shapira, P.
Title
Identifying author heritage using surname data : an application for Russian surnames
Source
Journal of the Association for Information Science and Technology. 70(2019) no.5, pp. 488-498
Year
2019
Abstract
This research article puts forward a method to identify the national heritage of authors based on the morphology of their surnames. Most studies in the field use variants of dictionary-based surname methods to identify ethnic communities, an approach that suffers from methodological limitations. Using the public file of ORCID (Open Researcher and Contributor ID) identifiers in 2015, we developed a surname-based identification method and applied it to infer Russian heritage from suffix-based morphological regularities. The method was developed conceptually and tested in an undersampled control set. Identification based on surname morphology was then complemented by using first-name data to eliminate false-positive results. The method achieved 98% precision and 94% recall rates, superior to most other methods that use name data. The procedure can be adapted to identify the heritage of a variety of national groups with morphologically regular naming traditions. We elaborate on how the method can be employed to overcome long-standing limitations of using name data in bibliometric datasets. This identification method can contribute to advancing research in scientific mobility and migration, patenting by certain groups, publishing and collaboration, transnational and scientific diaspora links, and the effects of diversity on the innovative performance of organizations, regions, and countries.
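Note: the two-stage idea in the abstract (suffix matching on the surname, then a first-name check to drop false positives) can be sketched roughly as follows. This is only an illustration, not the authors' procedure: the suffix tuple, the first-name exclusion set, and all function names are assumptions made up for this sketch, and the sample names are toy data.

```python
# Hypothetical suffix list and first-name exclusion set: assumptions for
# illustration only, not the lists used in the article.
SURNAME_SUFFIXES = ("ov", "ova", "ev", "eva", "in", "ina", "sky", "skaya")
EXCLUDED_FIRST_NAMES = {"colin", "kevin", "erin"}


def has_candidate_suffix(surname: str) -> bool:
    """Stage 1: does the surname end in one of the candidate suffixes?"""
    return surname.strip().lower().endswith(SURNAME_SUFFIXES)


def infer_heritage(first_name: str, surname: str) -> bool:
    """Stage 2: keep a suffix match only if the first name is not excluded."""
    if not has_candidate_suffix(surname):
        return False
    return first_name.strip().lower() not in EXCLUDED_FIRST_NAMES


def precision_recall(predicted, actual):
    """Precision and recall of boolean predictions against boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)        # true positives
    fp = sum(p and not a for p, a in pairs)    # false positives
    fn = sum(a and not p for p, a in pairs)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(infer_heritage("Ivan", "Petrov"))   # True
print(infer_heritage("Colin", "Martin"))  # False: suffix matches, first name excluded
```

The surname "Martin" shows why a second stage is useful: it passes the suffix test on "-in" and is only rejected by the first-name check.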
Content
Cf.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24104.
Theme
Descriptive cataloguing

Similar documents (author)

  1. Shapira, B.: Hypertext browsing : a new model for information filtering based on user profiles and data clustering (1996) 5.62
    5.620886 = sum of:
      5.620886 = weight(author_txt:shapira in 4779) [ClassicSimilarity], result of:
        5.620886 = fieldWeight in 4779, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.625 = fieldNorm(doc=4779)
    
  2. Shapira, B.; Zabar, B.: Personalized search : integrating collaboration and social networks (2011) 4.50
    4.496709 = sum of:
      4.496709 = weight(author_txt:shapira in 140) [ClassicSimilarity], result of:
        4.496709 = fieldWeight in 140, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.5 = fieldNorm(doc=140)
    
  3. Shapira, B.; Shoval, P.; Hanani, U.: Stereotypes in information filtering systems (1997) 3.37
    3.3725317 = sum of:
      3.3725317 = weight(author_txt:shapira in 1157) [ClassicSimilarity], result of:
        3.3725317 = fieldWeight in 1157, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=1157)
    
  4. Shapira, B.; Kantor, P.B.; Melamed, B.: The effect of extrinsic motivation on user behavior in a collaborative information finding system (2001) 3.37
    3.3725317 = sum of:
      3.3725317 = weight(author_txt:shapira in 525) [ClassicSimilarity], result of:
        3.3725317 = fieldWeight in 525, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=525)
    
  5. Kuflik, T.; Shapira, B.; Shoval, P.: Stereotype-based versus personal-based filtering rules in information filtering systems (2003) 3.37
    3.3725317 = sum of:
      3.3725317 = weight(author_txt:shapira in 2234) [ClassicSimilarity], result of:
        3.3725317 = fieldWeight in 2234, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=2234)
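
Note: the author scores above follow the classic Lucene TF-IDF formulas, fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). A minimal sketch that recomputes them is given below; the function names are chosen for this note and are not Lucene API names. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the explanations above (docFreq=14, maxDocs=44421).
for norm in (0.625, 0.5, 0.375):
    print(norm, round(field_weight(1.0, 14, 44421, norm), 4))
```

Since tf and idf are identical for all five hits, the ranking here is decided by fieldNorm alone, which is a coarsely quantized length normalization: the longer the author field, the lower the value.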
    

Similar documents (content)

  1. Strotmann, A.; Zhao, D.: Author name disambiguation : what difference does it make in author-based citation analysis? (2012) 0.18
    0.18191485 = sum of:
      0.18191485 = product of:
        0.75797856 = sum of:
          0.035465878 = weight(abstract_txt:using in 1389) [ClassicSimilarity], result of:
            0.035465878 = score(doc=1389,freq=4.0), product of:
              0.08207626 = queryWeight, product of:
                1.6059002 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.014784815 = queryNorm
              0.43210885 = fieldWeight in 1389, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=1389)
          0.038485996 = weight(abstract_txt:identify in 1389) [ClassicSimilarity], result of:
            0.038485996 = score(doc=1389,freq=1.0), product of:
              0.12500268 = queryWeight, product of:
                1.7163271 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.014784815 = queryNorm
              0.30788136 = fieldWeight in 1389, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.0625 = fieldNorm(doc=1389)
          0.03461083 = weight(abstract_txt:based in 1389) [ClassicSimilarity], result of:
            0.03461083 = score(doc=1389,freq=4.0), product of:
              0.08698715 = queryWeight, product of:
                1.8483845 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.014784815 = queryNorm
              0.3978844 = fieldWeight in 1389, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=1389)
          0.12213373 = weight(abstract_txt:name in 1389) [ClassicSimilarity], result of:
            0.12213373 = score(doc=1389,freq=4.0), product of:
              0.17005442 = queryWeight, product of:
                2.001863 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.014784815 = queryNorm
              0.7182038 = fieldWeight in 1389, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.0625 = fieldNorm(doc=1389)
          0.19120575 = weight(abstract_txt:surnames in 1389) [ClassicSimilarity], result of:
            0.19120575 = score(doc=1389,freq=1.0), product of:
              0.31794676 = queryWeight, product of:
                2.2349713 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.014784815 = queryNorm
              0.60137665 = fieldWeight in 1389, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0625 = fieldNorm(doc=1389)
          0.3360764 = weight(abstract_txt:surname in 1389) [ClassicSimilarity], result of:
            0.3360764 = score(doc=1389,freq=1.0), product of:
              0.58343047 = queryWeight, product of:
                4.2815824 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.014784815 = queryNorm
              0.5760351 = fieldWeight in 1389, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=1389)
        0.24 = coord(6/25)
    
  2. Piscitelli, F.A.: When does the forename end and the surname begin? : saints' names as compound forenames in Spanish (2019) 0.16
    0.15651114 = sum of:
      0.15651114 = product of:
        0.97819465 = sum of:
          0.057728995 = weight(abstract_txt:identify in 276) [ClassicSimilarity], result of:
            0.057728995 = score(doc=276,freq=1.0), product of:
              0.12500268 = queryWeight, product of:
                1.7163271 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.014784815 = queryNorm
              0.46182203 = fieldWeight in 276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.09375 = fieldNorm(doc=276)
          0.12954238 = weight(abstract_txt:name in 276) [ClassicSimilarity], result of:
            0.12954238 = score(doc=276,freq=2.0), product of:
              0.17005442 = queryWeight, product of:
                2.001863 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.014784815 = queryNorm
              0.7617701 = fieldWeight in 276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.09375 = fieldNorm(doc=276)
          0.28680864 = weight(abstract_txt:surnames in 276) [ClassicSimilarity], result of:
            0.28680864 = score(doc=276,freq=1.0), product of:
              0.31794676 = queryWeight, product of:
                2.2349713 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.014784815 = queryNorm
              0.902065 = fieldWeight in 276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.09375 = fieldNorm(doc=276)
          0.5041146 = weight(abstract_txt:surname in 276) [ClassicSimilarity], result of:
            0.5041146 = score(doc=276,freq=1.0), product of:
              0.58343047 = queryWeight, product of:
                4.2815824 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.014784815 = queryNorm
              0.86405265 = fieldWeight in 276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.09375 = fieldNorm(doc=276)
        0.16 = coord(4/25)
    
  3. Kilgour, F.G.; Moran, B.B.; Barden, J.R.: Retrieval effectiveness of surname-title-word searches for known items by academic library users (1999) 0.14
    0.14238721 = sum of:
      0.14238721 = product of:
        1.1865602 = sum of:
          0.026599407 = weight(abstract_txt:using in 4061) [ClassicSimilarity], result of:
            0.026599407 = score(doc=4061,freq=1.0), product of:
              0.08207626 = queryWeight, product of:
                1.6059002 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.014784815 = queryNorm
              0.32408163 = fieldWeight in 4061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.09375 = fieldNorm(doc=4061)
          0.28680864 = weight(abstract_txt:surnames in 4061) [ClassicSimilarity], result of:
            0.28680864 = score(doc=4061,freq=1.0), product of:
              0.31794676 = queryWeight, product of:
                2.2349713 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.014784815 = queryNorm
              0.902065 = fieldWeight in 4061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.09375 = fieldNorm(doc=4061)
          0.87315214 = weight(abstract_txt:surname in 4061) [ClassicSimilarity], result of:
            0.87315214 = score(doc=4061,freq=3.0), product of:
              0.58343047 = queryWeight, product of:
                4.2815824 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.014784815 = queryNorm
              1.496583 = fieldWeight in 4061, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.09375 = fieldNorm(doc=4061)
        0.12 = coord(3/25)
    
  4. Zhang, Y.; Zhang, G.; Zhu, D.; Lu, J.: Scientific evolutionary pathways : identifying and visualizing relationships for scientific topics (2017) 0.09
    0.088705644 = sum of:
      0.088705644 = product of:
        0.31680587 = sum of:
          0.02091138 = weight(abstract_txt:national in 4758) [ClassicSimilarity], result of:
            0.02091138 = score(doc=4758,freq=1.0), product of:
              0.072712466 = queryWeight, product of:
                1.0688068 = boost
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.014784815 = queryNorm
              0.28759003 = fieldWeight in 4758, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6014404 = idf(docFreq=1211, maxDocs=44421)
                0.0625 = fieldNorm(doc=4758)
          0.052264597 = weight(abstract_txt:scientific in 4758) [ClassicSimilarity], result of:
            0.052264597 = score(doc=4758,freq=6.0), product of:
              0.07369563 = queryWeight, product of:
                1.0760083 = boost
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.014784815 = queryNorm
              0.7091953 = fieldWeight in 4758, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.0625 = fieldNorm(doc=4758)
          0.022421632 = weight(abstract_txt:data in 4758) [ClassicSimilarity], result of:
            0.022421632 = score(doc=4758,freq=2.0), product of:
              0.07617256 = queryWeight, product of:
                1.5470667 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.014784815 = queryNorm
              0.29435313 = fieldWeight in 4758, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=4758)
          0.038485996 = weight(abstract_txt:identify in 4758) [ClassicSimilarity], result of:
            0.038485996 = score(doc=4758,freq=1.0), product of:
              0.12500268 = queryWeight, product of:
                1.7163271 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.014784815 = queryNorm
              0.30788136 = fieldWeight in 4758, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.0625 = fieldNorm(doc=4758)
          0.017305415 = weight(abstract_txt:based in 4758) [ClassicSimilarity], result of:
            0.017305415 = score(doc=4758,freq=1.0), product of:
              0.08698715 = queryWeight, product of:
                1.8483845 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.014784815 = queryNorm
              0.1989422 = fieldWeight in 4758, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=4758)
          0.06386798 = weight(abstract_txt:identification in 4758) [ClassicSimilarity], result of:
            0.06386798 = score(doc=4758,freq=1.0), product of:
              0.17521568 = queryWeight, product of:
                2.0320148 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.014784815 = queryNorm
              0.36451066 = fieldWeight in 4758, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.0625 = fieldNorm(doc=4758)
          0.10154888 = weight(abstract_txt:method in 4758) [ClassicSimilarity], result of:
            0.10154888 = score(doc=4758,freq=3.0), product of:
              0.20851494 = queryWeight, product of:
                3.1349022 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.014784815 = queryNorm
              0.4870101 = fieldWeight in 4758, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=4758)
        0.28 = coord(7/25)
    
  5. Kilgour, F.G.; Moran, B.B.: Surname plus recallable title word searches for known items by scholars (2000) 0.08
    0.084365144 = sum of:
      0.084365144 = product of:
        1.0545644 = sum of:
          0.3824115 = weight(abstract_txt:surnames in 5296) [ClassicSimilarity], result of:
            0.3824115 = score(doc=5296,freq=1.0), product of:
              0.31794676 = queryWeight, product of:
                2.2349713 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.014784815 = queryNorm
              1.2027533 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.125 = fieldNorm(doc=5296)
          0.6721528 = weight(abstract_txt:surname in 5296) [ClassicSimilarity], result of:
            0.6721528 = score(doc=5296,freq=1.0), product of:
              0.58343047 = queryWeight, product of:
                4.2815824 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.014784815 = queryNorm
              1.1520702 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.125 = fieldNorm(doc=5296)
        0.08 = coord(2/25)
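
Note: each content score above is coord * sum over matching terms of (queryWeight * fieldWeight), where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm. The sketch below recomputes hit 5 (doc 5296) from the figures shown in its explanation. The function and constant names are assumptions chosen for this note, not Lucene API names, and small rounding differences against Lucene's single-precision output are to be expected.

```python
import math

MAX_DOCS = 44421          # maxDocs from the explanations above
QUERY_NORM = 0.014784815  # queryNorm from the explanations above


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched: int, total: int) -> float:
    """Sum of the term scores, scaled by coord = matched / total query terms."""
    return sum(term_score(*t) for t in terms) * (matched / total)


# Hit 5 (doc 5296): (boost, freq, docFreq, fieldNorm) for "surnames" and "surname".
hit_5 = [
    (2.2349713, 1.0, 7, 0.125),
    (4.2815824, 1.0, 11, 0.125),
]
print(round(document_score(hit_5, matched=2, total=25), 4))  # 0.0844
```

The coord factor explains why hit 4, which matches seven of the 25 query terms with low-weight words, can still rank above hit 5, which matches only the two rarest terms.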