Document (#39777)

Author
Donner, P.
Title
Enhanced self-citation detection by fuzzy author name matching and complementary error estimates
Source
Journal of the Association for Information Science and Technology. 67(2016) no.3, p.662-670
Year
2016
Abstract
In this article I investigate the shortcomings of exact string match-based author self-citation detection methods. The contributions of this study are twofold. First, I apply a fuzzy string matching algorithm for self-citation detection and benchmark this approach and other common methods of exclusively author name-based self-citation detection against a manually curated ground truth sample. Near full recall can be achieved with the proposed method while incurring only negligible precision loss. Second, I report some important observations from the results about the extent of latent self-citations and their characteristics and give an example of the effect of improved self-citation detection on the document-level self-citation rate of real data.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23399/abstract.
Theme
Informetrics

Similar documents (content)

  1. Gipp, B.; Meuschke, N.; Breitinger, C.: Citation-based plagiarism detection : practicability on a large-scale scientific corpus (2014) 0.26
    0.25792933 = sum of:
      0.25792933 = product of:
        0.9211762 = sum of:
          0.045818947 = weight(abstract_txt:ground in 4332) [ClassicSimilarity], result of:
            0.045818947 = score(doc=4332,freq=1.0), product of:
              0.10511827 = queryWeight, product of:
                1.0170329 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.014820276 = queryNorm
              0.43587998 = fieldWeight in 4332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0625 = fieldNorm(doc=4332)
          0.005645723 = weight(abstract_txt:this in 4332) [ClassicSimilarity], result of:
            0.005645723 = score(doc=4332,freq=1.0), product of:
              0.03754063 = queryWeight, product of:
                1.0527065 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014820276 = queryNorm
              0.15038967 = fieldWeight in 4332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=4332)
          0.051916357 = weight(abstract_txt:benchmark in 4332) [ClassicSimilarity], result of:
            0.051916357 = score(doc=4332,freq=1.0), product of:
              0.1142486 = queryWeight, product of:
                1.0602819 = boost
                7.270651 = idf(docFreq=83, maxDocs=44421)
                0.014820276 = queryNorm
              0.45441568 = fieldWeight in 4332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.270651 = idf(docFreq=83, maxDocs=44421)
                0.0625 = fieldNorm(doc=4332)
          0.053520117 = weight(abstract_txt:truth in 4332) [ClassicSimilarity], result of:
            0.053520117 = score(doc=4332,freq=1.0), product of:
              0.11658951 = queryWeight, product of:
                1.0710891 = boost
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.014820276 = queryNorm
              0.45904744 = fieldWeight in 4332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.0625 = fieldNorm(doc=4332)
          0.019217983 = weight(abstract_txt:methods in 4332) [ClassicSimilarity], result of:
            0.019217983 = score(doc=4332,freq=1.0), product of:
              0.07421015 = queryWeight, product of:
                1.2084886 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.014820276 = queryNorm
              0.25896704 = fieldWeight in 4332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=4332)
          0.18956214 = weight(abstract_txt:citation in 4332) [ClassicSimilarity], result of:
            0.18956214 = score(doc=4332,freq=4.0), product of:
              0.31010798 = queryWeight, product of:
                4.278859 = boost
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.014820276 = queryNorm
              0.6112779 = fieldWeight in 4332, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.0625 = fieldNorm(doc=4332)
          0.5554949 = weight(abstract_txt:detection in 4332) [ClassicSimilarity], result of:
            0.5554949 = score(doc=4332,freq=7.0), product of:
              0.49589777 = queryWeight, product of:
                4.9394317 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.014820276 = queryNorm
              1.1201802 = fieldWeight in 4332, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=4332)
        0.28 = coord(7/25)
    
  2. Davarpanah, M.R.; Amel, F.: Author self-citation pattern in science (2009) 0.15
    0.15474707 = sum of:
      0.15474707 = product of:
        0.96716917 = sum of:
          0.009980322 = weight(abstract_txt:this in 3968) [ClassicSimilarity], result of:
            0.009980322 = score(doc=3968,freq=2.0), product of:
              0.03754063 = queryWeight, product of:
                1.0527065 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014820276 = queryNorm
              0.26585388 = fieldWeight in 3968, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=3968)
          0.08847685 = weight(abstract_txt:author in 3968) [ClassicSimilarity], result of:
            0.08847685 = score(doc=3968,freq=2.0), product of:
              0.16080207 = queryWeight, product of:
                2.1787245 = boost
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.014820276 = queryNorm
              0.5502221 = fieldWeight in 3968, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.078125 = fieldNorm(doc=3968)
          0.3351017 = weight(abstract_txt:citation in 3968) [ClassicSimilarity], result of:
            0.3351017 = score(doc=3968,freq=8.0), product of:
              0.31010798 = queryWeight, product of:
                4.278859 = boost
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.014820276 = queryNorm
              1.0805968 = fieldWeight in 3968, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.078125 = fieldNorm(doc=3968)
          0.5336103 = weight(abstract_txt:self in 3968) [ClassicSimilarity], result of:
            0.5336103 = score(doc=3968,freq=7.0), product of:
              0.46543345 = queryWeight, product of:
                5.6620502 = boost
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.014820276 = queryNorm
              1.1464803 = fieldWeight in 3968, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.078125 = fieldNorm(doc=3968)
        0.16 = coord(4/25)
    
  3. Galvez, C.; Moya-Anegón, F.: Approximate personal name-matching through finite-state graphs (2007) 0.14
    0.13529909 = sum of:
      0.13529909 = product of:
        0.48321104 = sum of:
          0.009778679 = weight(abstract_txt:this in 1614) [ClassicSimilarity], result of:
            0.009778679 = score(doc=1614,freq=3.0), product of:
              0.03754063 = queryWeight, product of:
                1.0527065 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014820276 = queryNorm
              0.26048255 = fieldWeight in 1614, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=1614)
          0.027178332 = weight(abstract_txt:methods in 1614) [ClassicSimilarity], result of:
            0.027178332 = score(doc=1614,freq=2.0), product of:
              0.07421015 = queryWeight, product of:
                1.2084886 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.014820276 = queryNorm
              0.3662347 = fieldWeight in 1614, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=1614)
          0.07246736 = weight(abstract_txt:name in 1614) [ClassicSimilarity], result of:
            0.07246736 = score(doc=1614,freq=2.0), product of:
              0.14269532 = queryWeight, product of:
                1.6757752 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.014820276 = queryNorm
              0.5078468 = fieldWeight in 1614, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.0625 = fieldNorm(doc=1614)
          0.05958726 = weight(abstract_txt:matching in 1614) [ClassicSimilarity], result of:
            0.05958726 = score(doc=1614,freq=1.0), product of:
              0.15779518 = queryWeight, product of:
                1.7622104 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.014820276 = queryNorm
              0.3776241 = fieldWeight in 1614, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0625 = fieldNorm(doc=1614)
          0.09998376 = weight(abstract_txt:string in 1614) [ClassicSimilarity], result of:
            0.09998376 = score(doc=1614,freq=1.0), product of:
              0.22281499 = queryWeight, product of:
                2.0940309 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.014820276 = queryNorm
              0.44872993 = fieldWeight in 1614, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0625 = fieldNorm(doc=1614)
          0.050050065 = weight(abstract_txt:author in 1614) [ClassicSimilarity], result of:
            0.050050065 = score(doc=1614,freq=1.0), product of:
              0.16080207 = queryWeight, product of:
                2.1787245 = boost
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.014820276 = queryNorm
              0.31125262 = fieldWeight in 1614, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.0625 = fieldNorm(doc=1614)
          0.16416563 = weight(abstract_txt:citation in 1614) [ClassicSimilarity], result of:
            0.16416563 = score(doc=1614,freq=3.0), product of:
              0.31010798 = queryWeight, product of:
                4.278859 = boost
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.014820276 = queryNorm
              0.52938217 = fieldWeight in 1614, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.0625 = fieldNorm(doc=1614)
        0.28 = coord(7/25)
    
  4. Ferreira, A.A.; Veloso, A.; Gonçalves, M.A.; Laender, A.H.F.: Self-training author name disambiguation for information scarce scenarios (2014) 0.13
    0.12623477 = sum of:
      0.12623477 = product of:
        0.63117385 = sum of:
          0.033286523 = weight(abstract_txt:methods in 2292) [ClassicSimilarity], result of:
            0.033286523 = score(doc=2292,freq=3.0), product of:
              0.07421015 = queryWeight, product of:
                1.2084886 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.014820276 = queryNorm
              0.44854406 = fieldWeight in 2292, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=2292)
          0.07246736 = weight(abstract_txt:name in 2292) [ClassicSimilarity], result of:
            0.07246736 = score(doc=2292,freq=2.0), product of:
              0.14269532 = queryWeight, product of:
                1.6757752 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.014820276 = queryNorm
              0.5078468 = fieldWeight in 2292, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.0625 = fieldNorm(doc=2292)
          0.11191535 = weight(abstract_txt:author in 2292) [ClassicSimilarity], result of:
            0.11191535 = score(doc=2292,freq=5.0), product of:
              0.16080207 = queryWeight, product of:
                2.1787245 = boost
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.014820276 = queryNorm
              0.69598204 = fieldWeight in 2292, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.0625 = fieldNorm(doc=2292)
          0.13404068 = weight(abstract_txt:citation in 2292) [ClassicSimilarity], result of:
            0.13404068 = score(doc=2292,freq=2.0), product of:
              0.31010798 = queryWeight, product of:
                4.278859 = boost
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.014820276 = queryNorm
              0.43223873 = fieldWeight in 2292, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.0625 = fieldNorm(doc=2292)
          0.27946395 = weight(abstract_txt:self in 2292) [ClassicSimilarity], result of:
            0.27946395 = score(doc=2292,freq=3.0), product of:
              0.46543345 = queryWeight, product of:
                5.6620502 = boost
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.014820276 = queryNorm
              0.60043806 = fieldWeight in 2292, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.5466094 = idf(docFreq=470, maxDocs=44421)
                0.0625 = fieldNorm(doc=2292)
        0.2 = coord(5/25)
    
  5. Kim, J.(im); Kim, J.(enna): Effect of forename string on author name disambiguation (2020) 0.12
    0.115914986 = sum of:
      0.115914986 = product of:
        0.48297912 = sum of:
          0.005645723 = weight(abstract_txt:this in 930) [ClassicSimilarity], result of:
            0.005645723 = score(doc=930,freq=1.0), product of:
              0.03754063 = queryWeight, product of:
                1.0527065 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.014820276 = queryNorm
              0.15038967 = fieldWeight in 930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=930)
          0.019217983 = weight(abstract_txt:methods in 930) [ClassicSimilarity], result of:
            0.019217983 = score(doc=930,freq=1.0), product of:
              0.07421015 = queryWeight, product of:
                1.2084886 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.014820276 = queryNorm
              0.25896704 = fieldWeight in 930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=930)
          0.08875402 = weight(abstract_txt:name in 930) [ClassicSimilarity], result of:
            0.08875402 = score(doc=930,freq=3.0), product of:
              0.14269532 = queryWeight, product of:
                1.6757752 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.014820276 = queryNorm
              0.6219827 = fieldWeight in 930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.0625 = fieldNorm(doc=930)
          0.084269114 = weight(abstract_txt:matching in 930) [ClassicSimilarity], result of:
            0.084269114 = score(doc=930,freq=2.0), product of:
              0.15779518 = queryWeight, product of:
                1.7622104 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.014820276 = queryNorm
              0.5340411 = fieldWeight in 930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0625 = fieldNorm(doc=930)
          0.17317694 = weight(abstract_txt:string in 930) [ClassicSimilarity], result of:
            0.17317694 = score(doc=930,freq=3.0), product of:
              0.22281499 = queryWeight, product of:
                2.0940309 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.014820276 = queryNorm
              0.77722305 = fieldWeight in 930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0625 = fieldNorm(doc=930)
          0.11191535 = weight(abstract_txt:author in 930) [ClassicSimilarity], result of:
            0.11191535 = score(doc=930,freq=5.0), product of:
              0.16080207 = queryWeight, product of:
                2.1787245 = boost
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.014820276 = queryNorm
              0.69598204 = fieldWeight in 930, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.0625 = fieldNorm(doc=930)
        0.24 = coord(6/25)
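
The score explanations above follow the pattern of Lucene's ClassicSimilarity: each term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and the sum over matching terms is scaled by coord. The sketch below recomputes the score of the first similar document (doc=4332) from the values listed in its explanation. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures shown; the function and variable names are illustrative and not part of the record.

```python
import math

# Constants copied from the explanation for doc=4332.
MAX_DOCS = 44421
QUERY_NORM = 0.014820276
FIELD_NORM = 0.0625

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    """Assumed tf formula: square root of the term frequency."""
    return math.sqrt(freq)

def term_score(freq, doc_freq, boost,
               query_norm=QUERY_NORM, field_norm=FIELD_NORM):
    """queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

# (term, freq, docFreq, boost) as listed for doc=4332.
TERMS_4332 = [
    ("ground", 1.0, 112, 1.0170329),
    ("this", 1.0, 10885, 1.0527065),
    ("benchmark", 1.0, 83, 1.0602819),
    ("truth", 1.0, 77, 1.0710891),
    ("methods", 1.0, 1915, 1.2084886),
    ("citation", 4.0, 907, 4.278859),
    ("detection", 7.0, 137, 4.9394317),
]

# Sum of per-term scores, then scale by coord (7 of 25 query terms matched).
term_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in TERMS_4332)
coord = 7 / 25
total_score = term_sum * coord

print(f"sum of term scores: {term_sum:.6f}")
print(f"final score:        {total_score:.6f}")
```

The printed final score should agree with the 0.25792933 reported for entry 1 up to floating-point rounding, since the listing uses single-precision arithmetic. The other entries can be checked the same way by substituting their own freq, docFreq, boost, fieldNorm, and coord values.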