Document (#43412)

Author
Cabanac, G.
Labbé, C.
Title
Prevalence of nonsensical algorithmically generated papers in the scientific literature
Source
Journal of the Association for Information Science and Technology. 72(2021) no.12, pp. 1461-1476
Year
2021
Abstract
In 2014 leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions. No systematic screening has been performed and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it has an 83.6% precision. Second, we performed a scientometric study of the 243 detected SCIgen-papers from 19 publishers. We estimate the prevalence of SCIgen-papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with: formal retraction (12) or silent removal (34). Publishers still serve and sometimes sell the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming up to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer-review and chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24495.
Theme
Electronic publishing
Informetrics
Field
Communication sciences
Object
SCIgen

Similar documents (author)

  1. Cabanac, G.: Shaping the landscape of research in information systems from the perspective of editorial boards : a scientometric study of 77 leading journals (2012) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:cabanac in 1242) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 1242, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=1242)
    
  2. Cabanac, G.: Bibliogifts in LibGen? : a study of a text-sharing platform driven by biblioleaks and crowdsourcing (2016) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:cabanac in 3850) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 3850, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=3850)
    
  3. Cabanac, G.; Preuss, T.: Capitalizing on order effects in the bids of peer-reviewed conferences to secure reviews by expert referees (2013) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:cabanac in 1619) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 1619, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=1619)
    
  4. Cabanac, G.; Hartley, J.: Issues of work-life balance among JASIST authors and editors (2013) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:cabanac in 1996) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 1996, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=1996)
    
  5. Cabanac, G.; Hubert, G.; Hartley, J.: Solo versus collaborative writing : discrepancies in the use of tables and graphs in academic articles (2014) 3.52
    3.524581 = sum of:
      3.524581 = weight(author_txt:cabanac in 2242) [ClassicSimilarity], result of:
        3.524581 = fieldWeight in 2242, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.375 = fieldNorm(doc=2242)
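
Each author-match tree above reduces to the same product: fieldWeight = tf x idf x fieldNorm. The following is a minimal Python sketch of that arithmetic, assuming the ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which appear to reproduce the figures listed here. The function names are illustrative only and are not part of any library.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency (assumed definition)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed definition)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight as shown in the explain trees: tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# fieldNorm values taken from the explain output for docs 1242, 1619 and 2242.
for doc_id, norm in [(1242, 0.625), (1619, 0.5), (2242, 0.375)]:
    print(doc_id, field_weight(1.0, 9, 44421, norm))
```

Because tf and idf are identical for all five hits (freq=1.0, docFreq=9), the ranking among them is determined entirely by fieldNorm, which shrinks as the indexed author field gets longer.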
    

Similar documents (content)

  1. Larivière, V.; Gingras, Y.: On the prevalence and scientific impact of duplicate publications in different scientific fields (1980-2007) (2010) 0.24
    0.24485543 = sum of:
      0.24485543 = product of:
        1.020231 = sum of:
          0.04615538 = weight(abstract_txt:published in 609) [ClassicSimilarity], result of:
            0.04615538 = score(doc=609,freq=3.0), product of:
              0.0869111 = queryWeight, product of:
                1.0909672 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.016238919 = queryNorm
              0.5310643 = fieldWeight in 609, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.0625 = fieldNorm(doc=609)
          0.046466887 = weight(abstract_txt:publications in 609) [ClassicSimilarity], result of:
            0.046466887 = score(doc=609,freq=2.0), product of:
              0.09993551 = queryWeight, product of:
                1.1698602 = boost
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.016238919 = queryNorm
              0.46496874 = fieldWeight in 609, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.0625 = fieldNorm(doc=609)
          0.028899228 = weight(abstract_txt:literature in 609) [ClassicSimilarity], result of:
            0.028899228 = score(doc=609,freq=1.0), product of:
              0.10501597 = queryWeight, product of:
                1.4687482 = boost
                4.4030223 = idf(docFreq=1477, maxDocs=44421)
                0.016238919 = queryNorm
              0.2751889 = fieldWeight in 609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4030223 = idf(docFreq=1477, maxDocs=44421)
                0.0625 = fieldNorm(doc=609)
          0.047596965 = weight(abstract_txt:scientific in 609) [ClassicSimilarity], result of:
            0.047596965 = score(doc=609,freq=2.0), product of:
              0.116244934 = queryWeight, product of:
                1.5452783 = boost
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.016238919 = queryNorm
              0.4094541 = fieldWeight in 609, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.0625 = fieldNorm(doc=609)
          0.30748695 = weight(abstract_txt:prevalence in 609) [ClassicSimilarity], result of:
            0.30748695 = score(doc=609,freq=3.0), product of:
              0.35224262 = queryWeight, product of:
                2.689928 = boost
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.016238919 = queryNorm
              0.8729408 = fieldWeight in 609, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.0625 = fieldNorm(doc=609)
          0.54362565 = weight(abstract_txt:papers in 609) [ClassicSimilarity], result of:
            0.54362565 = score(doc=609,freq=9.0), product of:
              0.5506481 = queryWeight, product of:
                6.440098 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.016238919 = queryNorm
              0.987247 = fieldWeight in 609, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.0625 = fieldNorm(doc=609)
        0.24 = coord(6/25)
    
  2. Jiao, H.; Qiu, Y.; Ma, X.; Yang, B.: Dissemination effect of data papers on scientific datasets (2024) 0.20
    0.20485923 = sum of:
      0.20485923 = product of:
        0.8535801 = sum of:
          0.045718182 = weight(abstract_txt:citation in 2206) [ClassicSimilarity], result of:
            0.045718182 = score(doc=2206,freq=3.0), product of:
              0.08636139 = queryWeight, product of:
                1.0875115 = boost
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.016238919 = queryNorm
              0.52938217 = fieldWeight in 2206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.0625 = fieldNorm(doc=2206)
          0.026647821 = weight(abstract_txt:published in 2206) [ClassicSimilarity], result of:
            0.026647821 = score(doc=2206,freq=1.0), product of:
              0.0869111 = queryWeight, product of:
                1.0909672 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.016238919 = queryNorm
              0.3066101 = fieldWeight in 2206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.0625 = fieldNorm(doc=2206)
          0.032857053 = weight(abstract_txt:publications in 2206) [ClassicSimilarity], result of:
            0.032857053 = score(doc=2206,freq=1.0), product of:
              0.09993551 = queryWeight, product of:
                1.1698602 = boost
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.016238919 = queryNorm
              0.32878256 = fieldWeight in 2206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.0625 = fieldNorm(doc=2206)
          0.05829414 = weight(abstract_txt:scientific in 2206) [ClassicSimilarity], result of:
            0.05829414 = score(doc=2206,freq=3.0), product of:
              0.116244934 = queryWeight, product of:
                1.5452783 = boost
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.016238919 = queryNorm
              0.5014768 = fieldWeight in 2206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.0625 = fieldNorm(doc=2206)
          0.17752768 = weight(abstract_txt:prevalence in 2206) [ClassicSimilarity], result of:
            0.17752768 = score(doc=2206,freq=1.0), product of:
              0.35224262 = queryWeight, product of:
                2.689928 = boost
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.016238919 = queryNorm
              0.5039926 = fieldWeight in 2206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.0625 = fieldNorm(doc=2206)
          0.5125352 = weight(abstract_txt:papers in 2206) [ClassicSimilarity], result of:
            0.5125352 = score(doc=2206,freq=8.0), product of:
              0.5506481 = queryWeight, product of:
                6.440098 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.016238919 = queryNorm
              0.9307854 = fieldWeight in 2206, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.0625 = fieldNorm(doc=2206)
        0.24 = coord(6/25)
    
  3. Pertile, S. de L.; Moreira, V.P.: Comparing and combining content- and citation-based approaches for plagiarism detection (2016) 0.14
    0.14001966 = sum of:
      0.14001966 = product of:
        0.58341527 = sum of:
          0.026395405 = weight(abstract_txt:citation in 4123) [ClassicSimilarity], result of:
            0.026395405 = score(doc=4123,freq=1.0), product of:
              0.08636139 = queryWeight, product of:
                1.0875115 = boost
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.016238919 = queryNorm
              0.30563894 = fieldWeight in 4123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.0625 = fieldNorm(doc=4123)
          0.03207337 = weight(abstract_txt:without in 4123) [ClassicSimilarity], result of:
            0.03207337 = score(doc=4123,freq=1.0), product of:
              0.09834007 = queryWeight, product of:
                1.1604844 = boost
                5.2183604 = idf(docFreq=653, maxDocs=44421)
                0.016238919 = queryNorm
              0.32614753 = fieldWeight in 4123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2183604 = idf(docFreq=653, maxDocs=44421)
                0.0625 = fieldNorm(doc=4123)
          0.032857053 = weight(abstract_txt:publications in 4123) [ClassicSimilarity], result of:
            0.032857053 = score(doc=4123,freq=1.0), product of:
              0.09993551 = queryWeight, product of:
                1.1698602 = boost
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.016238919 = queryNorm
              0.32878256 = fieldWeight in 4123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.0625 = fieldNorm(doc=4123)
          0.05829414 = weight(abstract_txt:scientific in 4123) [ClassicSimilarity], result of:
            0.05829414 = score(doc=4123,freq=3.0), product of:
              0.116244934 = queryWeight, product of:
                1.5452783 = boost
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.016238919 = queryNorm
              0.5014768 = fieldWeight in 4123, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.0625 = fieldNorm(doc=4123)
          0.17752768 = weight(abstract_txt:prevalence in 4123) [ClassicSimilarity], result of:
            0.17752768 = score(doc=4123,freq=1.0), product of:
              0.35224262 = queryWeight, product of:
                2.689928 = boost
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.016238919 = queryNorm
              0.5039926 = fieldWeight in 4123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.063882 = idf(docFreq=37, maxDocs=44421)
                0.0625 = fieldNorm(doc=4123)
          0.2562676 = weight(abstract_txt:papers in 4123) [ClassicSimilarity], result of:
            0.2562676 = score(doc=4123,freq=2.0), product of:
              0.5506481 = queryWeight, product of:
                6.440098 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.016238919 = queryNorm
              0.4653927 = fieldWeight in 4123, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.0625 = fieldNorm(doc=4123)
        0.24 = coord(6/25)
    
  4. Besançon, L.; Cabanac, G.; Labbé, C.; Magazinov, A.: Sneaked references : fabricated reference metadata distort citation counts (2024) 0.13
    0.13429508 = sum of:
      0.13429508 = product of:
        0.41967213 = sum of:
          0.083993085 = weight(abstract_txt:gaming in 2391) [ClassicSimilarity], result of:
            0.083993085 = score(doc=2391,freq=1.0), product of:
              0.1482927 = queryWeight, product of:
                1.0076715 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.016238919 = queryNorm
              0.56640065 = fieldWeight in 2391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
          0.05279081 = weight(abstract_txt:citation in 2391) [ClassicSimilarity], result of:
            0.05279081 = score(doc=2391,freq=4.0), product of:
              0.08636139 = queryWeight, product of:
                1.0875115 = boost
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.016238919 = queryNorm
              0.6112779 = fieldWeight in 2391, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.890223 = idf(docFreq=907, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
          0.03768571 = weight(abstract_txt:published in 2391) [ClassicSimilarity], result of:
            0.03768571 = score(doc=2391,freq=2.0), product of:
              0.0869111 = queryWeight, product of:
                1.0909672 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.016238919 = queryNorm
              0.43361217 = fieldWeight in 2391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
          0.032857053 = weight(abstract_txt:publications in 2391) [ClassicSimilarity], result of:
            0.032857053 = score(doc=2391,freq=1.0), product of:
              0.09993551 = queryWeight, product of:
                1.1698602 = boost
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.016238919 = queryNorm
              0.32878256 = fieldWeight in 2391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.260521 = idf(docFreq=626, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
          0.028899228 = weight(abstract_txt:literature in 2391) [ClassicSimilarity], result of:
            0.028899228 = score(doc=2391,freq=1.0), product of:
              0.10501597 = queryWeight, product of:
                1.4687482 = boost
                4.4030223 = idf(docFreq=1477, maxDocs=44421)
                0.016238919 = queryNorm
              0.2751889 = fieldWeight in 2391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4030223 = idf(docFreq=1477, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
          0.07432948 = weight(abstract_txt:manipulation in 2391) [ClassicSimilarity], result of:
            0.07432948 = score(doc=2391,freq=1.0), product of:
              0.17221652 = queryWeight, product of:
                1.535718 = boost
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.016238919 = queryNorm
              0.4316048 = fieldWeight in 2391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
          0.03365614 = weight(abstract_txt:scientific in 2391) [ClassicSimilarity], result of:
            0.03365614 = score(doc=2391,freq=1.0), product of:
              0.116244934 = queryWeight, product of:
                1.5452783 = boost
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.016238919 = queryNorm
              0.28952777 = fieldWeight in 2391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
          0.07546065 = weight(abstract_txt:publishers in 2391) [ClassicSimilarity], result of:
            0.07546065 = score(doc=2391,freq=1.0), product of:
              0.19913375 = queryWeight, product of:
                2.0225167 = boost
                6.0631127 = idf(docFreq=280, maxDocs=44421)
                0.016238919 = queryNorm
              0.37894455 = fieldWeight in 2391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0631127 = idf(docFreq=280, maxDocs=44421)
                0.0625 = fieldNorm(doc=2391)
        0.32 = coord(8/25)
    
  5. Shuai, X.; Rollins, J.; Moulinier, I.; Custis, T.; Edmunds, M.; Schilder, F.: A multidimensional investigation of the effects of publication retraction on scholarly impact (2017) 0.13
    0.12681687 = sum of:
      0.12681687 = product of:
        0.79260546 = sum of:
          0.13701873 = weight(abstract_txt:retraction in 4798) [ClassicSimilarity], result of:
            0.13701873 = score(doc=4798,freq=2.0), product of:
              0.16310504 = queryWeight, product of:
                1.0567999 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.016238919 = queryNorm
              0.8400643 = fieldWeight in 4798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0625 = fieldNorm(doc=4798)
          0.24557263 = weight(abstract_txt:retractions in 4798) [ClassicSimilarity], result of:
            0.24557263 = score(doc=4798,freq=5.0), product of:
              0.17731851 = queryWeight, product of:
                1.1018846 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.016238919 = queryNorm
              1.3849238 = fieldWeight in 4798, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=4798)
          0.047596965 = weight(abstract_txt:scientific in 4798) [ClassicSimilarity], result of:
            0.047596965 = score(doc=4798,freq=2.0), product of:
              0.116244934 = queryWeight, product of:
                1.5452783 = boost
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.016238919 = queryNorm
              0.4094541 = fieldWeight in 4798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.0625 = fieldNorm(doc=4798)
          0.36241713 = weight(abstract_txt:papers in 4798) [ClassicSimilarity], result of:
            0.36241713 = score(doc=4798,freq=4.0), product of:
              0.5506481 = queryWeight, product of:
                6.440098 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.016238919 = queryNorm
              0.6581647 = fieldWeight in 4798, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.0625 = fieldNorm(doc=4798)
        0.16 = coord(4/25)
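
The content-based trees above combine three ingredients: a per-term score (queryWeight x fieldWeight), a sum over the matching terms, and a coord factor that scales the sum by the fraction of query terms matched. As a rough check, the Python sketch below recomputes the score of the last hit (doc 4798) from the boost, docFreq, and freq values listed in its tree. It assumes the same tf and idf definitions as ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the helper names are hypothetical.

```python
import math

MAX_DOCS = 44421          # maxDocs from the explain output
QUERY_NORM = 0.016238919  # queryNorm from the explain output
FIELD_NORM = 0.0625       # fieldNorm(doc=4798)
QUERY_TERMS = 25          # denominator of coord(4/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf definition: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


# (term, boost, docFreq, freq) copied from the tree for doc 4798.
matched_terms = [
    ("retraction", 1.0567999, 8, 2.0),
    ("retractions", 1.1018846, 5, 5.0),
    ("scientific", 1.5452783, 1174, 2.0),
    ("papers", 6.440098, 623, 4.0),
]

raw_sum = sum(term_score(boost, df, freq) for _, boost, df, freq in matched_terms)
coord = len(matched_terms) / QUERY_TERMS  # 4 of 25 query terms matched
score = raw_sum * coord
print(score)
```

The result should land close to the 0.12681687 shown for this hit; small differences in the last digits are expected because the index stores single-precision floats while Python computes in double precision.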