Document (#34742)

Author
Stamatatos, E.
Title
A survey of modern authorship attribution methods
Source
Journal of the American Society for Information Science and Technology. 60(2009) no.3, pp. 538-556
Year
2009
Abstract
Authorship attribution supported by statistical or computational methods has a long history starting from the 19th century and is marked by the seminal study of Mosteller and Wallace (1964) on the authorship of the disputed Federalist Papers. During the last decade, this scientific field has been developed substantially, taking advantage of research advances in areas such as machine learning, information retrieval, and natural language processing. The plethora of available electronic texts (e.g., e-mail messages, online forum messages, blogs, source code, etc.) indicates a wide variety of applications of this technology, provided it is able to handle short and noisy text from multiple candidate authors. In this article, a survey of recent advances of the automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification. The focus of this survey is on computational requirements and settings rather than on linguistic or literary issues. We also discuss evaluation methodologies and criteria for authorship attribution studies and list open questions that will attract future work in this area.

Similar documents (author)

  1. Stamatatos, E.: Author identification : using text sampling to handle the class imbalance problem (2008) 6.19
    6.1935673 = sum of:
      6.1935673 = weight(author_txt:stamatatos in 3063) [ClassicSimilarity], result of:
        6.1935673 = fieldWeight in 3063, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.625 = fieldNorm(doc=3063)
    
  2. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 6.19
    6.1935673 = sum of:
      6.1935673 = weight(author_txt:stamatatos in 955) [ClassicSimilarity], result of:
        6.1935673 = fieldWeight in 955, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.625 = fieldNorm(doc=955)
    
  3. Stamatatos, E.: Masking topic-related information to enhance authorship attribution (2018) 6.19
    6.1935673 = sum of:
      6.1935673 = weight(author_txt:stamatatos in 124) [ClassicSimilarity], result of:
        6.1935673 = fieldWeight in 124, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.625 = fieldNorm(doc=124)
    
  4. Potha, N.; Stamatatos, E.: Improving author verification based on topic modeling (2019) 4.95
    4.954854 = sum of:
      4.954854 = weight(author_txt:stamatatos in 385) [ClassicSimilarity], result of:
        4.954854 = fieldWeight in 385, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.5 = fieldNorm(doc=385)
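
The author-match scores above are each a single fieldWeight, the product tf * idf * fieldNorm. The sketch below recomputes them in Python. It assumes the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the numbers in the listing but are not stated in the record itself. The function names are illustrative and not part of any library. The fieldNorm values (0.625 and 0.5) are copied from the listing as given.

```python
import math

MAX_DOCS = 44421  # maxDocs as shown in the explain output


def classic_tf(freq):
    # Assumed term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed inverse document frequency, chosen because it matches
    # idf(docFreq=5, maxDocs=44421) = 9.909708 in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq) * field_norm


# Entries 1-3: single-author records, fieldNorm 0.625.
single_author = field_weight(freq=1.0, doc_freq=5, field_norm=0.625)
# Entry 4: two-author record, fieldNorm 0.5.
two_authors = field_weight(freq=1.0, doc_freq=5, field_norm=0.5)

print(f"{single_author:.4f} {two_authors:.4f}")
```

Under these assumptions the two values agree with the listed 6.1935673 and 4.954854 to within rounding. The only factor that differs between entries is fieldNorm, which is smaller for the record with two authors.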
    

Similar documents (content)

  1. Koppel, M.; Schler, J.; Argamon, S.: Computational methods in authorship attribution (2009) 0.26
    0.2575916 = sum of:
      0.2575916 = product of:
        0.91997004 = sum of:
          0.04370864 = weight(abstract_txt:handle in 3683) [ClassicSimilarity], result of:
            0.04370864 = score(doc=3683,freq=1.0), product of:
              0.10356413 = queryWeight, product of:
                1.0122436 = boost
                6.7527075 = idf(docFreq=140, maxDocs=44421)
                0.015151177 = queryNorm
              0.42204422 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7527075 = idf(docFreq=140, maxDocs=44421)
                0.0625 = fieldNorm(doc=3683)
          0.09641128 = weight(abstract_txt:candidate in 3683) [ClassicSimilarity], result of:
            0.09641128 = score(doc=3683,freq=3.0), product of:
              0.121677235 = queryWeight, product of:
                1.0971981 = boost
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.015151177 = queryNorm
              0.7923527 = fieldWeight in 3683, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.0625 = fieldNorm(doc=3683)
          0.028560903 = weight(abstract_txt:methods in 3683) [ClassicSimilarity], result of:
            0.028560903 = score(doc=3683,freq=2.0), product of:
              0.07798525 = queryWeight, product of:
                1.2422289 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.015151177 = queryNorm
              0.3662347 = fieldWeight in 3683, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=3683)
          0.019776411 = weight(abstract_txt:this in 3683) [ClassicSimilarity], result of:
            0.019776411 = score(doc=3683,freq=4.0), product of:
              0.06575056 = queryWeight, product of:
                1.8034958 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.015151177 = queryNorm
              0.30077934 = fieldWeight in 3683, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=3683)
          0.02809851 = weight(abstract_txt:text in 3683) [ClassicSimilarity], result of:
            0.02809851 = score(doc=3683,freq=1.0), product of:
              0.111256935 = queryWeight, product of:
                1.8172077 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015151177 = queryNorm
              0.25255513 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3683)
          0.36294395 = weight(abstract_txt:attribution in 3683) [ClassicSimilarity], result of:
            0.36294395 = score(doc=3683,freq=3.0), product of:
              0.42467582 = queryWeight, product of:
                3.5503385 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.015151177 = queryNorm
              0.8546377 = fieldWeight in 3683, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=3683)
          0.34047028 = weight(abstract_txt:authorship in 3683) [ClassicSimilarity], result of:
            0.34047028 = score(doc=3683,freq=2.0), product of:
              0.5523283 = queryWeight, product of:
                5.227139 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.015151177 = queryNorm
              0.61642736 = fieldWeight in 3683, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0625 = fieldNorm(doc=3683)
        0.28 = coord(7/25)
    
  2. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.24
    0.23505995 = sum of:
      0.23505995 = product of:
        0.9794165 = sum of:
          0.020195609 = weight(abstract_txt:methods in 3937) [ClassicSimilarity], result of:
            0.020195609 = score(doc=3937,freq=1.0), product of:
              0.07798525 = queryWeight, product of:
                1.2422289 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.015151177 = queryNorm
              0.25896704 = fieldWeight in 3937, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=3937)
          0.12186699 = weight(abstract_txt:disputed in 3937) [ClassicSimilarity], result of:
            0.12186699 = score(doc=3937,freq=1.0), product of:
              0.20515804 = queryWeight, product of:
                1.4247041 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.015151177 = queryNorm
              0.5940152 = fieldWeight in 3937, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0625 = fieldNorm(doc=3937)
          0.019776411 = weight(abstract_txt:this in 3937) [ClassicSimilarity], result of:
            0.019776411 = score(doc=3937,freq=4.0), product of:
              0.06575056 = queryWeight, product of:
                1.8034958 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.015151177 = queryNorm
              0.30077934 = fieldWeight in 3937, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=3937)
          0.03973729 = weight(abstract_txt:text in 3937) [ClassicSimilarity], result of:
            0.03973729 = score(doc=3937,freq=2.0), product of:
              0.111256935 = queryWeight, product of:
                1.8172077 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015151177 = queryNorm
              0.3571669 = fieldWeight in 3937, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3937)
          0.2963425 = weight(abstract_txt:attribution in 3937) [ClassicSimilarity], result of:
            0.2963425 = score(doc=3937,freq=2.0), product of:
              0.42467582 = queryWeight, product of:
                3.5503385 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.015151177 = queryNorm
              0.69780874 = fieldWeight in 3937, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=3937)
          0.48149768 = weight(abstract_txt:authorship in 3937) [ClassicSimilarity], result of:
            0.48149768 = score(doc=3937,freq=4.0), product of:
              0.5523283 = queryWeight, product of:
                5.227139 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.015151177 = queryNorm
              0.87175995 = fieldWeight in 3937, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0625 = fieldNorm(doc=3937)
        0.24 = coord(6/25)
    
  3. Stover, J.A.; Winter, Y.; Koppel, M.; Kestemont, M.: Computational authorship verification method attributes a new work to a major 2nd century African author (2016) 0.23
    0.22529662 = sum of:
      0.22529662 = product of:
        0.80463076 = sum of:
          0.055663083 = weight(abstract_txt:candidate in 3503) [ClassicSimilarity], result of:
            0.055663083 = score(doc=3503,freq=1.0), product of:
              0.121677235 = queryWeight, product of:
                1.0971981 = boost
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.015151177 = queryNorm
              0.45746505 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.0625 = fieldNorm(doc=3503)
          0.020195609 = weight(abstract_txt:methods in 3503) [ClassicSimilarity], result of:
            0.020195609 = score(doc=3503,freq=1.0), product of:
              0.07798525 = queryWeight, product of:
                1.2422289 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.015151177 = queryNorm
              0.25896704 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=3503)
          0.019776411 = weight(abstract_txt:this in 3503) [ClassicSimilarity], result of:
            0.019776411 = score(doc=3503,freq=4.0), product of:
              0.06575056 = queryWeight, product of:
                1.8034958 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.015151177 = queryNorm
              0.30077934 = fieldWeight in 3503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=3503)
          0.05619702 = weight(abstract_txt:text in 3503) [ClassicSimilarity], result of:
            0.05619702 = score(doc=3503,freq=4.0), product of:
              0.111256935 = queryWeight, product of:
                1.8172077 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015151177 = queryNorm
              0.50511026 = fieldWeight in 3503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3503)
          0.102782585 = weight(abstract_txt:computational in 3503) [ClassicSimilarity], result of:
            0.102782585 = score(doc=3503,freq=2.0), product of:
              0.1831376 = queryWeight, product of:
                1.903637 = boost
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.015151177 = queryNorm
              0.5612315 = fieldWeight in 3503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.0625 = fieldNorm(doc=3503)
          0.2095458 = weight(abstract_txt:attribution in 3503) [ClassicSimilarity], result of:
            0.2095458 = score(doc=3503,freq=1.0), product of:
              0.42467582 = queryWeight, product of:
                3.5503385 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.015151177 = queryNorm
              0.4934253 = fieldWeight in 3503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=3503)
          0.34047028 = weight(abstract_txt:authorship in 3503) [ClassicSimilarity], result of:
            0.34047028 = score(doc=3503,freq=2.0), product of:
              0.5523283 = queryWeight, product of:
                5.227139 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.015151177 = queryNorm
              0.61642736 = fieldWeight in 3503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0625 = fieldNorm(doc=3503)
        0.28 = coord(7/25)
    
  4. Sebastiani, F.: Classification of text, automatic (2006) 0.22
    0.22482637 = sum of:
      0.22482637 = product of:
        0.9367765 = sum of:
          0.014832308 = weight(abstract_txt:this in 3) [ClassicSimilarity], result of:
            0.014832308 = score(doc=3,freq=1.0), product of:
              0.06575056 = queryWeight, product of:
                1.8034958 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.015151177 = queryNorm
              0.2255845 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.05960594 = weight(abstract_txt:text in 3) [ClassicSimilarity], result of:
            0.05960594 = score(doc=3,freq=2.0), product of:
              0.111256935 = queryWeight, product of:
                1.8172077 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015151177 = queryNorm
              0.5357503 = fieldWeight in 3, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.109017394 = weight(abstract_txt:computational in 3) [ClassicSimilarity], result of:
            0.109017394 = score(doc=3,freq=1.0), product of:
              0.1831376 = queryWeight, product of:
                1.903637 = boost
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.015151177 = queryNorm
              0.5952759 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.077878915 = weight(abstract_txt:survey in 3) [ClassicSimilarity], result of:
            0.077878915 = score(doc=3,freq=1.0), product of:
              0.16752926 = queryWeight, product of:
                2.229905 = boost
                4.958587 = idf(docFreq=847, maxDocs=44421)
                0.015151177 = queryNorm
              0.46486753 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.958587 = idf(docFreq=847, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.3143187 = weight(abstract_txt:attribution in 3) [ClassicSimilarity], result of:
            0.3143187 = score(doc=3,freq=1.0), product of:
              0.42467582 = queryWeight, product of:
                3.5503385 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.015151177 = queryNorm
              0.74013793 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
          0.36112326 = weight(abstract_txt:authorship in 3) [ClassicSimilarity], result of:
            0.36112326 = score(doc=3,freq=1.0), product of:
              0.5523283 = queryWeight, product of:
                5.227139 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.015151177 = queryNorm
              0.65382 = fieldWeight in 3, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.09375 = fieldNorm(doc=3)
        0.24 = coord(6/25)
    
  5. Stamatatos, E.: Masking topic-related information to enhance authorship attribution (2018) 0.20
    0.20487642 = sum of:
      0.20487642 = product of:
        1.0243821 = sum of:
          0.040391218 = weight(abstract_txt:methods in 124) [ClassicSimilarity], result of:
            0.040391218 = score(doc=124,freq=4.0), product of:
              0.07798525 = queryWeight, product of:
                1.2422289 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.015151177 = queryNorm
              0.5179341 = fieldWeight in 124, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.0625 = fieldNorm(doc=124)
          0.013984034 = weight(abstract_txt:this in 124) [ClassicSimilarity], result of:
            0.013984034 = score(doc=124,freq=2.0), product of:
              0.06575056 = queryWeight, product of:
                1.8034958 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.015151177 = queryNorm
              0.21268311 = fieldWeight in 124, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=124)
          0.03973729 = weight(abstract_txt:text in 124) [ClassicSimilarity], result of:
            0.03973729 = score(doc=124,freq=2.0), product of:
              0.111256935 = queryWeight, product of:
                1.8172077 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015151177 = queryNorm
              0.3571669 = fieldWeight in 124, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=124)
          0.51328033 = weight(abstract_txt:attribution in 124) [ClassicSimilarity], result of:
            0.51328033 = score(doc=124,freq=6.0), product of:
              0.42467582 = queryWeight, product of:
                3.5503385 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.015151177 = queryNorm
              1.2086403 = fieldWeight in 124, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=124)
          0.4169892 = weight(abstract_txt:authorship in 124) [ClassicSimilarity], result of:
            0.4169892 = score(doc=124,freq=3.0), product of:
              0.5523283 = queryWeight, product of:
                5.227139 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.015151177 = queryNorm
              0.75496626 = fieldWeight in 124, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0625 = fieldNorm(doc=124)
        0.2 = coord(5/25)
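
The content-match scores above combine several matching terms. Each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm. The contributions are summed and multiplied by coord, the fraction of the 25 query terms found in the document. The following Python sketch recomputes the score of the first entry (doc=3683) from the raw inputs shown in its explain tree. The tf and idf formulas are assumptions chosen to match the listed values, and the function and variable names are illustrative.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.015151177
QUERY_TERMS = 25  # denominator of coord(7/25)

# (term, freq, boost, docFreq), copied from the explain tree for doc=3683.
MATCHED_TERMS = [
    ("handle", 1.0, 1.0122436, 140),
    ("candidate", 3.0, 1.0971981, 79),
    ("methods", 2.0, 1.2422289, 1915),
    ("this", 4.0, 1.8034958, 10885),
    ("text", 1.0, 1.8172077, 2122),
    ("attribution", 3.0, 3.5503385, 44),
    ("authorship", 2.0, 5.227139, 112),
]


def idf(doc_freq):
    # Assumed formula; it reproduces the idf values in the listing.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq, boost, doc_freq, field_norm):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(matched_terms, field_norm):
    # Sum the per-term scores, then scale by the share of query terms matched.
    total = sum(
        term_score(freq, boost, doc_freq, field_norm)
        for _, freq, boost, doc_freq in matched_terms
    )
    coord = len(matched_terms) / QUERY_TERMS
    return total * coord


score = document_score(MATCHED_TERMS, field_norm=0.0625)
print(f"{score:.4f}")
```

With these inputs the result agrees with the listed 0.2575916 to within rounding. Note that idf enters each term score twice, once through queryWeight and once through fieldWeight, which is why rare terms such as "attribution" and "authorship" dominate the total.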