Document (#38611)

Author
Ofek, N.
Rokach, L.
Title
¬A classifier to determine which Wikipedia biographies will be accepted
Source
Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.213-218
Year
2015
Series
Brief communication
Abstract
Wikipedia, like other encyclopedias, includes biographies of notable people. However, because it is jointly written by many contributors, it is subject to constant manipulation by contributors attempting to add biographies of non-notable people. Over time, Wikipedia has developed inclusion criteria for notable people (e.g., receiving a significant award) based on which newly contributed biographies are evaluated. In this paper we present and analyze a set of simple indicators that can be used to predict which article will eventually be accepted. These indicators do not refer to the content itself, but to meta-content features (such as the number of categories that the biography is associated with) and to author-based features (such as if it is a first-time author). By training a classifier on these features, we successfully reached a high predictive performance (area under the receiver operating characteristic [ROC] curve [AUC] of 0.97) even though we overlooked the actual biography text.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23199/abstract.
Theme
Information resources (Informationsmittel)
Object
Wikipedia

Similar documents (author)

  1. Rokach, L.; Mitra, P.: Parsimonious citer-based measures : the artificial intelligence domain as a case study (2013) 4.81
    4.811013 = sum of:
      4.811013 = weight(author_txt:rokach in 1212) [ClassicSimilarity], result of:
        4.811013 = fieldWeight in 1212, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.5 = fieldNorm(doc=1212)
    
  2. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 3.61
    3.60826 = sum of:
      3.60826 = weight(author_txt:rokach in 4232) [ClassicSimilarity], result of:
        3.60826 = fieldWeight in 4232, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.375 = fieldNorm(doc=4232)
    
  3. Greenstein-Messica, A.; Rokach, L.; Shabtai, A.: Personal-discount sensitivity prediction for mobile coupon conversion optimization (2017) 3.61
    3.60826 = sum of:
      3.60826 = weight(author_txt:rokach in 4751) [ClassicSimilarity], result of:
        3.60826 = fieldWeight in 4751, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.375 = fieldNorm(doc=4751)
    
  4. Greenstein-Messica, A.; Rokach, L.; Shabtai, A.: Personal-discount sensitivity prediction for mobile coupon conversion optimization (2017) 3.61
    3.60826 = sum of:
      3.60826 = weight(author_txt:rokach in 4761) [ClassicSimilarity], result of:
        3.60826 = fieldWeight in 4761, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.375 = fieldNorm(doc=4761)
    
  5. Rokach, L.; Kalech, M.; Blank, I.; Stern, R.: Who is going to win the next Association for the Advancement of Artificial Intelligence Fellowship Award? : evaluating researchers by mining bibliographic data (2011) 3.01
    3.0068831 = sum of:
      3.0068831 = weight(author_txt:rokach in 945) [ClassicSimilarity], result of:
        3.0068831 = fieldWeight in 945, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.3125 = fieldNorm(doc=945)
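
The author-match scores above each reduce to a single fieldWeight term, tf × idf × fieldNorm. The following is a minimal sketch of that arithmetic, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and not part of any library; the fieldNorm values are copied from the listing rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Inputs read off the listing: the term "rokach" occurs in 7 of 44421 documents.
# Only fieldNorm differs between the hits (0.5, 0.375, 0.3125).
for norm in (0.5, 0.375, 0.3125):
    print(norm, round(field_weight(1.0, 7, 44421, norm), 6))
```

Because tf and idf are identical for every hit, the ranking in this list is decided by fieldNorm alone, which is larger for records with shorter author fields.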
    

Similar documents (content)

  1. Tyrwhitt-Drake, B.: ¬The DNB on CD-ROM (1996) 0.11
    0.10868852 = sum of:
      0.10868852 = product of:
        0.9057377 = sum of:
          0.03097365 = weight(abstract_txt:content in 6706) [ClassicSimilarity], result of:
            0.03097365 = score(doc=6706,freq=1.0), product of:
              0.06775459 = queryWeight, product of:
                1.1128095 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.014567407 = queryNorm
              0.45714468 = fieldWeight in 6706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.109375 = fieldNorm(doc=6706)
          0.27080664 = weight(abstract_txt:biography in 6706) [ClassicSimilarity], result of:
            0.27080664 = score(doc=6706,freq=1.0), product of:
              0.28755218 = queryWeight, product of:
                2.2925026 = boost
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.014567407 = queryNorm
              0.94176525 = fieldWeight in 6706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.610425 = idf(docFreq=21, maxDocs=44421)
                0.109375 = fieldNorm(doc=6706)
          0.6039574 = weight(abstract_txt:biographies in 6706) [ClassicSimilarity], result of:
            0.6039574 = score(doc=6706,freq=1.0), product of:
              0.6184311 = queryWeight, product of:
                4.75458 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.014567407 = queryNorm
              0.9765961 = fieldWeight in 6706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.109375 = fieldNorm(doc=6706)
        0.12 = coord(3/25)
    
  2. Veelen, I. van: ¬The truth according to Wikipedia (2008) 0.09
    0.09432554 = sum of:
      0.09432554 = product of:
        0.33687693 = sum of:
          0.010468708 = weight(abstract_txt:will in 3139) [ClassicSimilarity], result of:
            0.010468708 = score(doc=3139,freq=1.0), product of:
              0.057834953 = queryWeight, product of:
                1.0281268 = boost
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.014567407 = queryNorm
              0.18101007 = fieldWeight in 3139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.046875 = fieldNorm(doc=3139)
          0.012982149 = weight(abstract_txt:time in 3139) [ClassicSimilarity], result of:
            0.012982149 = score(doc=3139,freq=1.0), product of:
              0.06675637 = queryWeight, product of:
                1.1045817 = boost
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.014567407 = queryNorm
              0.19447057 = fieldWeight in 3139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1487055 = idf(docFreq=1905, maxDocs=44421)
                0.046875 = fieldNorm(doc=3139)
          0.018772867 = weight(abstract_txt:content in 3139) [ClassicSimilarity], result of:
            0.018772867 = score(doc=3139,freq=2.0), product of:
              0.06775459 = queryWeight, product of:
                1.1128095 = boost
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.014567407 = queryNorm
              0.2770715 = fieldWeight in 3139, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1796083 = idf(docFreq=1847, maxDocs=44421)
                0.046875 = fieldNorm(doc=3139)
          0.006746436 = weight(abstract_txt:which in 3139) [ClassicSimilarity], result of:
            0.006746436 = score(doc=3139,freq=1.0), product of:
              0.049394093 = queryWeight, product of:
                1.1636828 = boost
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.014567407 = queryNorm
              0.13658386 = fieldWeight in 3139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.046875 = fieldNorm(doc=3139)
          0.03175578 = weight(abstract_txt:author in 3139) [ClassicSimilarity], result of:
            0.03175578 = score(doc=3139,freq=2.0), product of:
              0.096190795 = queryWeight, product of:
                1.3259228 = boost
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.014567407 = queryNorm
              0.33013326 = fieldWeight in 3139, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.980042 = idf(docFreq=829, maxDocs=44421)
                0.046875 = fieldNorm(doc=3139)
          0.05595884 = weight(abstract_txt:people in 3139) [ClassicSimilarity], result of:
            0.05595884 = score(doc=3139,freq=3.0), product of:
              0.1403344 = queryWeight, product of:
                1.9614588 = boost
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.014567407 = queryNorm
              0.39875355 = fieldWeight in 3139, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.046875 = fieldNorm(doc=3139)
          0.20019214 = weight(abstract_txt:wikipedia in 3139) [ClassicSimilarity], result of:
            0.20019214 = score(doc=3139,freq=9.0), product of:
              0.22760192 = queryWeight, product of:
                2.4979577 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.014567407 = queryNorm
              0.87957144 = fieldWeight in 3139, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.046875 = fieldNorm(doc=3139)
        0.28 = coord(7/25)
    
  3. Bounhas, I.; Elayeb, B.; Evrard, F.; Slimani, Y.: Toward a computer study of the reliability of Arabic stories (2010) 0.07
    0.07071996 = sum of:
      0.07071996 = product of:
        0.589333 = sum of:
          0.008995249 = weight(abstract_txt:which in 696) [ClassicSimilarity], result of:
            0.008995249 = score(doc=696,freq=1.0), product of:
              0.049394093 = queryWeight, product of:
                1.1636828 = boost
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.014567407 = queryNorm
              0.18211183 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.0625 = fieldNorm(doc=696)
          0.09226648 = weight(abstract_txt:classifier in 696) [ClassicSimilarity], result of:
            0.09226648 = score(doc=696,freq=1.0), product of:
              0.20370348 = queryWeight, product of:
                1.9295263 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.014567407 = queryNorm
              0.45294502 = fieldWeight in 696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=696)
          0.48807126 = weight(abstract_txt:biographies in 696) [ClassicSimilarity], result of:
            0.48807126 = score(doc=696,freq=2.0), product of:
              0.6184311 = queryWeight, product of:
                4.75458 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.014567407 = queryNorm
              0.7892088 = fieldWeight in 696, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.0625 = fieldNorm(doc=696)
        0.12 = coord(3/25)
    
  4. Fallis, D.: Toward an epistemology of Wikipedia (2008) 0.07
    0.06843401 = sum of:
      0.06843401 = product of:
        0.42771256 = sum of:
          0.012213494 = weight(abstract_txt:will in 3010) [ClassicSimilarity], result of:
            0.012213494 = score(doc=3010,freq=1.0), product of:
              0.057834953 = queryWeight, product of:
                1.0281268 = boost
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.014567407 = queryNorm
              0.21117842 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3010)
          0.050514936 = weight(abstract_txt:encyclopedias in 3010) [ClassicSimilarity], result of:
            0.050514936 = score(doc=3010,freq=1.0), product of:
              0.118277006 = queryWeight, product of:
                1.0396488 = boost
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.014567407 = queryNorm
              0.42709008 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3010)
          0.08428298 = weight(abstract_txt:people in 3010) [ClassicSimilarity], result of:
            0.08428298 = score(doc=3010,freq=5.0), product of:
              0.1403344 = queryWeight, product of:
                1.9614588 = boost
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.014567407 = queryNorm
              0.6005868 = fieldWeight in 3010, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3010)
          0.28070116 = weight(abstract_txt:wikipedia in 3010) [ClassicSimilarity], result of:
            0.28070116 = score(doc=3010,freq=13.0), product of:
              0.22760192 = queryWeight, product of:
                2.4979577 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.014567407 = queryNorm
              1.2332988 = fieldWeight in 3010, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3010)
        0.16 = coord(4/25)
    
  5. Hyvönen, E.; Leskinen, P.; Tamper, M.; Keravuori, K.; Rantala, H.; Ikkala, E.; Tuominen, J.: BiographySampo - publishing and enriching biographies on the Semantic Web for digital humanities research (2019) 0.06
    0.064083986 = sum of:
      0.064083986 = product of:
        0.8010499 = sum of:
          0.05384642 = weight(abstract_txt:people in 799) [ClassicSimilarity], result of:
            0.05384642 = score(doc=799,freq=1.0), product of:
              0.1403344 = queryWeight, product of:
                1.9614588 = boost
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.014567407 = queryNorm
              0.3837008 = fieldWeight in 799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.078125 = fieldNorm(doc=799)
          0.74720347 = weight(abstract_txt:biographies in 799) [ClassicSimilarity], result of:
            0.74720347 = score(doc=799,freq=3.0), product of:
              0.6184311 = queryWeight, product of:
                4.75458 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.014567407 = queryNorm
              1.2082243 = fieldWeight in 799, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.078125 = fieldNorm(doc=799)
        0.08 = coord(2/25)
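
The content-match scores combine several terms: each term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms / query terms). The sketch below reproduces the last entry (doc 799) under the same ClassicSimilarity assumptions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The boost, queryNorm, and fieldNorm values are taken from the listing as given and are not derived here; the helper names are illustrative only.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.014567407  # queryNorm from the listing (taken as given)


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """One term's contribution: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# Matching terms for doc 799: (boost, docFreq, freq, fieldNorm), read off the listing.
terms = {
    "people": (1.9614588, 888, 1.0, 0.078125),
    "biographies": (4.75458, 15, 3.0, 0.078125),
}

raw_sum = sum(term_score(*params) for params in terms.values())
coord = len(terms) / 25  # 2 of the 25 query terms matched
total = coord * raw_sum
print(round(raw_sum, 6), round(total, 6))
```

The coord factor explains why a document matching only a few rare terms can still rank below one that matches many common terms: the raw sum for doc 799 is high, but it is multiplied by 2/25.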