Document (#26341)

Author
Lee, L.-H.
Luh, C.-J.
Title
Generation of pornographic blacklist and its incremental update using an inverse chi-square based method
Source
Information Processing and Management. 44(2008) no.5, pp.1698-1706
Year
2008
Abstract
This study presented an inverse chi-square based web content classification system that works along with an incremental update mechanism for the incremental generation of a pornographic blacklist. As indicated by the experimental results, the proposed system can classify bilingual (English and Chinese) web pages at an average precision rate of 97.11%, while maintaining a favorably low false positive rate. This satisfactory performance was obtained under a cost-effective parameter configuration used in the inverse chi-square calculations. The proposed incremental update mechanism operates on the linking structure of pornographic hubs to locate newly added pornographic sites. The resulting blacklist has been empirically verified to be more responsive to the growth dynamics of pornography sites than three public-domain blacklists.
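The abstract does not spell out how the classifier is formulated. For reference, here is a minimal Python sketch of the inverse chi-square (Fisher/Robinson) combining step as popularized by spam filters such as SpamBayes. It is not the authors' implementation. It assumes that per-term probabilities (the probability that a page containing the term is pornographic) have already been estimated from training data and that text is already tokenized, including word segmentation for Chinese. The names `chi2q`, `combine`, `classify`, `top_k` and `threshold`, along with their default values, are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple


def chi2q(x2: float, dof: int) -> float:
    """Inverse chi-square: P(X >= x2) for X ~ chi-square with an even number of dof.

    For even dof the survival function has the closed form
    exp(-m) * sum_{i=0}^{dof/2 - 1} m**i / i!, with m = x2 / 2.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs: List[float]) -> float:
    """Fisher/Robinson combination of per-term probabilities into one score in [0, 1].

    Near 1: evidence for the positive (pornographic) class; near 0: negative class;
    near 0.5: weak or conflicting evidence. Each p must lie strictly in (0, 1).
    """
    n = len(probs)
    if n == 0:
        return 0.5
    # Evidence for the positive class: large when many p are close to 1.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for the negative class: large when many p are close to 0.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


def classify(tokens: Iterable[str], term_probs: Dict[str, float],
             top_k: int = 150, threshold: float = 0.9) -> Tuple[float, bool]:
    """Hypothetical page classifier; top_k and threshold are illustrative defaults."""
    clues = [term_probs[t] for t in set(tokens) if t in term_probs]
    # Keep only the most decisive clues (furthest from 0.5) to bound the cost.
    clues.sort(key=lambda p: abs(p - 0.5), reverse=True)
    # Clamp away from 0 and 1 so the logarithms stay finite.
    clues = [min(max(p, 0.01), 0.99) for p in clues[:top_k]]
    score = combine(clues)
    return score, score >= threshold
```

In this sketch, parameters such as `top_k` (how many clues enter the chi-square calculation) are the kind of knobs that trade classification cost against accuracy. The values the paper settled on are not given in the abstract.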

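The abstract describes the update mechanism only at a high level ("operates on the linking structure of pornographic hubs"). The following is a minimal Python sketch of one round of link-based expansion, built on our own assumptions rather than the authors' algorithm. A "hub" is taken to be any crawled site that links to at least `min_blacklisted_links` already-blacklisted sites. `expand_blacklist`, `outlinks`, `is_pornographic` and `min_blacklisted_links` are hypothetical names standing in for a crawl snapshot, the content classifier and a tuning threshold.

```python
from typing import Callable, Dict, Set


def expand_blacklist(
    blacklist: Set[str],
    outlinks: Dict[str, Set[str]],
    is_pornographic: Callable[[str], bool],
    min_blacklisted_links: int = 3,
) -> Set[str]:
    """Return newly found sites to add to the blacklist (one expansion round).

    blacklist: sites already known to be pornographic.
    outlinks: hypothetical crawl snapshot, site -> set of sites it links to.
    is_pornographic: hypothetical hook, e.g. the content classifier applied to a site.
    min_blacklisted_links: hypothetical threshold for treating a site as a hub.
    """
    # 1. Hubs: sites whose outlinks point to enough known blacklisted sites.
    hubs = [site for site, targets in outlinks.items()
            if len(targets & blacklist) >= min_blacklisted_links]
    # 2. Candidates: sites linked from any hub that are not yet blacklisted.
    candidates: Set[str] = set()
    for hub in hubs:
        candidates |= outlinks[hub] - blacklist
    # 3. Only candidates the classifier confirms are added.
    return {site for site in candidates if is_pornographic(site)}
```

Calling this again with `blacklist | new_sites` on a fresh crawl gives a simple incremental loop. Confirming candidates with the classifier, rather than trusting hub links alone, is this sketch's way of keeping false positives down.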
Similar documents (content)

  1. Lee, L.-H.; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering (2012) 0.32
    0.31615466 = sum of:
      0.31615466 = product of:
        1.1291238 = sum of:
          0.04100252 = weight(abstract_txt:false in 988) [ClassicSimilarity], result of:
            0.04100252 = score(doc=988,freq=1.0), product of:
              0.08624027 = queryWeight, product of:
                1.0865786 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.010433464 = queryNorm
              0.47544518 = fieldWeight in 988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.0625 = fieldNorm(doc=988)
          0.063714616 = weight(abstract_txt:favorably in 988) [ClassicSimilarity], result of:
            0.063714616 = score(doc=988,freq=1.0), product of:
              0.115698874 = queryWeight, product of:
                1.2585505 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.010433464 = queryNorm
              0.5506935 = fieldWeight in 988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.0625 = fieldNorm(doc=988)
          0.025778022 = weight(abstract_txt:proposed in 988) [ClassicSimilarity], result of:
            0.025778022 = score(doc=988,freq=2.0), product of:
              0.06329015 = queryWeight, product of:
                1.3164039 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.010433464 = queryNorm
              0.4072991 = fieldWeight in 988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0625 = fieldNorm(doc=988)
          0.060551208 = weight(abstract_txt:rate in 988) [ClassicSimilarity], result of:
            0.060551208 = score(doc=988,freq=2.0), product of:
              0.11183685 = queryWeight, product of:
                1.7499013 = boost
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.010433464 = queryNorm
              0.54142445 = fieldWeight in 988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.0625 = fieldNorm(doc=988)
          0.12921345 = weight(abstract_txt:update in 988) [ClassicSimilarity], result of:
            0.12921345 = score(doc=988,freq=2.0), product of:
              0.21219671 = queryWeight, product of:
                2.952134 = boost
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.010433464 = queryNorm
              0.6089324 = fieldWeight in 988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.0625 = fieldNorm(doc=988)
          0.38456354 = weight(abstract_txt:blacklist in 988) [ClassicSimilarity], result of:
            0.38456354 = score(doc=988,freq=2.0), product of:
              0.43904823 = queryWeight, product of:
                4.246419 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010433464 = queryNorm
              0.8759027 = fieldWeight in 988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=988)
          0.42430046 = weight(abstract_txt:pornographic in 988) [ClassicSimilarity], result of:
            0.42430046 = score(doc=988,freq=2.0), product of:
              0.5159751 = queryWeight, product of:
                5.3155775 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.010433464 = queryNorm
              0.8223274 = fieldWeight in 988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0625 = fieldNorm(doc=988)
        0.28 = coord(7/25)
    
  2. Keilty, P.: Carnal Indexing (2017) 0.09
    0.092826635 = sum of:
      0.092826635 = product of:
        0.58016646 = sum of:
          0.04100252 = weight(abstract_txt:false in 4841) [ClassicSimilarity], result of:
            0.04100252 = score(doc=4841,freq=1.0), product of:
              0.08624027 = queryWeight, product of:
                1.0865786 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.010433464 = queryNorm
              0.47544518 = fieldWeight in 4841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.0625 = fieldNorm(doc=4841)
          0.0854591 = weight(abstract_txt:pornography in 4841) [ClassicSimilarity], result of:
            0.0854591 = score(doc=4841,freq=2.0), product of:
              0.11168597 = queryWeight, product of:
                1.2365321 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.010433464 = queryNorm
              0.7651731 = fieldWeight in 4841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=4841)
          0.029404357 = weight(abstract_txt:sites in 4841) [ClassicSimilarity], result of:
            0.029404357 = score(doc=4841,freq=1.0), product of:
              0.08705376 = queryWeight, product of:
                1.5438848 = boost
                5.4043584 = idf(docFreq=542, maxDocs=44421)
                0.010433464 = queryNorm
              0.3377724 = fieldWeight in 4841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4043584 = idf(docFreq=542, maxDocs=44421)
                0.0625 = fieldNorm(doc=4841)
          0.42430046 = weight(abstract_txt:pornographic in 4841) [ClassicSimilarity], result of:
            0.42430046 = score(doc=4841,freq=2.0), product of:
              0.5159751 = queryWeight, product of:
                5.3155775 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.010433464 = queryNorm
              0.8223274 = fieldWeight in 4841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0625 = fieldNorm(doc=4841)
        0.16 = coord(4/25)
    
  3. Shieh, W.-Y.; Chung, C.-P.: A statistics-based approach to incrementally update inverted files (2005) 0.07
    0.06811803 = sum of:
      0.06811803 = product of:
        0.4257377 = sum of:
          0.022784766 = weight(abstract_txt:proposed in 2010) [ClassicSimilarity], result of:
            0.022784766 = score(doc=2010,freq=1.0), product of:
              0.06329015 = queryWeight, product of:
                1.3164039 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.010433464 = queryNorm
              0.36000493 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.078125 = fieldNorm(doc=2010)
          0.05352021 = weight(abstract_txt:rate in 2010) [ClassicSimilarity], result of:
            0.05352021 = score(doc=2010,freq=1.0), product of:
              0.11183685 = queryWeight, product of:
                1.7499013 = boost
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.010433464 = queryNorm
              0.47855613 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.078125 = fieldNorm(doc=2010)
          0.11420962 = weight(abstract_txt:update in 2010) [ClassicSimilarity], result of:
            0.11420962 = score(doc=2010,freq=1.0), product of:
              0.21219671 = queryWeight, product of:
                2.952134 = boost
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.010433464 = queryNorm
              0.53822523 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.078125 = fieldNorm(doc=2010)
          0.23522311 = weight(abstract_txt:incremental in 2010) [ClassicSimilarity], result of:
            0.23522311 = score(doc=2010,freq=1.0), product of:
              0.37806785 = queryWeight, product of:
                4.5501003 = boost
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.010433464 = queryNorm
              0.6221717 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.078125 = fieldNorm(doc=2010)
        0.16 = coord(4/25)
    
  4. Lee, L.-H.; Juan, Y.-C.; Tseng, W.-L.; Chen, H.-H.; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering (2015) 0.07
    0.06658798 = sum of:
      0.06658798 = product of:
        0.4161749 = sum of:
          0.04100252 = weight(abstract_txt:false in 2818) [ClassicSimilarity], result of:
            0.04100252 = score(doc=2818,freq=1.0), product of:
              0.08624027 = queryWeight, product of:
                1.0865786 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.010433464 = queryNorm
              0.47544518 = fieldWeight in 2818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.0625 = fieldNorm(doc=2818)
          0.06042871 = weight(abstract_txt:pornography in 2818) [ClassicSimilarity], result of:
            0.06042871 = score(doc=2818,freq=1.0), product of:
              0.11168597 = queryWeight, product of:
                1.2365321 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.010433464 = queryNorm
              0.5410591 = fieldWeight in 2818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=2818)
          0.042816166 = weight(abstract_txt:rate in 2818) [ClassicSimilarity], result of:
            0.042816166 = score(doc=2818,freq=1.0), product of:
              0.11183685 = queryWeight, product of:
                1.7499013 = boost
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.010433464 = queryNorm
              0.3828449 = fieldWeight in 2818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1255183 = idf(docFreq=263, maxDocs=44421)
                0.0625 = fieldNorm(doc=2818)
          0.27192748 = weight(abstract_txt:blacklist in 2818) [ClassicSimilarity], result of:
            0.27192748 = score(doc=2818,freq=1.0), product of:
              0.43904823 = queryWeight, product of:
                4.246419 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010433464 = queryNorm
              0.61935675 = fieldWeight in 2818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=2818)
        0.16 = coord(4/25)
    
  5. Fu, T.; Abbasi, A.; Chen, H.: A focused crawler for Dark Web forums (2010) 0.07
    0.06629113 = sum of:
      0.06629113 = product of:
        0.5524261 = sum of:
          0.046855655 = weight(abstract_txt:mechanism in 458) [ClassicSimilarity], result of:
            0.046855655 = score(doc=458,freq=1.0), product of:
              0.1187648 = queryWeight, product of:
                1.8032875 = boost
                6.312396 = idf(docFreq=218, maxDocs=44421)
                0.010433464 = queryNorm
              0.39452475 = fieldWeight in 458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.312396 = idf(docFreq=218, maxDocs=44421)
                0.0625 = fieldNorm(doc=458)
          0.12921345 = weight(abstract_txt:update in 458) [ClassicSimilarity], result of:
            0.12921345 = score(doc=458,freq=2.0), product of:
              0.21219671 = queryWeight, product of:
                2.952134 = boost
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.010433464 = queryNorm
              0.6089324 = fieldWeight in 458, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.889283 = idf(docFreq=122, maxDocs=44421)
                0.0625 = fieldNorm(doc=458)
          0.376357 = weight(abstract_txt:incremental in 458) [ClassicSimilarity], result of:
            0.376357 = score(doc=458,freq=4.0), product of:
              0.37806785 = queryWeight, product of:
                4.5501003 = boost
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.010433464 = queryNorm
              0.99547476 = fieldWeight in 458, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.0625 = fieldNorm(doc=458)
        0.12 = coord(3/25)
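Each score above is a Lucene `ClassicSimilarity` (TF-IDF) explanation. Reading the trees, each matched term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. Here tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). The per-term contributions are summed and multiplied by coord = matched terms / query terms (25 here). The Python sketch below recomputes the scores of items 1 and 5 from the printed inputs. It takes boost, queryNorm and fieldNorm as given rather than deriving them, and it uses double precision, so its results match Lucene's 32-bit float values only to about six significant digits.

```python
import math
from dataclasses import dataclass
from typing import List

MAX_DOCS = 44421          # maxDocs printed in every idf(...) line
QUERY_NORM = 0.010433464  # queryNorm printed in every queryWeight line
QUERY_TERMS = 25          # denominator of coord(x/25)


@dataclass
class Match:
    """Inputs printed for one matched term in a ClassicSimilarity explain tree."""
    term: str
    freq: float
    doc_freq: int
    boost: float
    field_norm: float


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # idf = 1 + ln(maxDocs / (docFreq + 1)), as the explain lines imply
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(m: Match) -> float:
    term_idf = idf(m.doc_freq)
    query_weight = m.boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(m.freq) * term_idf * m.field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def score(matches: List[Match]) -> float:
    coord = len(matches) / QUERY_TERMS  # fraction of query terms matched
    return coord * sum(term_weight(m) for m in matches)


# Item 1 (doc 988) and item 5 (doc 458), copied from the explain output above.
doc_988 = [
    Match("false", 1, 59, 1.0865786, 0.0625),
    Match("favorably", 1, 17, 1.2585505, 0.0625),
    Match("proposed", 2, 1203, 1.3164039, 0.0625),
    Match("rate", 2, 263, 1.7499013, 0.0625),
    Match("update", 2, 122, 2.952134, 0.0625),
    Match("blacklist", 2, 5, 4.246419, 0.0625),
    Match("pornographic", 2, 10, 5.3155775, 0.0625),
]
doc_458 = [
    Match("mechanism", 1, 218, 1.8032875, 0.0625),
    Match("update", 2, 122, 2.952134, 0.0625),
    Match("incremental", 4, 41, 4.5501003, 0.0625),
]

print(f"{score(doc_988):.4f}")  # 0.3162
print(f"{score(doc_458):.4f}")  # 0.0663
```

The coord factor explains the ranking. Item 1 matches 7 of the 25 query terms, including the rare terms "blacklist" (docFreq=5) and "pornographic" (docFreq=10). Items 2 to 5 match only 3 or 4 terms each.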