Document (#26341)

Author
Lee, L.-H.
Luh, C.-J.
Title
Generation of pornographic blacklist and its incremental update using an inverse chi-square based method
Source
Information processing and management. 44(2008) no.5, pp.1698-1706
Year
2008
Abstract
This study presents an inverse chi-square based web content classification system that works with an incremental update mechanism to generate a pornographic blacklist incrementally. Experimental results indicate that the proposed system classifies bilingual (English and Chinese) web pages at an average precision rate of 97.11% while maintaining a favorably low false positive rate. This performance was obtained under a cost-effective parameter configuration for the inverse chi-square calculations. The proposed incremental update mechanism uses the linking structure of pornographic hubs to locate newly added pornographic sites. The resulting blacklist has been empirically verified to be more responsive to the growth dynamics of pornography sites than three public domain blacklists.
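
As a rough illustration of what an inverse chi-square calculation can look like, the sketch below combines per-token probabilities with Fisher's method, in the form commonly used by statistical spam filters. Whether the study uses this exact formulation is an assumption. The function names, the clamping constant, and the sample probabilities are hypothetical and are not taken from the paper.

```python
import math

def chi2_sf_even(x2, df):
    """Survival function of the chi-square distribution for an even,
    positive number of degrees of freedom (closed-form series)."""
    assert df > 0 and df % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs, eps=1e-6):
    """Combine per-token probabilities (each the assumed probability that a
    token indicates a pornographic page) into one score in [0, 1]."""
    # Clamp to avoid log(0); eps is an illustrative choice.
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    df = 2 * len(clamped)
    porn_evidence = 1.0 - chi2_sf_even(
        -2.0 * sum(math.log(1.0 - p) for p in clamped), df)
    clean_evidence = 1.0 - chi2_sf_even(
        -2.0 * sum(math.log(p) for p in clamped), df)
    return (1.0 + porn_evidence - clean_evidence) / 2.0

# Hypothetical token probabilities for three pages.
strongly_flagged = combine([0.99, 0.99, 0.99])
strongly_clean = combine([0.01, 0.01, 0.01])
inconclusive = combine([0.5, 0.5])
```

Scores near 1 suggest a pornographic page, scores near 0 a clean one, and scores near 0.5 mean the evidence is inconclusive. A deployed filter would pick its cutoff thresholds empirically.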

Similar documents (content)

  1. Lee, L.-H.; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering (2012) 0.32
    0.31615782 = sum of:
      0.31615782 = product of:
        1.129135 = sum of:
          0.04150033 = weight(abstract_txt:false in 4988) [ClassicSimilarity], result of:
            0.04150033 = score(doc=4988,freq=1.0), product of:
              0.08695216 = queryWeight, product of:
                1.0914809 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.010432132 = queryNorm
              0.47727776 = fieldWeight in 4988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.06364883 = weight(abstract_txt:favorably in 4988) [ClassicSimilarity], result of:
            0.06364883 = score(doc=4988,freq=1.0), product of:
              0.115639515 = queryWeight, product of:
                1.2587198 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.010432132 = queryNorm
              0.55040723 = fieldWeight in 4988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.025812602 = weight(abstract_txt:proposed in 4988) [ClassicSimilarity], result of:
            0.025812602 = score(doc=4988,freq=2.0), product of:
              0.06335786 = queryWeight, product of:
                1.317623 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.010432132 = queryNorm
              0.4074096 = fieldWeight in 4988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.06055981 = weight(abstract_txt:rate in 4988) [ClassicSimilarity], result of:
            0.06055981 = score(doc=4988,freq=2.0), product of:
              0.111867085 = queryWeight, product of:
                1.7508224 = boost
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.010432132 = queryNorm
              0.541355 = fieldWeight in 4988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.12948333 = weight(abstract_txt:update in 4988) [ClassicSimilarity], result of:
            0.12948333 = score(doc=4988,freq=2.0), product of:
              0.21252938 = queryWeight, product of:
                2.9556026 = boost
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.010432132 = queryNorm
              0.60924906 = fieldWeight in 4988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.3842328 = weight(abstract_txt:blacklist in 4988) [ClassicSimilarity], result of:
            0.3842328 = score(doc=4988,freq=2.0), product of:
              0.4388735 = queryWeight, product of:
                4.2472343 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010432132 = queryNorm
              0.8754978 = fieldWeight in 4988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.4238973 = weight(abstract_txt:pornographic in 4988) [ClassicSimilarity], result of:
            0.4238973 = score(doc=4988,freq=2.0), product of:
              0.5157387 = queryWeight, product of:
                5.3164387 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.010432132 = queryNorm
              0.82192254 = fieldWeight in 4988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
        0.28 = coord(7/25)
    
  2. Keilty, P.: Carnal Indexing (2017) 0.09
    0.09281779 = sum of:
      0.09281779 = product of:
        0.5801112 = sum of:
          0.04150033 = weight(abstract_txt:false in 3841) [ClassicSimilarity], result of:
            0.04150033 = score(doc=3841,freq=1.0), product of:
              0.08695216 = queryWeight, product of:
                1.0914809 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.010432132 = queryNorm
              0.47727776 = fieldWeight in 3841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=3841)
          0.085368454 = weight(abstract_txt:pornography in 3841) [ClassicSimilarity], result of:
            0.085368454 = score(doc=3841,freq=2.0), product of:
              0.11162658 = queryWeight, product of:
                1.2366868 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.010432132 = queryNorm
              0.7647682 = fieldWeight in 3841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0625 = fieldNorm(doc=3841)
          0.029345108 = weight(abstract_txt:sites in 3841) [ClassicSimilarity], result of:
            0.029345108 = score(doc=3841,freq=1.0), product of:
              0.086952046 = queryWeight, product of:
                1.543586 = boost
                5.399778 = idf(docFreq=542, maxDocs=44218)
                0.010432132 = queryNorm
              0.33748612 = fieldWeight in 3841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.399778 = idf(docFreq=542, maxDocs=44218)
                0.0625 = fieldNorm(doc=3841)
          0.4238973 = weight(abstract_txt:pornographic in 3841) [ClassicSimilarity], result of:
            0.4238973 = score(doc=3841,freq=2.0), product of:
              0.5157387 = queryWeight, product of:
                5.3164387 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.010432132 = queryNorm
              0.82192254 = fieldWeight in 3841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=3841)
        0.16 = coord(4/25)
    
  3. Shieh, W.-Y.; Chung, C.-P.: A statistics-based approach to incrementally update inverted files (2005) 0.07
    0.068117194 = sum of:
      0.068117194 = product of:
        0.4257325 = sum of:
          0.022815332 = weight(abstract_txt:proposed in 1010) [ClassicSimilarity], result of:
            0.022815332 = score(doc=1010,freq=1.0), product of:
              0.06335786 = queryWeight, product of:
                1.317623 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.010432132 = queryNorm
              0.36010262 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
          0.053527813 = weight(abstract_txt:rate in 1010) [ClassicSimilarity], result of:
            0.053527813 = score(doc=1010,freq=1.0), product of:
              0.111867085 = queryWeight, product of:
                1.7508224 = boost
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.010432132 = queryNorm
              0.47849476 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
          0.114448175 = weight(abstract_txt:update in 1010) [ClassicSimilarity], result of:
            0.114448175 = score(doc=1010,freq=1.0), product of:
              0.21252938 = queryWeight, product of:
                2.9556026 = boost
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.010432132 = queryNorm
              0.5385052 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
          0.23494118 = weight(abstract_txt:incremental in 1010) [ClassicSimilarity], result of:
            0.23494118 = score(doc=1010,freq=1.0), product of:
              0.377832 = queryWeight, product of:
                4.55046 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.010432132 = queryNorm
              0.6218139 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
        0.16 = coord(4/25)
    
  4. Lee, L.-H.; Juan, Y.-C.; Tseng, W.-L.; Chen, H.-H.; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering (2015) 0.07
    0.06662093 = sum of:
      0.06662093 = product of:
        0.41638082 = sum of:
          0.04150033 = weight(abstract_txt:false in 1818) [ClassicSimilarity], result of:
            0.04150033 = score(doc=1818,freq=1.0), product of:
              0.08695216 = queryWeight, product of:
                1.0914809 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.010432132 = queryNorm
              0.47727776 = fieldWeight in 1818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=1818)
          0.06036462 = weight(abstract_txt:pornography in 1818) [ClassicSimilarity], result of:
            0.06036462 = score(doc=1818,freq=1.0), product of:
              0.11162658 = queryWeight, product of:
                1.2366868 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.010432132 = queryNorm
              0.5407728 = fieldWeight in 1818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0625 = fieldNorm(doc=1818)
          0.042822253 = weight(abstract_txt:rate in 1818) [ClassicSimilarity], result of:
            0.042822253 = score(doc=1818,freq=1.0), product of:
              0.111867085 = queryWeight, product of:
                1.7508224 = boost
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.010432132 = queryNorm
              0.3827958 = fieldWeight in 1818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.0625 = fieldNorm(doc=1818)
          0.27169362 = weight(abstract_txt:blacklist in 1818) [ClassicSimilarity], result of:
            0.27169362 = score(doc=1818,freq=1.0), product of:
              0.4388735 = queryWeight, product of:
                4.2472343 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010432132 = queryNorm
              0.6190705 = fieldWeight in 1818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=1818)
        0.16 = coord(4/25)
    
  5. Fu, T.; Abbasi, A.; Chen, H.: A focused crawler for Dark Web forums (2010) 0.07
    0.066284634 = sum of:
      0.066284634 = product of:
        0.552372 = sum of:
          0.04698276 = weight(abstract_txt:mechanism in 3471) [ClassicSimilarity], result of:
            0.04698276 = score(doc=3471,freq=1.0), product of:
              0.119000375 = queryWeight, product of:
                1.8057811 = boost
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.010432132 = queryNorm
              0.39481187 = fieldWeight in 3471, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.0625 = fieldNorm(doc=3471)
          0.12948333 = weight(abstract_txt:update in 3471) [ClassicSimilarity], result of:
            0.12948333 = score(doc=3471,freq=2.0), product of:
              0.21252938 = queryWeight, product of:
                2.9556026 = boost
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.010432132 = queryNorm
              0.60924906 = fieldWeight in 3471, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.0625 = fieldNorm(doc=3471)
          0.37590587 = weight(abstract_txt:incremental in 3471) [ClassicSimilarity], result of:
            0.37590587 = score(doc=3471,freq=4.0), product of:
              0.377832 = queryWeight, product of:
                4.55046 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.010432132 = queryNorm
              0.9949022 = fieldWeight in 3471, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.0625 = fieldNorm(doc=3471)
        0.12 = coord(3/25)
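
The score breakdowns above follow the arithmetic of Lucene's ClassicSimilarity: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and the sum is scaled by the coord factor. The sketch below recomputes the score of entry 5 from the numbers in its breakdown. The boost, queryNorm, and fieldNorm values are taken as given from the dump rather than derived, and the function names are illustrative.

```python
import math

def idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, doc_freq, boost, field_norm, max_docs, query_norm):
    """One 'weight(...)' node: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, matched, total, max_docs, query_norm):
    """Sum of term scores scaled by coord(matched/total)."""
    coord = matched / total
    return coord * sum(
        term_score(max_docs=max_docs, query_norm=query_norm, **t)
        for t in terms
    )

MAX_DOCS = 44218
QUERY_NORM = 0.010432132

# Values copied from the breakdown of entry 5 (doc=3471).
entry5_terms = [
    dict(freq=1.0, doc_freq=216, boost=1.8057811, field_norm=0.0625),  # mechanism
    dict(freq=2.0, doc_freq=121, boost=2.9556026, field_norm=0.0625),  # update
    dict(freq=4.0, doc_freq=41, boost=4.55046, field_norm=0.0625),     # incremental
]
entry5_score = document_score(entry5_terms, 3, 25, MAX_DOCS, QUERY_NORM)
print(round(entry5_score, 4))  # about 0.0663, in line with the 0.066284634 listed
```

Small differences in the last digits are expected, because Lucene computes these values in single precision while Python uses double precision.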