Document (#38819)

Author
Lee, L.-H.
Juan, Y.-C.
Tseng, W.-L.
Chen, H.-H.
Tseng, Y.-H.
Title
Mining browsing behaviors for objectionable content filtering
Source
Journal of the Association for Information Science and Technology. 66(2015) no.5, pp.930-942
Year
2015
Abstract
This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails, represented as category sequences in click-through data, are used to mine web browsing behaviors. Contextual relationships among URL categories are learned by a hidden Markov model. The top-level domains (TLDs) extracted from the URLs, together with their corresponding categories, are captured by a TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. Beyond the current context, the predictions made for the same URL when it was previously accessed in different contexts by various users are also combined by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced for integrating the model with the blacklist it generates, so that objectionable web pages can be filtered without analyzing their content. In practice, this behavioral perspective complements existing content-analysis approaches.
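The aggregation idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and several simplifications are assumptions: a first-order Markov chain over categories stands in for the hidden Markov model, the TLD model is a plain P(category | TLD) frequency table, the two distributions are combined by linear interpolation with a hypothetical `weight` parameter, and all function names, categories, and URLs are illustrative.

```python
# Illustrative sketch only (not the authors' implementation).
# Assumptions: a first-order Markov chain replaces the paper's hidden Markov
# model, the TLD model is a P(category | TLD) frequency table, and the two
# are combined by linear interpolation with a hypothetical `weight`.
from collections import Counter, defaultdict


def _normalise(counts):
    """Turn {key: Counter} into {key: {category: probability}}."""
    return {key: {cat: n / sum(ctr.values()) for cat, n in ctr.items()}
            for key, ctr in counts.items()}


def train_transitions(trails):
    """Estimate P(next category | current category) from access trails."""
    counts = defaultdict(Counter)
    for trail in trails:
        for cur, nxt in zip(trail, trail[1:]):
            counts[cur][nxt] += 1
    return _normalise(counts)


def train_tld_model(labelled):
    """Estimate P(category | TLD) from (tld, category) pairs."""
    counts = defaultdict(Counter)
    for tld, cat in labelled:
        counts[tld][cat] += 1
    return _normalise(counts)


def aggregate(context_cat, tld, transitions, tld_model, weight=0.5):
    """Interpolate the context and TLD distributions; return the top category."""
    ctx = transitions.get(context_cat, {})
    by_tld = tld_model.get(tld, {})
    cats = set(ctx) | set(by_tld)
    if not cats:
        return None  # neither model has seen this context or TLD
    scores = {c: weight * ctx.get(c, 0.0) + (1 - weight) * by_tld.get(c, 0.0)
              for c in cats}
    # Sort first so ties are broken alphabetically, keeping results deterministic.
    return max(sorted(scores), key=scores.get)


def majority_vote(predictions):
    """Combine predictions for one URL made in different contexts."""
    if not predictions:
        return None
    counts = Counter(predictions)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)  # alphabetical tie-break


def build_blacklist(url_predictions, objectionable):
    """Blacklist URLs whose majority-vote category is objectionable."""
    return {url for url, preds in url_predictions.items()
            if majority_vote(preds) in objectionable}


if __name__ == "__main__":
    # Toy, made-up data for illustration.
    trails = [["news", "sports", "news"],
              ["news", "gambling", "gambling"],
              ["games", "gambling"]]
    tld_pairs = [("com", "news"), ("com", "gambling"),
                 ("bet", "gambling"), ("bet", "gambling")]
    transitions = train_transitions(trails)
    tld_model = train_tld_model(tld_pairs)
    print(aggregate("news", "bet", transitions, tld_model))  # → gambling
    print(build_blacklist({"example.bet": ["gambling", "news", "gambling"],
                           "example.com": ["news"]},
                          {"gambling", "pornography"}))  # → {'example.bet'}
```

In this toy run the context "news" alone is split evenly between "sports" and "gambling", and the `.bet` TLD tips the interpolated score toward "gambling". The majority vote then turns per-context predictions into a single label per URL, which is what a generated blacklist would need; real weights and category sets would have to be tuned on click-through data.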
Content
See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23217/abstract.

Similar documents (author)

  1. Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 2.23
    
  2. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 2.23
    
  3. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 2.23
    
  4. Tseng, Y.H.; Lin, Y.I.: Evaluation of fuzzy search, term suggestion, and term relevance feedback in an OPAC system (1998) 2.23
    
  5. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 2.23
    

Similar documents (content)

  1. Lee, L.-H.; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering (2012) 0.15
    
  2. Herrera-Viedma, E.; Pasi, G.; Lopez-Herrera, A.G.; Porcel, C.: Evaluating the information quality of Web sites : a methodology based on fuzzy computing with words (2006) 0.10
    
  3. Rorissa, A.; Iyer, H.: Theories of cognition and image categorization : what category labels reveal about basic level theory (2008) 0.09
    
  4. Lee, L.-H.; Luh, C.-J.: Generation of pornographic blacklist and its incremental update using an inverse chi-square based method (2008) 0.09
    
  5. Bondarenko, O.; Janssen, R.: Connecting visual cues to semantic judgments in the context of the office environment (2009) 0.09