Document (#27223)

Author
Chakrabarti, S.
Title
Mining the Web : discovering knowledge from hypertext data
Imprint
San Francisco, CA : Morgan Kaufmann
Year
2003
Pages
344 pp.
ISBN
1-55860-754-4
Footnote
Review in: JASIST 55(2004) no.3, pp. 275-276 (C. Chen): "This is a book about finding significant statistical patterns on the Web - in particular, patterns that are associated with hypertext documents, topics, hyperlinks, and queries. The term pattern in this book refers to dependencies among such items. On the one hand, the Web contains useful information on just about every topic under the sun. On the other hand, just like searching for a needle in a haystack, one would need powerful tools to locate useful information on the vast land of the Web. Soumen Chakrabarti's book focuses on a wide range of techniques for machine learning and data mining on the Web. The goal of the book is to provide both the technical background and tools and tricks of the trade of Web content mining. Much of the technical content reflects the state of the art between 1995 and 2002. The targeted audience is researchers and innovative developers in this area, as well as newcomers who intend to enter this area. The book begins with an introduction chapter. The introduction chapter explains fundamental concepts such as crawling and indexing as well as clustering and classification. The remaining eight chapters are organized into three parts: i) infrastructure, ii) learning and iii) applications.
Part I, Infrastructure, has two chapters: Chapter 2 on crawling the Web and Chapter 3 on Web search and information retrieval. The second part of the book, containing chapters 4, 5, and 6, is the centerpiece. This part specifically focuses on machine learning in the context of hypertext. Part III is a collection of applications that utilize the techniques described in earlier chapters. Chapter 7 is on social network analysis. Chapter 8 is on resource discovery. Chapter 9 is on the future of Web mining. Overall, this is a valuable reference book for researchers and developers in the field of Web mining. It should be particularly useful for those who would like to design and probably code their own computer programs out of the equations and pseudocodes on most of the pages. For a student, the most valuable feature of the book is perhaps the formal and consistent treatments of concepts across the board. For what is behind and beyond the technical details, one has to either dig deeper into the bibliographic notes at the end of each chapter, or resort to more in-depth analysis of relevant subjects in the literature. If you are looking for successful stories about Web mining or hard-way-learned lessons of failures, this is not the book."
Theme
Internet
Data Mining

Similar documents (content)

  1. Ohly, H.P.: Bibliometric mining : added value from document analysis and retrieval (2008) 0.59
    0.58644164 = sum of:
      0.58644164 = product of:
        0.8796624 = sum of:
          0.025795082 = weight(abstract_txt:from in 2386) [ClassicSimilarity], result of:
            0.025795082 = score(doc=2386,freq=1.0), product of:
              0.0995511 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.03601857 = queryNorm
              0.259114 = fieldWeight in 2386, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.09375 = fieldNorm(doc=2386)
          0.06416631 = weight(abstract_txt:data in 2386) [ClassicSimilarity], result of:
            0.06416631 = score(doc=2386,freq=2.0), product of:
              0.1450606 = queryWeight, product of:
                1.2071235 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.03601857 = queryNorm
              0.44234142 = fieldWeight in 2386, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=2386)
          0.28772593 = weight(abstract_txt:mining in 2386) [ClassicSimilarity], result of:
            0.28772593 = score(doc=2386,freq=1.0), product of:
              0.49698213 = queryWeight, product of:
                2.2343302 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.03601857 = queryNorm
              0.57894623 = fieldWeight in 2386, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.09375 = fieldNorm(doc=2386)
          0.50197506 = weight(abstract_txt:discovering in 2386) [ClassicSimilarity], result of:
            0.50197506 = score(doc=2386,freq=1.0), product of:
              0.72023827 = queryWeight, product of:
                2.6897695 = boost
                7.4342074 = idf(docFreq=70, maxDocs=44218)
                0.03601857 = queryNorm
              0.69695693 = fieldWeight in 2386, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4342074 = idf(docFreq=70, maxDocs=44218)
                0.09375 = fieldNorm(doc=2386)
        0.6666667 = coord(4/6)
    
  2. Srinivasan, P.: Text mining : generating hypotheses from MEDLINE (2004) 0.55
    0.54970235 = sum of:
      0.54970235 = product of:
        0.8245535 = sum of:
          0.021495903 = weight(abstract_txt:from in 2225) [ClassicSimilarity], result of:
            0.021495903 = score(doc=2225,freq=1.0), product of:
              0.0995511 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.03601857 = queryNorm
              0.21592833 = fieldWeight in 2225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.04565675 = weight(abstract_txt:knowledge in 2225) [ClassicSimilarity], result of:
            0.04565675 = score(doc=2225,freq=1.0), product of:
              0.16449232 = queryWeight, product of:
                1.285434 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.03601857 = queryNorm
              0.2775616 = fieldWeight in 2225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.33908826 = weight(abstract_txt:mining in 2225) [ClassicSimilarity], result of:
            0.33908826 = score(doc=2225,freq=2.0), product of:
              0.49698213 = queryWeight, product of:
                2.2343302 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.03601857 = queryNorm
              0.68229467 = fieldWeight in 2225, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.41831255 = weight(abstract_txt:discovering in 2225) [ClassicSimilarity], result of:
            0.41831255 = score(doc=2225,freq=1.0), product of:
              0.72023827 = queryWeight, product of:
                2.6897695 = boost
                7.4342074 = idf(docFreq=70, maxDocs=44218)
                0.03601857 = queryNorm
              0.58079743 = fieldWeight in 2225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4342074 = idf(docFreq=70, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
        0.6666667 = coord(4/6)
    
  3. Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.54
    0.54054344 = sum of:
      0.54054344 = product of:
        0.8108151 = sum of:
          0.029785596 = weight(abstract_txt:from in 354) [ClassicSimilarity], result of:
            0.029785596 = score(doc=354,freq=3.0), product of:
              0.0995511 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.03601857 = queryNorm
              0.29919907 = fieldWeight in 354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
          0.08002945 = weight(abstract_txt:data in 354) [ClassicSimilarity], result of:
            0.08002945 = score(doc=354,freq=7.0), product of:
              0.1450606 = queryWeight, product of:
                1.2071235 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.03601857 = queryNorm
              0.55169666 = fieldWeight in 354, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
          0.036525406 = weight(abstract_txt:knowledge in 354) [ClassicSimilarity], result of:
            0.036525406 = score(doc=354,freq=1.0), product of:
              0.16449232 = queryWeight, product of:
                1.285434 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.03601857 = queryNorm
              0.2220493 = fieldWeight in 354, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
          0.6644746 = weight(abstract_txt:mining in 354) [ClassicSimilarity], result of:
            0.6644746 = score(doc=354,freq=12.0), product of:
              0.49698213 = queryWeight, product of:
                2.2343302 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.03601857 = queryNorm
              1.3370191 = fieldWeight in 354, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
        0.6666667 = coord(4/6)
    
  4. Raghavan, V.V.; Deogun, J.S.; Sever, H.: Knowledge discovery and data mining : introduction (1998) 0.53
    0.5343787 = sum of:
      0.5343787 = product of:
        0.80156803 = sum of:
          0.025795082 = weight(abstract_txt:from in 2899) [ClassicSimilarity], result of:
            0.025795082 = score(doc=2899,freq=1.0), product of:
              0.0995511 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.03601857 = queryNorm
              0.259114 = fieldWeight in 2899, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.09375 = fieldNorm(doc=2899)
          0.09074487 = weight(abstract_txt:data in 2899) [ClassicSimilarity], result of:
            0.09074487 = score(doc=2899,freq=4.0), product of:
              0.1450606 = queryWeight, product of:
                1.2071235 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.03601857 = queryNorm
              0.62556523 = fieldWeight in 2899, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=2899)
          0.10957622 = weight(abstract_txt:knowledge in 2899) [ClassicSimilarity], result of:
            0.10957622 = score(doc=2899,freq=4.0), product of:
              0.16449232 = queryWeight, product of:
                1.285434 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.03601857 = queryNorm
              0.6661479 = fieldWeight in 2899, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.09375 = fieldNorm(doc=2899)
          0.57545185 = weight(abstract_txt:mining in 2899) [ClassicSimilarity], result of:
            0.57545185 = score(doc=2899,freq=4.0), product of:
              0.49698213 = queryWeight, product of:
                2.2343302 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.03601857 = queryNorm
              1.1578925 = fieldWeight in 2899, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.09375 = fieldNorm(doc=2899)
        0.6666667 = coord(4/6)
    
  5. The World Wide Web and Databases : International Workshop WebDB'98, Valencia, Spain, March 27-28, 1998, Selected papers (1999) 0.52
    0.51759 = sum of:
      0.51759 = product of:
        0.77638495 = sum of:
          0.034393445 = weight(abstract_txt:from in 3959) [ClassicSimilarity], result of:
            0.034393445 = score(doc=3959,freq=1.0), product of:
              0.0995511 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.03601857 = queryNorm
              0.34548533 = fieldWeight in 3959, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.125 = fieldNorm(doc=3959)
          0.060496576 = weight(abstract_txt:data in 3959) [ClassicSimilarity], result of:
            0.060496576 = score(doc=3959,freq=1.0), product of:
              0.1450606 = queryWeight, product of:
                1.2071235 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.03601857 = queryNorm
              0.41704348 = fieldWeight in 3959, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.125 = fieldNorm(doc=3959)
          0.29786038 = weight(abstract_txt:hypertext in 3959) [ClassicSimilarity], result of:
            0.29786038 = score(doc=3959,freq=1.0), product of:
              0.41982737 = queryWeight, product of:
                2.0535834 = boost
                5.6758637 = idf(docFreq=411, maxDocs=44218)
                0.03601857 = queryNorm
              0.70948297 = fieldWeight in 3959, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6758637 = idf(docFreq=411, maxDocs=44218)
                0.125 = fieldNorm(doc=3959)
          0.38363457 = weight(abstract_txt:mining in 3959) [ClassicSimilarity], result of:
            0.38363457 = score(doc=3959,freq=1.0), product of:
              0.49698213 = queryWeight, product of:
                2.2343302 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.03601857 = queryNorm
              0.7719283 = fieldWeight in 3959, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.125 = fieldNorm(doc=3959)
        0.6666667 = coord(4/6)
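
The score breakdowns above all follow the same arithmetic, so a short sketch may help in reading them. The Python below recomputes the total for the first entry (doc=2386) from the numbers printed in its listing. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the tf and idf values shown. The function names are illustrative only and are not part of Lucene; the boost, queryNorm, and fieldNorm values are copied from the listing as given rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as printed in the listing,
    # e.g. idf(docFreq=7577, maxDocs=44218) is about 2.7638826.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency factor, e.g. tf(freq=2.0) is about 1.4142135.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    # score = queryWeight * fieldWeight, mirroring the nested "product of" lines.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, matched, total, max_docs, field_norm, query_norm):
    # Sum of the per-term scores, scaled by coord(matched/total).
    subtotal = sum(
        term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost)
        for (_term, freq, doc_freq, boost) in terms
    )
    return subtotal * (matched / total)

# Values copied from the listing for entry 1 (doc=2386).
MAX_DOCS = 44218
QUERY_NORM = 0.03601857
FIELD_NORM = 0.09375

TERMS_DOC_2386 = [
    # (term, freq, docFreq, boost); "from" has no boost line, so 1.0 is assumed
    ("from", 1.0, 7577, 1.0),
    ("data", 2.0, 4274, 1.2071235),
    ("mining", 1.0, 249, 2.2343302),
    ("discovering", 1.0, 70, 2.6897695),
]

score_2386 = document_score(
    TERMS_DOC_2386, matched=4, total=6,
    max_docs=MAX_DOCS, field_norm=FIELD_NORM, query_norm=QUERY_NORM,
)
print(round(score_2386, 4))
```

The result should agree with the 0.58644164 shown for entry 1 up to rounding, since the listing uses single-precision floats and this sketch uses doubles. The same function applies to the other entries once their freq and fieldNorm values are substituted.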