Document (#31927)

Henzinger, M.R.
Link analysis in Web information retrieval
IEEE Data Engineering Bulletin 23(2000) no. 3, pp. 3-8
The analysis of the hyperlink structure of the web has led to significant improvements in web information retrieval. This survey describes two successful link analysis algorithms and the state of the art of the field.
The goal of information retrieval is to find all documents in a collection that are relevant to a user query. Decades of research in information retrieval succeeded in developing and refining techniques that are solely word-based (see, e.g., [2]). With the advent of the web, new sources of information became available, among them the hyperlinks between documents and records of user behavior. To be precise, hypertexts (i.e., collections of documents connected by hyperlinks) have existed and have been studied for a long time. What was new was the large number of hyperlinks created by independent individuals. As this article shows, hyperlinks provide a valuable source of information for web information retrieval. This area of information retrieval is commonly called link analysis.

Why would one expect hyperlinks to be useful? A hyperlink is a reference to a web page B that is contained in a web page A. When the hyperlink is clicked in a web browser, the browser displays page B. This functionality alone is not helpful for web information retrieval. However, the way authors of web pages typically use hyperlinks can give them valuable information content. Authors usually create links because they think the links will be useful to the readers of their pages. Thus, links are usually either navigational aids that, for example, bring the reader back to the homepage of the site, or links that point to pages whose content augments the content of the current page. Links of the second kind tend to point to high-quality pages that might be on the same topic as the page containing the link.
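The intuition above, that a link from page A to page B often signals that B is useful, can be illustrated with a minimal Python sketch that builds the link graph and counts, for each page, how many distinct pages link to it. This is not one of the algorithms the survey describes. The page URLs are hypothetical, and treating links between pages on the same host as navigational aids (and therefore skipping them) is an assumption made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host(url: str) -> str:
    """Return the host part of a URL, e.g. 'a.example' for 'http://a.example/x'."""
    return urlparse(url).netloc.lower()


def build_link_graph(links):
    """Map each page A to the set of distinct pages B it links to (A -> B)."""
    graph = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-links
            graph[source].add(target)
    return graph


def inlink_counts(graph, skip_same_host=True):
    """Count, for each target page, how many distinct pages link to it.

    Assumption (illustrative heuristic): links between pages on the same
    host are treated as navigational aids and skipped; links from other
    hosts are treated as endorsements of the target's content.
    """
    counts = defaultdict(int)
    for source, targets in graph.items():
        for target in targets:
            if skip_same_host and host(source) == host(target):
                continue
            counts[target] += 1
    return dict(counts)


# Hypothetical pages and links (source page A, target page B).
links = [
    ("http://a.example/post", "http://c.example/guide"),
    ("http://b.example/notes", "http://c.example/guide"),
    ("http://a.example/post", "http://a.example/"),   # "back to homepage" link
    ("http://c.example/guide", "http://c.example/"),  # navigational, same host
    ("http://b.example/notes", "http://a.example/post"),
]

counts = inlink_counts(build_link_graph(links))
# Sort by in-link count (descending), then by URL for a stable order.
ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for page, n in ranking:
    print(n, page)
# prints:
# 2 http://c.example/guide
# 1 http://a.example/post
```

Plain in-link counting treats every endorsement equally. More refined link analysis algorithms weight each in-link by the importance of the page it comes from, which this sketch does not attempt.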

Similar documents (author)

  1. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 6.19
  2. Dean, J.; Henzinger, M.R.: Finding related pages in the World Wide Web (1999) 4.95
  3. Henzinger, M.; Pöppe, C.: "Qualität der Suchergebnisse ist unser höchstes Ziel" : Suchmaschine Google [The quality of search results is our highest goal: the Google search engine] (2002) 4.95
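  4. Henzinger, M.; Wiesemann, M.: Google-Forschungschefin Monika Henzinger beklagt Manipulationen von Suchmaschinen : "Tricks der Porno-Branche" [Google research chief Monika Henzinger deplores the manipulation of search engines: "Tricks from the porn industry"] (2002) 4.95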

Similar documents (content)

  1. Rasmussen, E.: Clustering algorithms (1992) 0.38
  2. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.36
  3. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: A link-bridged topic model for cross-domain document classification (2013) 0.30
  4. Thelwall, M.: A comparison of link and URL citation counting (2011) 0.30
  5. Thelwall, M.; Li, X.; Barjak, F.; Robinson, S.: Assessing the international web connectivity of research groups (2008) 0.30