Document (#29922)

Calado, P.
Cristo, M.
Gonçalves, M.A.
Moura, E.S. de
Ribeiro-Neto, B.
Ziviani, N.
Link-based similarity measures for the classification of Web documents
Journal of the American Society for Information Science and Technology. 57(2006) no.2, pp.208-221
Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine their topic. However, the Web provides a different source that can be useful to document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows classifying documents with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents.
Automatisches Klassifizieren

Similar documents (author)

  1. Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: ¬A comparative study of citations and links in document classification (2006) 5.62
    5.6180835 = sum of:
      5.6180835 = sum of:
        0.75528246 = weight(author_txt:gonçalves in 3531) [ClassicSimilarity], result of:
          0.75528246 = score(doc=3531,freq=1.0), product of:
            0.35268962 = queryWeight, product of:
              8.565973 = idf(docFreq=22, maxDocs=44421)
              0.041173328 = queryNorm
            2.1414933 = fieldWeight in 3531, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.565973 = idf(docFreq=22, maxDocs=44421)
              0.25 = fieldNorm(doc=3531)
        0.79285836 = weight(author_txt:moura in 3531) [ClassicSimilarity], result of:
          0.79285836 = score(doc=3531,freq=1.0), product of:
            0.36429244 = queryWeight, product of:
              1.0163159 = boost
              8.705735 = idf(docFreq=19, maxDocs=44421)
              0.041173328 = queryNorm
            2.1764338 = fieldWeight in 3531, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.705735 = idf(docFreq=19, maxDocs=44421)
              0.25 = fieldNorm(doc=3531)
        0.80695546 = weight(author_txt:ribeiro in 3531) [ClassicSimilarity], result of:
          0.80695546 = score(doc=3531,freq=1.0), product of:
            0.36859784 = queryWeight, product of:
              1.0223039 = boost
              8.757029 = idf(docFreq=18, maxDocs=44421)
              0.041173328 = queryNorm
            2.1892571 = fieldWeight in 3531, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.757029 = idf(docFreq=18, maxDocs=44421)
              0.25 = fieldNorm(doc=3531)
        1.0316488 = weight(author_txt:neto in 3531) [ClassicSimilarity], result of:
          1.0316488 = score(doc=3531,freq=1.0), product of:
            0.4341845 = queryWeight, product of:
              1.1095345 = boost
              9.504243 = idf(docFreq=8, maxDocs=44421)
              0.041173328 = queryNorm
            2.3760607 = fieldWeight in 3531, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.504243 = idf(docFreq=8, maxDocs=44421)
              0.25 = fieldNorm(doc=3531)
        1.1156694 = weight(author_txt:cristo in 3531) [ClassicSimilarity], result of:
          1.1156694 = score(doc=3531,freq=1.0), product of:
            0.4574498 = queryWeight, product of:
              1.1388732 = boost
              9.755557 = idf(docFreq=6, maxDocs=44421)
              0.041173328 = queryNorm
            2.4388893 = fieldWeight in 3531, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.755557 = idf(docFreq=6, maxDocs=44421)
              0.25 = fieldNorm(doc=3531)
        1.1156694 = weight(author_txt:ziviani in 3531) [ClassicSimilarity], result of:
          1.1156694 = score(doc=3531,freq=1.0), product of:
            0.4574498 = queryWeight, product of:
              1.1388732 = boost
              9.755557 = idf(docFreq=6, maxDocs=44421)
              0.041173328 = queryNorm
            2.4388893 = fieldWeight in 3531, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.755557 = idf(docFreq=6, maxDocs=44421)
              0.25 = fieldNorm(doc=3531)
  2. Pereira, D.A.; Ribeiro-Neto, B.; Ziviani, N.; Laender, A.H.F.; Gonçalves, M.A.: ¬A generic Web-based entity resolution framework (2011) 2.47
    2.4730375 = sum of:
      2.4730375 = product of:
        3.709556 = sum of:
          0.75528246 = weight(author_txt:gonçalves in 450) [ClassicSimilarity], result of:
            0.75528246 = score(doc=450,freq=1.0), product of:
              0.35268962 = queryWeight, product of:
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.041173328 = queryNorm
              2.1414933 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.25 = fieldNorm(doc=450)
          0.80695546 = weight(author_txt:ribeiro in 450) [ClassicSimilarity], result of:
            0.80695546 = score(doc=450,freq=1.0), product of:
              0.36859784 = queryWeight, product of:
                1.0223039 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.041173328 = queryNorm
              2.1892571 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.25 = fieldNorm(doc=450)
          1.0316488 = weight(author_txt:neto in 450) [ClassicSimilarity], result of:
            1.0316488 = score(doc=450,freq=1.0), product of:
              0.4341845 = queryWeight, product of:
                1.1095345 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.041173328 = queryNorm
              2.3760607 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.25 = fieldNorm(doc=450)
          1.1156694 = weight(author_txt:ziviani in 450) [ClassicSimilarity], result of:
            1.1156694 = score(doc=450,freq=1.0), product of:
              0.4574498 = queryWeight, product of:
                1.1388732 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.041173328 = queryNorm
              2.4388893 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.25 = fieldNorm(doc=450)
        0.6666667 = coord(4/6)
  3. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 2.26
    2.2578301 = sum of:
      2.2578301 = product of:
        3.386745 = sum of:
          0.75528246 = weight(author_txt:gonçalves in 119) [ClassicSimilarity], result of:
            0.75528246 = score(doc=119,freq=1.0), product of:
              0.35268962 = queryWeight, product of:
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.041173328 = queryNorm
              2.1414933 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.25 = fieldNorm(doc=119)
          0.79285836 = weight(author_txt:moura in 119) [ClassicSimilarity], result of:
            0.79285836 = score(doc=119,freq=1.0), product of:
              0.36429244 = queryWeight, product of:
                1.0163159 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.041173328 = queryNorm
              2.1764338 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.25 = fieldNorm(doc=119)
          0.80695546 = weight(author_txt:ribeiro in 119) [ClassicSimilarity], result of:
            0.80695546 = score(doc=119,freq=1.0), product of:
              0.36859784 = queryWeight, product of:
                1.0223039 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.041173328 = queryNorm
              2.1892571 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.25 = fieldNorm(doc=119)
          1.0316488 = weight(author_txt:neto in 119) [ClassicSimilarity], result of:
            1.0316488 = score(doc=119,freq=1.0), product of:
              0.4341845 = queryWeight, product of:
                1.1095345 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.041173328 = queryNorm
              2.3760607 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.25 = fieldNorm(doc=119)
        0.6666667 = coord(4/6)
  4. Silva, A.J.C.; Gonçalves, M.A.; Laender, A.H.F.; Modesto, M.A.B.; Cristo, M.; Ziviani, N.: Finding what is missing from a digital library : a case study in the computer science field (2009) 1.49
    1.4933107 = sum of:
      1.4933107 = product of:
        2.9866214 = sum of:
          0.75528246 = weight(author_txt:gonçalves in 219) [ClassicSimilarity], result of:
            0.75528246 = score(doc=219,freq=1.0), product of:
              0.35268962 = queryWeight, product of:
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.041173328 = queryNorm
              2.1414933 = fieldWeight in 219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.25 = fieldNorm(doc=219)
          1.1156694 = weight(author_txt:cristo in 219) [ClassicSimilarity], result of:
            1.1156694 = score(doc=219,freq=1.0), product of:
              0.4574498 = queryWeight, product of:
                1.1388732 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.041173328 = queryNorm
              2.4388893 = fieldWeight in 219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.25 = fieldNorm(doc=219)
          1.1156694 = weight(author_txt:ziviani in 219) [ClassicSimilarity], result of:
            1.1156694 = score(doc=219,freq=1.0), product of:
              0.4574498 = queryWeight, product of:
                1.1388732 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.041173328 = queryNorm
              2.4388893 = fieldWeight in 219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.25 = fieldNorm(doc=219)
        0.5 = coord(3/6)
  5. Silveira, M.; Ribeiro-Neto, B.: Concept-based ranking : a case study in the juridical domain (2004) 1.07
    1.0725192 = sum of:
      1.0725192 = product of:
        3.2175574 = sum of:
          1.4121721 = weight(author_txt:ribeiro in 3339) [ClassicSimilarity], result of:
            1.4121721 = score(doc=3339,freq=1.0), product of:
              0.36859784 = queryWeight, product of:
                1.0223039 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.041173328 = queryNorm
              3.8312001 = fieldWeight in 3339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.4375 = fieldNorm(doc=3339)
          1.8053852 = weight(author_txt:neto in 3339) [ClassicSimilarity], result of:
            1.8053852 = score(doc=3339,freq=1.0), product of:
              0.4341845 = queryWeight, product of:
                1.1095345 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.041173328 = queryNorm
              4.1581063 = fieldWeight in 3339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.4375 = fieldNorm(doc=3339)
        0.33333334 = coord(2/6)
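
The score trees above follow the usual ClassicSimilarity (TF-IDF) arithmetic: each term score is queryWeight times fieldWeight, the term scores are summed, and the sum is multiplied by coord(matched/total). The sketch below recomputes two of the listed values from the printed inputs. It is an illustration only, not the catalog's own code: the function names are hypothetical, and the formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq) are assumed because they reproduce the figures shown in the listing.

```python
import math

# Hypothetical helper names; formulas assumed from the values in the listing.

def idf(doc_freq, max_docs):
    # Inverse document frequency: rarer terms get larger weights.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in each "product of:" node.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

def coord(matched, total):
    # Fraction of query terms that matched the document.
    return matched / total

MAX_DOCS = 44421
QUERY_NORM = 0.041173328  # queryNorm printed in the author listing

# "gonçalves" in doc 3531 (docFreq=22, fieldNorm=0.25, no boost);
# the listing reports 0.75528246 for this term.
goncalves = term_score(1.0, 22, MAX_DOCS, QUERY_NORM, 0.25)

# Result 5 (doc 3339): two of six query terms match, fieldNorm=0.4375;
# the listing reports a total of 1.0725192.
ribeiro = term_score(1.0, 18, MAX_DOCS, QUERY_NORM, 0.4375, boost=1.0223039)
neto = term_score(1.0, 8, MAX_DOCS, QUERY_NORM, 0.4375, boost=1.1095345)
result5 = (ribeiro + neto) * coord(2, 6)

print(goncalves, result5)
```

Small differences in the last digits are to be expected, since the listing appears to use single-precision floats while Python uses double precision.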

Similar documents (content)

  1. Couto, T.; Cristo, M.; Gonçalves, M.A.; Calado, P.; Ziviani, N.; Moura, E.; Ribeiro-Neto, B.: ¬A comparative study of citations and links in document classification (2006) 0.35
    0.3462817 = sum of:
      0.3462817 = product of:
        0.86570424 = sum of:
          0.11318088 = weight(abstract_txt:classifiers in 3531) [ClassicSimilarity], result of:
            0.11318088 = score(doc=3531,freq=2.0), product of:
              0.17011848 = queryWeight, product of:
                1.0042005 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.02250632 = queryNorm
              0.6653062 = fieldWeight in 3531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.1274157 = weight(abstract_txt:gains in 3531) [ClassicSimilarity], result of:
            0.1274157 = score(doc=3531,freq=2.0), product of:
              0.184099 = queryWeight, product of:
                1.044649 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.02250632 = queryNorm
              0.6921043 = fieldWeight in 3531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.03796097 = weight(abstract_txt:traditional in 3531) [ClassicSimilarity], result of:
            0.03796097 = score(doc=3531,freq=1.0), product of:
              0.13036099 = queryWeight, product of:
                1.2431785 = boost
                4.6591816 = idf(docFreq=1143, maxDocs=44421)
                0.02250632 = queryNorm
              0.29119885 = fieldWeight in 3531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6591816 = idf(docFreq=1143, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.03796097 = weight(abstract_txt:further in 3531) [ClassicSimilarity], result of:
            0.03796097 = score(doc=3531,freq=1.0), product of:
              0.13036099 = queryWeight, product of:
                1.2431785 = boost
                4.6591816 = idf(docFreq=1143, maxDocs=44421)
                0.02250632 = queryNorm
              0.29119885 = fieldWeight in 3531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6591816 = idf(docFreq=1143, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.048038486 = weight(abstract_txt:based in 3531) [ClassicSimilarity], result of:
            0.048038486 = score(doc=3531,freq=7.0), product of:
              0.09126692 = queryWeight, product of:
                1.2739782 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.02250632 = queryNorm
              0.5263516 = fieldWeight in 3531, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.06434115 = weight(abstract_txt:text in 3531) [ClassicSimilarity], result of:
            0.06434115 = score(doc=3531,freq=3.0), product of:
              0.14708622 = queryWeight, product of:
                1.6173027 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.02250632 = queryNorm
              0.4374383 = fieldWeight in 3531, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.12710764 = weight(abstract_txt:measures in 3531) [ClassicSimilarity], result of:
            0.12710764 = score(doc=3531,freq=2.0), product of:
              0.26508856 = queryWeight, product of:
                2.1712048 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.02250632 = queryNorm
              0.47949123 = fieldWeight in 3531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.052623548 = weight(abstract_txt:documents in 3531) [ClassicSimilarity], result of:
            0.052623548 = score(doc=3531,freq=1.0), product of:
              0.20419864 = queryWeight, product of:
                2.2003973 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02250632 = queryNorm
              0.25770763 = fieldWeight in 3531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.059699677 = weight(abstract_txt:classification in 3531) [ClassicSimilarity], result of:
            0.059699677 = score(doc=3531,freq=1.0), product of:
              0.2392677 = queryWeight, product of:
                2.6630034 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.02250632 = queryNorm
              0.24950996 = fieldWeight in 3531, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
          0.19737518 = weight(abstract_txt:link in 3531) [ClassicSimilarity], result of:
            0.19737518 = score(doc=3531,freq=2.0), product of:
              0.39124712 = queryWeight, product of:
                3.045792 = boost
                5.707506 = idf(docFreq=400, maxDocs=44421)
                0.02250632 = queryNorm
              0.504477 = fieldWeight in 3531, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707506 = idf(docFreq=400, maxDocs=44421)
                0.0625 = fieldNorm(doc=3531)
        0.4 = coord(10/25)
  2. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: ¬A link-bridged topic model for cross-domain document classification (2013) 0.31
    0.31029418 = sum of:
      0.31029418 = product of:
        0.8619283 = sum of:
          0.09234271 = weight(abstract_txt:hyperlink in 3706) [ClassicSimilarity], result of:
            0.09234271 = score(doc=3706,freq=1.0), product of:
              0.18714628 = queryWeight, product of:
                1.0532593 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.02250632 = queryNorm
              0.4934253 = fieldWeight in 3706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.018156841 = weight(abstract_txt:based in 3706) [ClassicSimilarity], result of:
            0.018156841 = score(doc=3706,freq=1.0), product of:
              0.09126692 = queryWeight, product of:
                1.2739782 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.02250632 = queryNorm
              0.1989422 = fieldWeight in 3706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.083910786 = weight(abstract_txt:topic in 3706) [ClassicSimilarity], result of:
            0.083910786 = score(doc=3706,freq=3.0), product of:
              0.15337723 = queryWeight, product of:
                1.3484664 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.02250632 = queryNorm
              0.5470876 = fieldWeight in 3706, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.03714738 = weight(abstract_txt:text in 3706) [ClassicSimilarity], result of:
            0.03714738 = score(doc=3706,freq=1.0), product of:
              0.14708622 = queryWeight, product of:
                1.6173027 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.02250632 = queryNorm
              0.25255513 = fieldWeight in 3706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.06304436 = weight(abstract_txt:document in 3706) [ClassicSimilarity], result of:
            0.06304436 = score(doc=3706,freq=2.0), product of:
              0.16610168 = queryWeight, product of:
                1.7186693 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.02250632 = queryNorm
              0.3795528 = fieldWeight in 3706, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.046598684 = weight(abstract_txt:structure in 3706) [ClassicSimilarity], result of:
            0.046598684 = score(doc=3706,freq=1.0), product of:
              0.1710812 = queryWeight, product of:
                1.7442408 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.02250632 = queryNorm
              0.27237758 = fieldWeight in 3706, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.105247095 = weight(abstract_txt:documents in 3706) [ClassicSimilarity], result of:
            0.105247095 = score(doc=3706,freq=4.0), product of:
              0.20419864 = queryWeight, product of:
                2.2003973 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02250632 = queryNorm
              0.51541525 = fieldWeight in 3706, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.10340287 = weight(abstract_txt:classification in 3706) [ClassicSimilarity], result of:
            0.10340287 = score(doc=3706,freq=3.0), product of:
              0.2392677 = queryWeight, product of:
                2.6630034 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.02250632 = queryNorm
              0.43216392 = fieldWeight in 3706, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
          0.31207758 = weight(abstract_txt:link in 3706) [ClassicSimilarity], result of:
            0.31207758 = score(doc=3706,freq=5.0), product of:
              0.39124712 = queryWeight, product of:
                3.045792 = boost
                5.707506 = idf(docFreq=400, maxDocs=44421)
                0.02250632 = queryNorm
              0.79764825 = fieldWeight in 3706, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.707506 = idf(docFreq=400, maxDocs=44421)
                0.0625 = fieldNorm(doc=3706)
        0.36 = coord(9/25)
  3. Sun, A.; Lim, E.-P.; Ng, W.-K.: Performance measurement framework for hierarchical text classification (2003) 0.27
    0.27382085 = sum of:
      0.27382085 = product of:
        0.8556901 = sum of:
          0.17895469 = weight(abstract_txt:classifiers in 2808) [ClassicSimilarity], result of:
            0.17895469 = score(doc=2808,freq=5.0), product of:
              0.17011848 = queryWeight, product of:
                1.0042005 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.02250632 = queryNorm
              1.0519415 = fieldWeight in 2808, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.025677651 = weight(abstract_txt:based in 2808) [ClassicSimilarity], result of:
            0.025677651 = score(doc=2808,freq=2.0), product of:
              0.09126692 = queryWeight, product of:
                1.2739782 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.02250632 = queryNorm
              0.28134674 = fieldWeight in 2808, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.03714738 = weight(abstract_txt:text in 2808) [ClassicSimilarity], result of:
            0.03714738 = score(doc=2808,freq=1.0), product of:
              0.14708622 = queryWeight, product of:
                1.6173027 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.02250632 = queryNorm
              0.25255513 = fieldWeight in 2808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.044579092 = weight(abstract_txt:document in 2808) [ClassicSimilarity], result of:
            0.044579092 = score(doc=2808,freq=1.0), product of:
              0.16610168 = queryWeight, product of:
                1.7186693 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.02250632 = queryNorm
              0.26838437 = fieldWeight in 2808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.23779662 = weight(abstract_txt:measures in 2808) [ClassicSimilarity], result of:
            0.23779662 = score(doc=2808,freq=7.0), product of:
              0.26508856 = queryWeight, product of:
                2.1712048 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.02250632 = queryNorm
              0.89704597 = fieldWeight in 2808, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.07442094 = weight(abstract_txt:documents in 2808) [ClassicSimilarity], result of:
            0.07442094 = score(doc=2808,freq=2.0), product of:
              0.20419864 = queryWeight, product of:
                2.2003973 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02250632 = queryNorm
              0.3644536 = fieldWeight in 2808, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.110880025 = weight(abstract_txt:similarity in 2808) [ClassicSimilarity], result of:
            0.110880025 = score(doc=2808,freq=1.0), product of:
              0.30492198 = queryWeight, product of:
                2.3286257 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.02250632 = queryNorm
              0.36363408 = fieldWeight in 2808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
          0.14623375 = weight(abstract_txt:classification in 2808) [ClassicSimilarity], result of:
            0.14623375 = score(doc=2808,freq=6.0), product of:
              0.2392677 = queryWeight, product of:
                2.6630034 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.02250632 = queryNorm
              0.61117214 = fieldWeight in 2808, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=2808)
        0.32 = coord(8/25)
  4. Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012) 0.25
    0.24606748 = sum of:
      0.24606748 = product of:
        0.7689609 = sum of:
          0.034927044 = weight(abstract_txt:they in 1372) [ClassicSimilarity], result of:
            0.034927044 = score(doc=1372,freq=2.0), product of:
              0.08434914 = queryWeight, product of:
                3.7477977 = idf(docFreq=2845, maxDocs=44421)
                0.02250632 = queryNorm
              0.41407704 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7477977 = idf(docFreq=2845, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.022696052 = weight(abstract_txt:based in 1372) [ClassicSimilarity], result of:
            0.022696052 = score(doc=1372,freq=1.0), product of:
              0.09126692 = queryWeight, product of:
                1.2739782 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.02250632 = queryNorm
              0.24867775 = fieldWeight in 1372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.046434224 = weight(abstract_txt:text in 1372) [ClassicSimilarity], result of:
            0.046434224 = score(doc=1372,freq=1.0), product of:
              0.14708622 = queryWeight, product of:
                1.6173027 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.02250632 = queryNorm
              0.3156939 = fieldWeight in 1372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.11144773 = weight(abstract_txt:document in 1372) [ClassicSimilarity], result of:
            0.11144773 = score(doc=1372,freq=4.0), product of:
              0.16610168 = queryWeight, product of:
                1.7186693 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.02250632 = queryNorm
              0.6709609 = fieldWeight in 1372, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.15888456 = weight(abstract_txt:measures in 1372) [ClassicSimilarity], result of:
            0.15888456 = score(doc=1372,freq=2.0), product of:
              0.26508856 = queryWeight, product of:
                2.1712048 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.02250632 = queryNorm
              0.59936404 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.09302616 = weight(abstract_txt:documents in 1372) [ClassicSimilarity], result of:
            0.09302616 = score(doc=1372,freq=2.0), product of:
              0.20419864 = queryWeight, product of:
                2.2003973 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02250632 = queryNorm
              0.455567 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.19601004 = weight(abstract_txt:similarity in 1372) [ClassicSimilarity], result of:
            0.19601004 = score(doc=1372,freq=2.0), product of:
              0.30492198 = queryWeight, product of:
                2.3286257 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.02250632 = queryNorm
              0.6428203 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.10553511 = weight(abstract_txt:classification in 1372) [ClassicSimilarity], result of:
            0.10553511 = score(doc=1372,freq=2.0), product of:
              0.2392677 = queryWeight, product of:
                2.6630034 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.02250632 = queryNorm
              0.44107544 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
        0.32 = coord(8/25)
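The final score of entry 4 is the sum of its matching clauses scaled by the coordination factor. The sketch below adds up the eight clause scores listed for doc 1372 and multiplies by coord(8/25); the helper name `combine` is hypothetical, and the clause count of 25 is taken directly from the explanation rather than derived from the query.

```python
def combine(clause_scores, total_clauses):
    """Sum the matching clause scores and apply the coord factor (illustrative)."""
    coord = len(clause_scores) / total_clauses  # 8 of 25 clauses matched -> 0.32
    return coord * sum(clause_scores)


# Clause scores copied from the explanation for doc 1372.
clauses_1372 = [
    0.034927044,  # they
    0.022696052,  # based
    0.046434224,  # text
    0.11144773,   # document
    0.15888456,   # measures
    0.09302616,   # documents
    0.19601004,   # similarity
    0.10553511,   # classification
]

total_1372 = combine(clauses_1372, 25)
print(total_1372)
```

The sum of the clauses should match the listed 0.7689609 and the scaled total the listed 0.24606748, again up to float rounding.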
  5. Haveliwala, T.: Context-Sensitive Web search (2005) 0.22
    0.22137311 = sum of:
      0.22137311 = product of:
        0.61492527 = sum of:
          0.036786295 = weight(abstract_txt:provide in 3567) [ClassicSimilarity], result of:
            0.036786295 = score(doc=3567,freq=3.0), product of:
              0.096753724 = queryWeight, product of:
                1.07101 = boost
                4.013929 = idf(docFreq=2180, maxDocs=44421)
                0.02250632 = queryNorm
              0.38020548 = fieldWeight in 3567, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.013929 = idf(docFreq=2180, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.032919675 = weight(abstract_txt:authors in 3567) [ClassicSimilarity], result of:
            0.032919675 = score(doc=3567,freq=1.0), product of:
              0.12958491 = queryWeight, product of:
                1.2394725 = boost
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.02250632 = queryNorm
              0.2540394 = fieldWeight in 3567, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6452923 = idf(docFreq=1159, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.057531536 = weight(abstract_txt:traditional in 3567) [ClassicSimilarity], result of:
            0.057531536 = score(doc=3567,freq=3.0), product of:
              0.13036099 = queryWeight, product of:
                1.2431785 = boost
                4.6591816 = idf(docFreq=1143, maxDocs=44421)
                0.02250632 = queryNorm
              0.44132477 = fieldWeight in 3567, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6591816 = idf(docFreq=1143, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.015887238 = weight(abstract_txt:based in 3567) [ClassicSimilarity], result of:
            0.015887238 = score(doc=3567,freq=1.0), product of:
              0.09126692 = queryWeight, product of:
                1.2739782 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.02250632 = queryNorm
              0.17407443 = fieldWeight in 3567, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.057662927 = weight(abstract_txt:structure in 3567) [ClassicSimilarity], result of:
            0.057662927 = score(doc=3567,freq=2.0), product of:
              0.1710812 = queryWeight, product of:
                1.7442408 = boost
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.02250632 = queryNorm
              0.33705005 = fieldWeight in 3567, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3580413 = idf(docFreq=1545, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.09836867 = weight(abstract_txt:alone in 3567) [ClassicSimilarity], result of:
            0.09836867 = score(doc=3567,freq=1.0), product of:
              0.26883674 = queryWeight, product of:
                1.7852703 = boost
                6.690832 = idf(docFreq=149, maxDocs=44421)
                0.02250632 = queryNorm
              0.36590487 = fieldWeight in 3567, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.690832 = idf(docFreq=149, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.046045605 = weight(abstract_txt:documents in 3567) [ClassicSimilarity], result of:
            0.046045605 = score(doc=3567,freq=1.0), product of:
              0.20419864 = queryWeight, product of:
                2.2003973 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02250632 = queryNorm
              0.22549418 = fieldWeight in 3567, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.09702002 = weight(abstract_txt:similarity in 3567) [ClassicSimilarity], result of:
            0.09702002 = score(doc=3567,freq=1.0), product of:
              0.30492198 = queryWeight, product of:
                2.3286257 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.02250632 = queryNorm
              0.31817982 = fieldWeight in 3567, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
          0.17270328 = weight(abstract_txt:link in 3567) [ClassicSimilarity], result of:
            0.17270328 = score(doc=3567,freq=2.0), product of:
              0.39124712 = queryWeight, product of:
                3.045792 = boost
                5.707506 = idf(docFreq=400, maxDocs=44421)
                0.02250632 = queryNorm
              0.4414174 = fieldWeight in 3567, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707506 = idf(docFreq=400, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3567)
        0.36 = coord(9/25)
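The idf values in the tree are consistent with the formula 1 + ln(maxDocs / (docFreq + 1)). This is an inference from the printed numbers, and the exact formula may differ between Lucene versions. The sketch below checks it against several docFreq values that appear in the explanation for doc 3567; the function name `classic_idf` is hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf as inferred from the explain output: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


MAX_DOCS = 44421  # maxDocs as printed in every idf line

# (term, docFreq, idf listed in the explanation)
listed = [
    ("based", 5005, 3.1830752),
    ("similarity", 358, 5.8181453),
    ("link", 400, 5.707506),
    ("alone", 149, 6.690832),
]

for term, doc_freq, listed_idf in listed:
    computed = classic_idf(doc_freq, MAX_DOCS)
    print(term, computed, listed_idf)
```

Rarer terms such as "alone" (149 documents) receive a higher idf than common ones such as "based" (5005 documents), which is why a single occurrence of "alone" contributes more to the score of doc 3567 than a single occurrence of "based".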