Document (#32954)

Author
Shen, D.
Yang, Q.
Chen, Z.
Title
Noise reduction through summarization for Web-page classification
Source
Information processing and management. 43(2007) no.6, p.1735-1747
Year
2007
Abstract
Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement of more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves an improvement of more than 12.0% over pure-text-based methods.
Theme
Automatic abstracting

Similar documents (author)

  1. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 3.16
    3.1593757 = sum of:
      3.1593757 = sum of:
        0.59460276 = weight(author_txt:chen in 5132) [ClassicSimilarity], result of:
          0.59460276 = score(doc=5132,freq=1.0), product of:
            0.38755605 = queryWeight, product of:
              6.136947 = idf(docFreq=260, maxDocs=44421)
              0.06315128 = queryNorm
            1.5342368 = fieldWeight in 5132, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.136947 = idf(docFreq=260, maxDocs=44421)
              0.25 = fieldNorm(doc=5132)
        0.9478105 = weight(author_txt:yang in 5132) [ClassicSimilarity], result of:
          0.9478105 = score(doc=5132,freq=1.0), product of:
            0.52884805 = queryWeight, product of:
              1.1681489 = boost
              7.168868 = idf(docFreq=92, maxDocs=44421)
              0.06315128 = queryNorm
            1.792217 = fieldWeight in 5132, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.168868 = idf(docFreq=92, maxDocs=44421)
              0.25 = fieldNorm(doc=5132)
        1.6169623 = weight(author_txt:shen in 5132) [ClassicSimilarity], result of:
          1.6169623 = score(doc=5132,freq=1.0), product of:
            0.755063 = queryWeight, product of:
              1.3958037 = boost
              8.565973 = idf(docFreq=22, maxDocs=44421)
              0.06315128 = queryNorm
            2.1414933 = fieldWeight in 5132, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.565973 = idf(docFreq=22, maxDocs=44421)
              0.25 = fieldNorm(doc=5132)
    
  2. Chen, Y.-H.; Germain, C.A.; Yang, H.: An exploration into the practices of library Web usability in ARL academic libraries (2009) 1.54
    1.5424131 = sum of:
      1.5424131 = product of:
        2.3136196 = sum of:
          0.8919041 = weight(author_txt:chen in 3798) [ClassicSimilarity], result of:
            0.8919041 = score(doc=3798,freq=1.0), product of:
              0.38755605 = queryWeight, product of:
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.06315128 = queryNorm
              2.3013551 = fieldWeight in 3798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.375 = fieldNorm(doc=3798)
          1.4217156 = weight(author_txt:yang in 3798) [ClassicSimilarity], result of:
            1.4217156 = score(doc=3798,freq=1.0), product of:
              0.52884805 = queryWeight, product of:
                1.1681489 = boost
                7.168868 = idf(docFreq=92, maxDocs=44421)
                0.06315128 = queryNorm
              2.6883254 = fieldWeight in 3798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.168868 = idf(docFreq=92, maxDocs=44421)
                0.375 = fieldNorm(doc=3798)
        0.6666667 = coord(2/3)
    
  3. Liu, D.-R.; Chen, Y.-H.; Shen, M.; Lu, P.-J.: Complementary QA network analysis for QA retrieval in social question-answering websites (2015) 1.47
    1.4743767 = sum of:
      1.4743767 = product of:
        2.211565 = sum of:
          0.59460276 = weight(author_txt:chen in 2611) [ClassicSimilarity], result of:
            0.59460276 = score(doc=2611,freq=1.0), product of:
              0.38755605 = queryWeight, product of:
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.06315128 = queryNorm
              1.5342368 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.25 = fieldNorm(doc=2611)
          1.6169623 = weight(author_txt:shen in 2611) [ClassicSimilarity], result of:
            1.6169623 = score(doc=2611,freq=1.0), product of:
              0.755063 = queryWeight, product of:
                1.3958037 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.06315128 = queryNorm
              2.1414933 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.25 = fieldNorm(doc=2611)
        0.6666667 = coord(2/3)
    
  4. Shen, X.-L.; Li, Y.-J.; Sun, Y.; Chen, J.; Wang, F.: Knowledge withholding in online knowledge spaces : social deviance behavior and secondary control perspective (2019) 1.47
    1.4743767 = sum of:
      1.4743767 = product of:
        2.211565 = sum of:
          0.59460276 = weight(author_txt:chen in 16) [ClassicSimilarity], result of:
            0.59460276 = score(doc=16,freq=1.0), product of:
              0.38755605 = queryWeight, product of:
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.06315128 = queryNorm
              1.5342368 = fieldWeight in 16, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.25 = fieldNorm(doc=16)
          1.6169623 = weight(author_txt:shen in 16) [ClassicSimilarity], result of:
            1.6169623 = score(doc=16,freq=1.0), product of:
              0.755063 = queryWeight, product of:
                1.3958037 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.06315128 = queryNorm
              2.1414933 = fieldWeight in 16, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.25 = fieldNorm(doc=16)
        0.6666667 = coord(2/3)
    
  5. Shen, Z.: CJK: the unique need of Chinese, Japanese, and Korean language cataloging (1993) 1.35
    1.3474686 = sum of:
      1.3474686 = product of:
        4.0424056 = sum of:
          4.0424056 = weight(author_txt:shen in 4726) [ClassicSimilarity], result of:
            4.0424056 = score(doc=4726,freq=1.0), product of:
              0.755063 = queryWeight, product of:
                1.3958037 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.06315128 = queryNorm
              5.353733 = fieldWeight in 4726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.625 = fieldNorm(doc=4726)
        0.33333334 = coord(1/3)
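
The per-term figures in the explain trees above can be reproduced by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names idf and term_score are illustrative and are not Lucene API. Results can differ from the listing in the last digits because Lucene computes with 32-bit floats.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                          # tf(freq)
    term_idf = idf(doc_freq, max_docs)            # idf(docFreq, maxDocs)
    query_weight = boost * term_idf * query_norm  # queryWeight
    field_weight = tf * term_idf * field_norm     # fieldWeight
    return query_weight * field_weight


# author_txt:chen in doc 5132 (no boost line in the listing, so boost = 1.0)
chen_5132 = term_score(freq=1.0, doc_freq=260, max_docs=44421,
                       query_norm=0.06315128, field_norm=0.25)

# author_txt:shen in doc 4726 (boost and fieldNorm taken from the listing)
shen_4726 = term_score(freq=1.0, doc_freq=22, max_docs=44421,
                       query_norm=0.06315128, field_norm=0.625,
                       boost=1.3958037)

print(chen_5132, shen_4726)
```

The two values should come out close to the 0.59460276 and 4.0424056 shown in the listing. The same term (shen) scores higher in doc 4726 than in doc 5132 only because its fieldNorm is larger (0.625 versus 0.25), which reflects a shorter author field.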
    

Similar documents (content)

  1. Reeve, L.H.; Han, H.; Brooks, A.D.: The use of domain-specific concepts in biomedical text summarization (2007) 0.22
    0.21928166 = sum of:
      0.21928166 = product of:
        0.78314877 = sum of:
          0.052043334 = weight(abstract_txt:summaries in 1955) [ClassicSimilarity], result of:
            0.052043334 = score(doc=1955,freq=2.0), product of:
              0.08366023 = queryWeight, product of:
                1.0310211 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011529234 = queryNorm
              0.62207973 = fieldWeight in 1955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.0625 = fieldNorm(doc=1955)
          0.055758286 = weight(abstract_txt:reduction in 1955) [ClassicSimilarity], result of:
            0.055758286 = score(doc=1955,freq=2.0), product of:
              0.08759553 = queryWeight, product of:
                1.0549916 = boost
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.011529234 = queryNorm
              0.6365426 = fieldWeight in 1955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.0625 = fieldNorm(doc=1955)
          0.0470863 = weight(abstract_txt:method in 1955) [ClassicSimilarity], result of:
            0.0470863 = score(doc=1955,freq=6.0), product of:
              0.06836622 = queryWeight, product of:
                1.3180863 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.011529234 = queryNorm
              0.6887363 = fieldWeight in 1955, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=1955)
          0.02081541 = weight(abstract_txt:performance in 1955) [ClassicSimilarity], result of:
            0.02081541 = score(doc=1955,freq=1.0), product of:
              0.07209176 = queryWeight, product of:
                1.3535236 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.011529234 = queryNorm
              0.28873494 = fieldWeight in 1955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=1955)
          0.010213133 = weight(abstract_txt:based in 1955) [ClassicSimilarity], result of:
            0.010213133 = score(doc=1955,freq=1.0), product of:
              0.051337186 = queryWeight, product of:
                1.3988936 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.011529234 = queryNorm
              0.1989422 = fieldWeight in 1955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=1955)
          0.062297493 = weight(abstract_txt:text in 1955) [ClassicSimilarity], result of:
            0.062297493 = score(doc=1955,freq=5.0), product of:
              0.11031368 = queryWeight, product of:
                2.367842 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.011529234 = queryNorm
              0.56473047 = fieldWeight in 1955, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1955)
          0.5349348 = weight(abstract_txt:summarization in 1955) [ClassicSimilarity], result of:
            0.5349348 = score(doc=1955,freq=4.0), product of:
              0.6004805 = queryWeight, product of:
                7.308134 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011529234 = queryNorm
              0.8908446 = fieldWeight in 1955, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=1955)
        0.28 = coord(7/25)
    
  2. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.22
    0.21741013 = sum of:
      0.21741013 = product of:
        0.90587556 = sum of:
          0.02081541 = weight(abstract_txt:performance in 3693) [ClassicSimilarity], result of:
            0.02081541 = score(doc=3693,freq=1.0), product of:
              0.07209176 = queryWeight, product of:
                1.3535236 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.011529234 = queryNorm
              0.28873494 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.017689666 = weight(abstract_txt:based in 3693) [ClassicSimilarity], result of:
            0.017689666 = score(doc=3693,freq=3.0), product of:
              0.051337186 = queryWeight, product of:
                1.3988936 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.011529234 = queryNorm
              0.344578 = fieldWeight in 3693, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.025721192 = weight(abstract_txt:improve in 3693) [ClassicSimilarity], result of:
            0.025721192 = score(doc=3693,freq=1.0), product of:
              0.083014965 = queryWeight, product of:
                1.4524502 = boost
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.011529234 = queryNorm
              0.30983803 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.04825543 = weight(abstract_txt:text in 3693) [ClassicSimilarity], result of:
            0.04825543 = score(doc=3693,freq=3.0), product of:
              0.11031368 = queryWeight, product of:
                2.367842 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.011529234 = queryNorm
              0.4374383 = fieldWeight in 3693, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.13823518 = weight(abstract_txt:algorithms in 3693) [ClassicSimilarity], result of:
            0.13823518 = score(doc=3693,freq=2.0), product of:
              0.2743751 = queryWeight, product of:
                4.1750855 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.011529234 = queryNorm
              0.5038182 = fieldWeight in 3693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.6551587 = weight(abstract_txt:summarization in 3693) [ClassicSimilarity], result of:
            0.6551587 = score(doc=3693,freq=6.0), product of:
              0.6004805 = queryWeight, product of:
                7.308134 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011529234 = queryNorm
              1.0910574 = fieldWeight in 3693, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
        0.24 = coord(6/25)
    
  3. Agarwal, B.; Ramampiaro, H.; Langseth, H.; Ruocco, M.: A deep network model for paraphrase detection in short text messages (2018) 0.21
    0.20894162 = sum of:
      0.20894162 = product of:
        0.5803934 = sum of:
          0.043954004 = weight(abstract_txt:achieves in 43) [ClassicSimilarity], result of:
            0.043954004 = score(doc=43,freq=1.0), product of:
              0.09417839 = queryWeight, product of:
                1.0939152 = boost
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.011529234 = queryNorm
              0.46671006 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.058307763 = weight(abstract_txt:noisy in 43) [ClassicSimilarity], result of:
            0.058307763 = score(doc=43,freq=1.0), product of:
              0.11370247 = queryWeight, product of:
                1.2019682 = boost
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.011529234 = queryNorm
              0.51281 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.02081541 = weight(abstract_txt:performance in 43) [ClassicSimilarity], result of:
            0.02081541 = score(doc=43,freq=1.0), product of:
              0.07209176 = queryWeight, product of:
                1.3535236 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.011529234 = queryNorm
              0.28873494 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.01444355 = weight(abstract_txt:based in 43) [ClassicSimilarity], result of:
            0.01444355 = score(doc=43,freq=2.0), product of:
              0.051337186 = queryWeight, product of:
                1.3988936 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.011529234 = queryNorm
              0.28134674 = fieldWeight in 43, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.017543845 = weight(abstract_txt:more in 43) [ClassicSimilarity], result of:
            0.017543845 = score(doc=43,freq=2.0), product of:
              0.058443017 = queryWeight, product of:
                1.4925709 = boost
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.011529234 = queryNorm
              0.3001872 = fieldWeight in 43, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.0186803 = weight(abstract_txt:than in 43) [ClassicSimilarity], result of:
            0.0186803 = score(doc=43,freq=1.0), product of:
              0.076780096 = queryWeight, product of:
                1.7107754 = boost
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.011529234 = queryNorm
              0.24329615 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8927383 = idf(docFreq=2461, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.09978067 = weight(abstract_txt:noise in 43) [ClassicSimilarity], result of:
            0.09978067 = score(doc=43,freq=1.0), product of:
              0.20495567 = queryWeight, product of:
                2.2821963 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.011529234 = queryNorm
              0.48684028 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.03940039 = weight(abstract_txt:text in 43) [ClassicSimilarity], result of:
            0.03940039 = score(doc=43,freq=2.0), product of:
              0.11031368 = queryWeight, product of:
                2.367842 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.011529234 = queryNorm
              0.3571669 = fieldWeight in 43, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
          0.2674674 = weight(abstract_txt:summarization in 43) [ClassicSimilarity], result of:
            0.2674674 = score(doc=43,freq=1.0), product of:
              0.6004805 = queryWeight, product of:
                7.308134 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011529234 = queryNorm
              0.4454223 = fieldWeight in 43, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.0625 = fieldNorm(doc=43)
        0.36 = coord(9/25)
    
  4. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.21
    0.20655389 = sum of:
      0.20655389 = product of:
        1.0327694 = sum of:
          0.07967475 = weight(abstract_txt:summaries in 1563) [ClassicSimilarity], result of:
            0.07967475 = score(doc=1563,freq=3.0), product of:
              0.08366023 = queryWeight, product of:
                1.0310211 = boost
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.011529234 = queryNorm
              0.95236117 = fieldWeight in 1563, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0380287 = idf(docFreq=105, maxDocs=44421)
                0.078125 = fieldNorm(doc=1563)
          0.034825355 = weight(abstract_txt:text in 1563) [ClassicSimilarity], result of:
            0.034825355 = score(doc=1563,freq=1.0), product of:
              0.11031368 = queryWeight, product of:
                2.367842 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.011529234 = queryNorm
              0.3156939 = fieldWeight in 1563, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=1563)
          0.058766343 = weight(abstract_txt:classification in 1563) [ClassicSimilarity], result of:
            0.058766343 = score(doc=1563,freq=1.0), product of:
              0.18842164 = queryWeight, product of:
                4.0937605 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.011529234 = queryNorm
              0.31188744 = fieldWeight in 1563, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.078125 = fieldNorm(doc=1563)
          0.28041917 = weight(abstract_txt:page in 1563) [ClassicSimilarity], result of:
            0.28041917 = score(doc=1563,freq=2.0), product of:
              0.4238773 = queryWeight, product of:
                6.140124 = boost
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.011529234 = queryNorm
              0.66155744 = fieldWeight in 1563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.078125 = fieldNorm(doc=1563)
          0.57908386 = weight(abstract_txt:summarization in 1563) [ClassicSimilarity], result of:
            0.57908386 = score(doc=1563,freq=3.0), product of:
              0.6004805 = queryWeight, product of:
                7.308134 = boost
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.011529234 = queryNorm
              0.9643675 = fieldWeight in 1563, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1267567 = idf(docFreq=96, maxDocs=44421)
                0.078125 = fieldNorm(doc=1563)
        0.2 = coord(5/25)
    
  5. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.18
    0.17806952 = sum of:
      0.17806952 = product of:
        0.63596255 = sum of:
          0.033577427 = weight(abstract_txt:directory in 2070) [ClassicSimilarity], result of:
            0.033577427 = score(doc=2070,freq=1.0), product of:
              0.07870167 = queryWeight, product of:
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.011529234 = queryNorm
              0.42664188 = fieldWeight in 2070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=2070)
          0.03329504 = weight(abstract_txt:method in 2070) [ClassicSimilarity], result of:
            0.03329504 = score(doc=2070,freq=3.0), product of:
              0.06836622 = queryWeight, product of:
                1.3180863 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.011529234 = queryNorm
              0.4870101 = fieldWeight in 2070, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=2070)
          0.029437434 = weight(abstract_txt:performance in 2070) [ClassicSimilarity], result of:
            0.029437434 = score(doc=2070,freq=2.0), product of:
              0.07209176 = queryWeight, product of:
                1.3535236 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.011529234 = queryNorm
              0.40833285 = fieldWeight in 2070, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=2070)
          0.010213133 = weight(abstract_txt:based in 2070) [ClassicSimilarity], result of:
            0.010213133 = score(doc=2070,freq=1.0), product of:
              0.051337186 = queryWeight, product of:
                1.3988936 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.011529234 = queryNorm
              0.1989422 = fieldWeight in 2070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=2070)
          0.025721192 = weight(abstract_txt:improve in 2070) [ClassicSimilarity], result of:
            0.025721192 = score(doc=2070,freq=1.0), product of:
              0.083014965 = queryWeight, product of:
                1.4524502 = boost
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.011529234 = queryNorm
              0.30983803 = fieldWeight in 2070, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.0625 = fieldNorm(doc=2070)
          0.11515805 = weight(abstract_txt:classification in 2070) [ClassicSimilarity], result of:
            0.11515805 = score(doc=2070,freq=6.0), product of:
              0.18842164 = queryWeight, product of:
                4.0937605 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.011529234 = queryNorm
              0.61117214 = fieldWeight in 2070, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0625 = fieldNorm(doc=2070)
          0.38856027 = weight(abstract_txt:page in 2070) [ClassicSimilarity], result of:
            0.38856027 = score(doc=2070,freq=6.0), product of:
              0.4238773 = queryWeight, product of:
                6.140124 = boost
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.011529234 = queryNorm
              0.916681 = fieldWeight in 2070, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.987735 = idf(docFreq=302, maxDocs=44421)
                0.0625 = fieldNorm(doc=2070)
        0.28 = coord(7/25)
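
Each document score in the lists above is the sum of the matching term weights multiplied by a coordination factor coord(matched/total), which penalizes documents that match only a few of the query terms. A minimal sketch of that last step, using the term weights printed for doc 1563 as input; the function names coord and document_score are illustrative, not Lucene API.

```python
def coord(matched: int, total: int) -> float:
    """Fraction of query terms that the document matched."""
    return matched / total


def document_score(term_scores, total_query_terms):
    """Sum of per-term weights, scaled by the coordination factor."""
    return sum(term_scores) * coord(len(term_scores), total_query_terms)


# Term weights copied from the explain tree for doc 1563
# (summaries, text, classification, page, summarization).
doc_1563_terms = [0.07967475, 0.034825355, 0.058766343,
                  0.28041917, 0.57908386]

# The content query has 25 terms; doc 1563 matches 5 of them.
score_1563 = document_score(doc_1563_terms, total_query_terms=25)
print(score_1563)
```

The result should agree with the 0.20655389 reported for doc 1563 up to floating-point rounding. Doc 1563 has the largest raw sum among the five content matches (1.0327694) but ranks fourth, because it matches only 5 of the 25 query terms.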