Document (#20674)

Author
Wolfekuhler, M.R.
Punch, W.F.
Title
Finding salient features for personal Web pages categories
Source
Computer networks and ISDN systems. 29(1997) no.8, S.1147-1156
Year
1997
Abstract
Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
Footnote
Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
Theme
Internet
Automatic indexing
Metadata

Similar documents (content)

  1. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.18
    0.17589067 = sum of:
      0.17589067 = product of:
        0.628181 = sum of:
          0.055331416 = weight(abstract_txt:accuracy in 462) [ClassicSimilarity], result of:
            0.055331416 = score(doc=462,freq=1.0), product of:
              0.14865883 = queryWeight, product of:
                1.1491503 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.02172265 = queryNorm
              0.37220404 = fieldWeight in 462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.0625 = fieldNorm(doc=462)
          0.12575024 = weight(abstract_txt:clusters in 462) [ClassicSimilarity], result of:
            0.12575024 = score(doc=462,freq=3.0), product of:
              0.1781729 = queryWeight, product of:
                1.2580627 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.02172265 = queryNorm
              0.70577645 = fieldWeight in 462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=462)
          0.08721987 = weight(abstract_txt:extracting in 462) [ClassicSimilarity], result of:
            0.08721987 = score(doc=462,freq=1.0), product of:
              0.2013507 = queryWeight, product of:
                1.33739 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.02172265 = queryNorm
              0.43317392 = fieldWeight in 462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=462)
          0.018005624 = weight(abstract_txt:that in 462) [ClassicSimilarity], result of:
            0.018005624 = score(doc=462,freq=3.0), product of:
              0.070331246 = queryWeight, product of:
                1.3690404 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.02172265 = queryNorm
              0.25601172 = fieldWeight in 462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=462)
          0.04877553 = weight(abstract_txt:techniques in 462) [ClassicSimilarity], result of:
            0.04877553 = score(doc=462,freq=1.0), product of:
              0.17219512 = queryWeight, product of:
                1.7490689 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.02172265 = queryNorm
              0.28325734 = fieldWeight in 462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.0625 = fieldNorm(doc=462)
          0.14594641 = weight(abstract_txt:word in 462) [ClassicSimilarity], result of:
            0.14594641 = score(doc=462,freq=3.0), product of:
              0.247918 = queryWeight, product of:
                2.0987008 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.02172265 = queryNorm
              0.58868825 = fieldWeight in 462, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0625 = fieldNorm(doc=462)
          0.14715189 = weight(abstract_txt:features in 462) [ClassicSimilarity], result of:
            0.14715189 = score(doc=462,freq=4.0), product of:
              0.25926298 = queryWeight, product of:
                2.6285264 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.02172265 = queryNorm
              0.5675777 = fieldWeight in 462, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0625 = fieldNorm(doc=462)
        0.28 = coord(7/25)
    
  2. Sebastian, Y.: Literature-based discovery by learning heterogeneous bibliographic information networks (2017) 0.16
    0.16174047 = sum of:
      0.16174047 = product of:
        0.505439 = sum of:
          0.03757878 = weight(abstract_txt:finding in 1536) [ClassicSimilarity], result of:
            0.03757878 = score(doc=1536,freq=1.0), product of:
              0.12555459 = queryWeight, product of:
                1.0560824 = boost
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.02172265 = queryNorm
              0.2993023 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
          0.04013725 = weight(abstract_txt:technique in 1536) [ClassicSimilarity], result of:
            0.04013725 = score(doc=1536,freq=1.0), product of:
              0.13119055 = queryWeight, product of:
                1.0795251 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.02172265 = queryNorm
              0.3059462 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
          0.048414987 = weight(abstract_txt:accuracy in 1536) [ClassicSimilarity], result of:
            0.048414987 = score(doc=1536,freq=1.0), product of:
              0.14865883 = queryWeight, product of:
                1.1491503 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.02172265 = queryNorm
              0.32567853 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
          0.0635267 = weight(abstract_txt:clusters in 1536) [ClassicSimilarity], result of:
            0.0635267 = score(doc=1536,freq=1.0), product of:
              0.1781729 = queryWeight, product of:
                1.2580627 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.02172265 = queryNorm
              0.3565452 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
          0.07775661 = weight(abstract_txt:necessarily in 1536) [ClassicSimilarity], result of:
            0.07775661 = score(doc=1536,freq=1.0), product of:
              0.20387425 = queryWeight, product of:
                1.3457446 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.02172265 = queryNorm
              0.38139498 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
          0.020339519 = weight(abstract_txt:that in 1536) [ClassicSimilarity], result of:
            0.020339519 = score(doc=1536,freq=5.0), product of:
              0.070331246 = queryWeight, product of:
                1.3690404 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.02172265 = queryNorm
              0.28919604 = fieldWeight in 1536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
          0.07372943 = weight(abstract_txt:word in 1536) [ClassicSimilarity], result of:
            0.07372943 = score(doc=1536,freq=1.0), product of:
              0.247918 = queryWeight, product of:
                2.0987008 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.02172265 = queryNorm
              0.29739442 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
          0.14395572 = weight(abstract_txt:features in 1536) [ClassicSimilarity], result of:
            0.14395572 = score(doc=1536,freq=5.0), product of:
              0.25926298 = queryWeight, product of:
                2.6285264 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.02172265 = queryNorm
              0.5552498 = fieldWeight in 1536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1536)
        0.32 = coord(8/25)
    
  3. Lin, Y.-R.; Margolin, D.; Lazer, D.: Uncovering social semantics from textual traces : a theory-driven approach and evidence from public statements of U.S. Members of Congress (2016) 0.15
    0.15413886 = sum of:
      0.15413886 = product of:
        0.48168397 = sum of:
          0.053683974 = weight(abstract_txt:finding in 4078) [ClassicSimilarity], result of:
            0.053683974 = score(doc=4078,freq=1.0), product of:
              0.12555459 = queryWeight, product of:
                1.0560824 = boost
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.02172265 = queryNorm
              0.42757475 = fieldWeight in 4078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
          0.05733893 = weight(abstract_txt:technique in 4078) [ClassicSimilarity], result of:
            0.05733893 = score(doc=4078,freq=1.0), product of:
              0.13119055 = queryWeight, product of:
                1.0795251 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.02172265 = queryNorm
              0.437066 = fieldWeight in 4078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
          0.022913847 = weight(abstract_txt:which in 4078) [ClassicSimilarity], result of:
            0.022913847 = score(doc=4078,freq=2.0), product of:
              0.07117621 = queryWeight, product of:
                1.1245115 = boost
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.02172265 = queryNorm
              0.32193127 = fieldWeight in 4078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
          0.018376915 = weight(abstract_txt:that in 4078) [ClassicSimilarity], result of:
            0.018376915 = score(doc=4078,freq=2.0), product of:
              0.070331246 = queryWeight, product of:
                1.3690404 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.02172265 = queryNorm
              0.2612909 = fieldWeight in 4078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
          0.045914553 = weight(abstract_txt:documents in 4078) [ClassicSimilarity], result of:
            0.045914553 = score(doc=4078,freq=1.0), product of:
              0.14253223 = queryWeight, product of:
                1.5913035 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02172265 = queryNorm
              0.32213452 = fieldWeight in 4078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
          0.060969416 = weight(abstract_txt:techniques in 4078) [ClassicSimilarity], result of:
            0.060969416 = score(doc=4078,freq=1.0), product of:
              0.17219512 = queryWeight, product of:
                1.7490689 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.02172265 = queryNorm
              0.35407168 = fieldWeight in 4078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
          0.09242121 = weight(abstract_txt:similar in 4078) [ClassicSimilarity], result of:
            0.09242121 = score(doc=4078,freq=1.0), product of:
              0.22722736 = queryWeight, product of:
                2.0092168 = boost
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.02172265 = queryNorm
              0.40673453 = fieldWeight in 4078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
          0.13006513 = weight(abstract_txt:features in 4078) [ClassicSimilarity], result of:
            0.13006513 = score(doc=4078,freq=2.0), product of:
              0.25926298 = queryWeight, product of:
                2.6285264 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.02172265 = queryNorm
              0.50167257 = fieldWeight in 4078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.078125 = fieldNorm(doc=4078)
        0.32 = coord(8/25)
    
  4. Lim, C.S.; Lee, K.J.; Kim, G.C.: Multiple sets of features for automatic genre classification of web documents (2005) 0.15
    0.1532364 = sum of:
      0.1532364 = product of:
        0.54727286 = sum of:
          0.06315415 = weight(abstract_txt:sets in 2048) [ClassicSimilarity], result of:
            0.06315415 = score(doc=2048,freq=3.0), product of:
              0.11257373 = queryWeight, product of:
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.02172265 = queryNorm
              0.5610026 = fieldWeight in 2048, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.0625 = fieldNorm(doc=2048)
          0.022450894 = weight(abstract_txt:which in 2048) [ClassicSimilarity], result of:
            0.022450894 = score(doc=2048,freq=3.0), product of:
              0.07117621 = queryWeight, product of:
                1.1245115 = boost
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.02172265 = queryNorm
              0.31542695 = fieldWeight in 2048, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.9137893 = idf(docFreq=6552, maxDocs=44421)
                0.0625 = fieldNorm(doc=2048)
          0.010395552 = weight(abstract_txt:that in 2048) [ClassicSimilarity], result of:
            0.010395552 = score(doc=2048,freq=1.0), product of:
              0.070331246 = queryWeight, product of:
                1.3690404 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.02172265 = queryNorm
              0.14780845 = fieldWeight in 2048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=2048)
          0.11019493 = weight(abstract_txt:documents in 2048) [ClassicSimilarity], result of:
            0.11019493 = score(doc=2048,freq=9.0), product of:
              0.14253223 = queryWeight, product of:
                1.5913035 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02172265 = queryNorm
              0.7731229 = fieldWeight in 2048, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=2048)
          0.0842622 = weight(abstract_txt:word in 2048) [ClassicSimilarity], result of:
            0.0842622 = score(doc=2048,freq=1.0), product of:
              0.247918 = queryWeight, product of:
                2.0987008 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.02172265 = queryNorm
              0.33987933 = fieldWeight in 2048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0625 = fieldNorm(doc=2048)
          0.09229432 = weight(abstract_txt:pages in 2048) [ClassicSimilarity], result of:
            0.09229432 = score(doc=2048,freq=1.0), product of:
              0.2634326 = queryWeight, product of:
                2.163372 = boost
                5.6056433 = idf(docFreq=443, maxDocs=44421)
                0.02172265 = queryNorm
              0.3503527 = fieldWeight in 2048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6056433 = idf(docFreq=443, maxDocs=44421)
                0.0625 = fieldNorm(doc=2048)
          0.16452082 = weight(abstract_txt:features in 2048) [ClassicSimilarity], result of:
            0.16452082 = score(doc=2048,freq=5.0), product of:
              0.25926298 = queryWeight, product of:
                2.6285264 = boost
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.02172265 = queryNorm
              0.6345712 = fieldWeight in 2048, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5406218 = idf(docFreq=1287, maxDocs=44421)
                0.0625 = fieldNorm(doc=2048)
        0.28 = coord(7/25)
    
  5. Scholer, F.; Williams, H.E.; Turpin, A.: Query association surrogates for Web search (2004) 0.14
    0.14166947 = sum of:
      0.14166947 = product of:
        0.5059624 = sum of:
          0.0759206 = weight(abstract_txt:finding in 3236) [ClassicSimilarity], result of:
            0.0759206 = score(doc=3236,freq=2.0), product of:
              0.12555459 = queryWeight, product of:
                1.0560824 = boost
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.02172265 = queryNorm
              0.60468197 = fieldWeight in 3236, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.078125 = fieldNorm(doc=3236)
          0.05733893 = weight(abstract_txt:technique in 3236) [ClassicSimilarity], result of:
            0.05733893 = score(doc=3236,freq=1.0), product of:
              0.13119055 = queryWeight, product of:
                1.0795251 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.02172265 = queryNorm
              0.437066 = fieldWeight in 3236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.078125 = fieldNorm(doc=3236)
          0.06916427 = weight(abstract_txt:accuracy in 3236) [ClassicSimilarity], result of:
            0.06916427 = score(doc=3236,freq=1.0), product of:
              0.14865883 = queryWeight, product of:
                1.1491503 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.02172265 = queryNorm
              0.46525505 = fieldWeight in 3236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.078125 = fieldNorm(doc=3236)
          0.02598888 = weight(abstract_txt:that in 3236) [ClassicSimilarity], result of:
            0.02598888 = score(doc=3236,freq=4.0), product of:
              0.070331246 = queryWeight, product of:
                1.3690404 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.02172265 = queryNorm
              0.3695211 = fieldWeight in 3236, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3236)
          0.079526335 = weight(abstract_txt:documents in 3236) [ClassicSimilarity], result of:
            0.079526335 = score(doc=3236,freq=3.0), product of:
              0.14253223 = queryWeight, product of:
                1.5913035 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.02172265 = queryNorm
              0.55795336 = fieldWeight in 3236, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=3236)
          0.10560212 = weight(abstract_txt:techniques in 3236) [ClassicSimilarity], result of:
            0.10560212 = score(doc=3236,freq=3.0), product of:
              0.17219512 = queryWeight, product of:
                1.7490689 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.02172265 = queryNorm
              0.6132701 = fieldWeight in 3236, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.078125 = fieldNorm(doc=3236)
          0.09242121 = weight(abstract_txt:similar in 3236) [ClassicSimilarity], result of:
            0.09242121 = score(doc=3236,freq=1.0), product of:
              0.22722736 = queryWeight, product of:
                2.0092168 = boost
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.02172265 = queryNorm
              0.40673453 = fieldWeight in 3236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.078125 = fieldNorm(doc=3236)
        0.28 = coord(7/25)
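
How the scores above fit together can be sketched in a few lines of Python. This is a minimal reconstruction inferred from the numbers in the listing, not the search engine's own code: it assumes tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final score equal to the sum of the matching term weights times coord (matched terms / query terms). The function names are hypothetical and chosen for illustration only.

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears to be computed in the listing
    # (assumption: 1 + ln(maxDocs / (docFreq + 1))).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # Weight of one query term in one document: queryWeight * fieldWeight.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, query_term_count):
    # Sum of the matching term weights, scaled by the coordination factor
    # coord = (number of matching terms) / (number of query terms).
    coord = len(term_scores) / query_term_count
    return sum(term_scores) * coord


if __name__ == "__main__":
    # "accuracy" in doc 462, using the values printed in entry 1.
    accuracy = term_score(
        freq=1.0, doc_freq=312, max_docs=44421,
        boost=1.1491503, query_norm=0.02172265, field_norm=0.0625,
    )
    print(accuracy)

    # The seven term weights listed for entry 1, out of 25 query terms.
    entry_1 = [0.055331416, 0.12575024, 0.08721987, 0.018005624,
               0.04877553, 0.14594641, 0.14715189]
    print(document_score(entry_1, 25))
```

Run as a script, the two printed values should land close to the 0.055331416 and 0.17589067 shown for entry 1. Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic.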