Document (#28562)

Author
Price, L.
Thelwall, M.
Title
¬The clustering power of low frequency words in academic webs
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.8, pp.883-888
Year
2005
Series
Brief communication
Abstract
The value of low frequency words for subject-based academic Web site clustering is assessed. A new technique is introduced to compare the relative clustering power of different vocabularies. The technique is designed for word frequency tests in large document clustering exercises. Results for the Australian and New Zealand academic Web spaces indicate that low frequency words are useful for clustering academic Web sites along subject lines; removing low frequency words results in sites becoming, on average, less dissimilar to sites from other subjects.

Similar documents (author)

  1. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 5.42
    5.418141 = sum of:
      5.418141 = sum of:
        1.9371849 = weight(author_txt:thelwall in 896) [ClassicSimilarity], result of:
          1.9371849 = score(doc=896,freq=1.0), product of:
            0.5603679 = queryWeight, product of:
              6.9139757 = idf(docFreq=119, maxDocs=44421)
              0.08104858 = queryNorm
            3.4569879 = fieldWeight in 896, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.9139757 = idf(docFreq=119, maxDocs=44421)
              0.5 = fieldNorm(doc=896)
        3.480956 = weight(author_txt:price in 896) [ClassicSimilarity], result of:
          3.480956 = score(doc=896,freq=1.0), product of:
            0.82824385 = queryWeight, product of:
              1.215745 = boost
              8.405631 = idf(docFreq=26, maxDocs=44421)
              0.08104858 = queryNorm
            4.2028155 = fieldWeight in 896, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.405631 = idf(docFreq=26, maxDocs=44421)
              0.5 = fieldNorm(doc=896)
    
  2. Harries, G.; Wilkinson, D.; Price, L.; Fairclough, R.; Thelwall, M.: Hyperlinks as a data source for science mapping : making sense of it all (2005) 3.39
    3.3863382 = sum of:
      3.3863382 = sum of:
        1.2107406 = weight(author_txt:thelwall in 5654) [ClassicSimilarity], result of:
          1.2107406 = score(doc=5654,freq=1.0), product of:
            0.5603679 = queryWeight, product of:
              6.9139757 = idf(docFreq=119, maxDocs=44421)
              0.08104858 = queryNorm
            2.1606174 = fieldWeight in 5654, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.9139757 = idf(docFreq=119, maxDocs=44421)
              0.3125 = fieldNorm(doc=5654)
        2.1755977 = weight(author_txt:price in 5654) [ClassicSimilarity], result of:
          2.1755977 = score(doc=5654,freq=1.0), product of:
            0.82824385 = queryWeight, product of:
              1.215745 = boost
              8.405631 = idf(docFreq=26, maxDocs=44421)
              0.08104858 = queryNorm
            2.6267598 = fieldWeight in 5654, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.405631 = idf(docFreq=26, maxDocs=44421)
              0.3125 = fieldNorm(doc=5654)
    
  3. Thelwall, M.; Binns, R.; Harries, G.; Page-Kennedy, T.; Price, L.; Wilkinson, D.: Custom interfaces for advanced queries in search engines (2001) 2.71
    2.7090704 = sum of:
      2.7090704 = sum of:
        0.96859246 = weight(author_txt:thelwall in 822) [ClassicSimilarity], result of:
          0.96859246 = score(doc=822,freq=1.0), product of:
            0.5603679 = queryWeight, product of:
              6.9139757 = idf(docFreq=119, maxDocs=44421)
              0.08104858 = queryNorm
            1.7284939 = fieldWeight in 822, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.9139757 = idf(docFreq=119, maxDocs=44421)
              0.25 = fieldNorm(doc=822)
        1.740478 = weight(author_txt:price in 822) [ClassicSimilarity], result of:
          1.740478 = score(doc=822,freq=1.0), product of:
            0.82824385 = queryWeight, product of:
              1.215745 = boost
              8.405631 = idf(docFreq=26, maxDocs=44421)
              0.08104858 = queryNorm
            2.1014078 = fieldWeight in 822, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.405631 = idf(docFreq=26, maxDocs=44421)
              0.25 = fieldNorm(doc=822)
    
  4. Price, B.J.: ¬A talking terminal for the blind (1985) 2.18
    2.1755977 = sum of:
      2.1755977 = product of:
        4.3511953 = sum of:
          4.3511953 = weight(author_txt:price in 2151) [ClassicSimilarity], result of:
            4.3511953 = score(doc=2151,freq=1.0), product of:
              0.82824385 = queryWeight, product of:
                1.215745 = boost
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.08104858 = queryNorm
              5.2535195 = fieldWeight in 2151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.625 = fieldNorm(doc=2151)
        0.5 = coord(1/2)
    
  5. Price, M.S.: ¬The National Union Catalog programme (1987) 2.18
    2.1755977 = sum of:
      2.1755977 = product of:
        4.3511953 = sum of:
          4.3511953 = weight(author_txt:price in 2537) [ClassicSimilarity], result of:
            4.3511953 = score(doc=2537,freq=1.0), product of:
              0.82824385 = queryWeight, product of:
                1.215745 = boost
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.08104858 = queryNorm
              5.2535195 = fieldWeight in 2537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.625 = fieldNorm(doc=2537)
        0.5 = coord(1/2)
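
The score breakdowns in the author list above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq); the function names (`classic_idf`, `classic_tf`, `term_score`) are hypothetical helpers chosen for illustration and are not part of Lucene. The numeric inputs (docFreq, maxDocs, queryNorm, fieldNorm, boost) are copied from entries 1 and 4.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency as shown in the listing:
    # 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    # Term frequency factor: square root of the raw count
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

MAX_DOCS = 44421
QUERY_NORM = 0.08104858   # queryNorm from the author listing
PRICE_BOOST = 1.215745    # boost on author_txt:price from the listing

# Entry 1: both author terms match, so the two term scores are summed.
thelwall = term_score(1.0, 119, MAX_DOCS, QUERY_NORM, field_norm=0.5)
price = term_score(1.0, 26, MAX_DOCS, QUERY_NORM, field_norm=0.5, boost=PRICE_BOOST)
entry1 = thelwall + price

# Entry 4: only "price" matches, so the sum is scaled by coord(1/2) = 0.5.
price_only = term_score(1.0, 26, MAX_DOCS, QUERY_NORM, field_norm=0.625, boost=PRICE_BOOST)
entry4 = price_only * (1 / 2)

print(f"entry 1: {entry1:.2f}")
print(f"entry 4: {entry4:.2f}")
```

Because Lucene computes in single precision and the listing rounds its inputs, the values should agree with the listed 5.418141 and 2.1755977 only to a few decimal places.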
    

Similar documents (content)

  1. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.37
    0.3733958 = sum of:
      0.3733958 = product of:
        1.3335564 = sum of:
          0.10735764 = weight(abstract_txt:zealand in 4463) [ClassicSimilarity], result of:
            0.10735764 = score(doc=4463,freq=1.0), product of:
              0.16984975 = queryWeight, product of:
                1.4162872 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.014822981 = queryNorm
              0.6320742 = fieldWeight in 4463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.11075906 = weight(abstract_txt:webs in 4463) [ClassicSimilarity], result of:
            0.11075906 = score(doc=4463,freq=1.0), product of:
              0.17341864 = queryWeight, product of:
                1.4310894 = boost
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.014822981 = queryNorm
              0.6386802 = fieldWeight in 4463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.13575867 = weight(abstract_txt:sites in 4463) [ClassicSimilarity], result of:
            0.13575867 = score(doc=4463,freq=2.0), product of:
              0.22736228 = queryWeight, product of:
                2.838172 = boost
                5.4043584 = idf(docFreq=542, maxDocs=44421)
                0.014822981 = queryNorm
              0.5971029 = fieldWeight in 4463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4043584 = idf(docFreq=542, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.11690723 = weight(abstract_txt:academic in 4463) [ClassicSimilarity], result of:
            0.11690723 = score(doc=4463,freq=2.0), product of:
              0.22650643 = queryWeight, product of:
                3.2710648 = boost
                4.6714945 = idf(docFreq=1129, maxDocs=44421)
                0.014822981 = queryNorm
              0.51613206 = fieldWeight in 4463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6714945 = idf(docFreq=1129, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.24915494 = weight(abstract_txt:words in 4463) [ClassicSimilarity], result of:
            0.24915494 = score(doc=4463,freq=4.0), product of:
              0.29773 = queryWeight, product of:
                3.7502496 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.014822981 = queryNorm
              0.8368486 = fieldWeight in 4463, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.36960548 = weight(abstract_txt:frequency in 4463) [ClassicSimilarity], result of:
            0.36960548 = score(doc=4463,freq=3.0), product of:
              0.4591467 = queryWeight, product of:
                5.2069044 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.014822981 = queryNorm
              0.80498344 = fieldWeight in 4463, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.24401337 = weight(abstract_txt:clustering in 4463) [ClassicSimilarity], result of:
            0.24401337 = score(doc=4463,freq=1.0), product of:
              0.5020828 = queryWeight, product of:
                5.4449205 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.014822981 = queryNorm
              0.48600224 = fieldWeight in 4463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
        0.28 = coord(7/25)
    
  2. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.12
    0.12033025 = sum of:
      0.12033025 = product of:
        0.7520641 = sum of:
          0.052042037 = weight(abstract_txt:relative in 206) [ClassicSimilarity], result of:
            0.052042037 = score(doc=206,freq=2.0), product of:
              0.096533 = queryWeight, product of:
                1.0677185 = boost
                6.099349 = idf(docFreq=270, maxDocs=44421)
                0.014822981 = queryNorm
              0.5391114 = fieldWeight in 206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.099349 = idf(docFreq=270, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.019309206 = weight(abstract_txt:results in 206) [ClassicSimilarity], result of:
            0.019309206 = score(doc=206,freq=2.0), product of:
              0.06279994 = queryWeight, product of:
                1.2179052 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.014822981 = queryNorm
              0.30747172 = fieldWeight in 206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.29898593 = weight(abstract_txt:words in 206) [ClassicSimilarity], result of:
            0.29898593 = score(doc=206,freq=9.0), product of:
              0.29773 = queryWeight, product of:
                3.7502496 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.014822981 = queryNorm
              1.0042183 = fieldWeight in 206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
          0.38172692 = weight(abstract_txt:frequency in 206) [ClassicSimilarity], result of:
            0.38172692 = score(doc=206,freq=5.0), product of:
              0.4591467 = queryWeight, product of:
                5.2069044 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.014822981 = queryNorm
              0.83138335 = fieldWeight in 206, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.0625 = fieldNorm(doc=206)
        0.16 = coord(4/25)
    
  3. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.11
    0.111157864 = sum of:
      0.111157864 = product of:
        0.69473666 = sum of:
          0.01365367 = weight(abstract_txt:results in 45) [ClassicSimilarity], result of:
            0.01365367 = score(doc=45,freq=1.0), product of:
              0.06279994 = queryWeight, product of:
                1.2179052 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.014822981 = queryNorm
              0.21741535 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.3151588 = weight(abstract_txt:words in 45) [ClassicSimilarity], result of:
            0.3151588 = score(doc=45,freq=10.0), product of:
              0.29773 = queryWeight, product of:
                3.7502496 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.014822981 = queryNorm
              1.058539 = fieldWeight in 45, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.17071347 = weight(abstract_txt:frequency in 45) [ClassicSimilarity], result of:
            0.17071347 = score(doc=45,freq=1.0), product of:
              0.4591467 = queryWeight, product of:
                5.2069044 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.014822981 = queryNorm
              0.37180594 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.1952107 = weight(abstract_txt:clustering in 45) [ClassicSimilarity], result of:
            0.1952107 = score(doc=45,freq=1.0), product of:
              0.5020828 = queryWeight, product of:
                5.4449205 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.014822981 = queryNorm
              0.38880178 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
        0.16 = coord(4/25)
    
  4. Thelwall, M.; Wilkinson, D.: Graph structure in three national academic Webs : power laws with anomalies (2003) 0.10
    0.100804806 = sum of:
      0.100804806 = product of:
        0.504024 = sum of:
          0.041684944 = weight(abstract_txt:average in 2681) [ClassicSimilarity], result of:
            0.041684944 = score(doc=2681,freq=1.0), product of:
              0.09039875 = queryWeight, product of:
                1.0332373 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.014822981 = queryNorm
              0.46112302 = fieldWeight in 2681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.078125 = fieldNorm(doc=2681)
          0.10735764 = weight(abstract_txt:zealand in 2681) [ClassicSimilarity], result of:
            0.10735764 = score(doc=2681,freq=1.0), product of:
              0.16984975 = queryWeight, product of:
                1.4162872 = boost
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.014822981 = queryNorm
              0.6320742 = fieldWeight in 2681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.090549 = idf(docFreq=36, maxDocs=44421)
                0.078125 = fieldNorm(doc=2681)
          0.11075906 = weight(abstract_txt:webs in 2681) [ClassicSimilarity], result of:
            0.11075906 = score(doc=2681,freq=1.0), product of:
              0.17341864 = queryWeight, product of:
                1.4310894 = boost
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.014822981 = queryNorm
              0.6386802 = fieldWeight in 2681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.175107 = idf(docFreq=33, maxDocs=44421)
                0.078125 = fieldNorm(doc=2681)
          0.10846371 = weight(abstract_txt:power in 2681) [ClassicSimilarity], result of:
            0.10846371 = score(doc=2681,freq=2.0), product of:
              0.17101435 = queryWeight, product of:
                2.0097876 = boost
                5.7404623 = idf(docFreq=387, maxDocs=44421)
                0.014822981 = queryNorm
              0.63423747 = fieldWeight in 2681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7404623 = idf(docFreq=387, maxDocs=44421)
                0.078125 = fieldNorm(doc=2681)
          0.13575867 = weight(abstract_txt:sites in 2681) [ClassicSimilarity], result of:
            0.13575867 = score(doc=2681,freq=2.0), product of:
              0.22736228 = queryWeight, product of:
                2.838172 = boost
                5.4043584 = idf(docFreq=542, maxDocs=44421)
                0.014822981 = queryNorm
              0.5971029 = fieldWeight in 2681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4043584 = idf(docFreq=542, maxDocs=44421)
                0.078125 = fieldNorm(doc=2681)
        0.2 = coord(5/25)
    
  5. Park, G.; Baek, Y.; Lee, H.-K.: Re-ranking algorithm using post-retrieval clustering for content-based image retrieval (2005) 0.10
    0.098876506 = sum of:
      0.098876506 = product of:
        0.61797816 = sum of:
          0.047161132 = weight(abstract_txt:average in 2005) [ClassicSimilarity], result of:
            0.047161132 = score(doc=2005,freq=2.0), product of:
              0.09039875 = queryWeight, product of:
                1.0332373 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.014822981 = queryNorm
              0.52170116 = fieldWeight in 2005, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.0625 = fieldNorm(doc=2005)
          0.02730734 = weight(abstract_txt:results in 2005) [ClassicSimilarity], result of:
            0.02730734 = score(doc=2005,freq=4.0), product of:
              0.06279994 = queryWeight, product of:
                1.2179052 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.014822981 = queryNorm
              0.4348307 = fieldWeight in 2005, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=2005)
          0.10700529 = weight(abstract_txt:dissimilar in 2005) [ClassicSimilarity], result of:
            0.10700529 = score(doc=2005,freq=1.0), product of:
              0.1966617 = queryWeight, product of:
                1.5239782 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.014822981 = queryNorm
              0.54410845 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=2005)
          0.4365044 = weight(abstract_txt:clustering in 2005) [ClassicSimilarity], result of:
            0.4365044 = score(doc=2005,freq=5.0), product of:
              0.5020828 = queryWeight, product of:
                5.4449205 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.014822981 = queryNorm
              0.8693872 = fieldWeight in 2005, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.0625 = fieldNorm(doc=2005)
        0.16 = coord(4/25)
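
The coord factor in the content list can change the ranking, not just the scale. The sketch below takes the per-term scores for entries 4 and 5 exactly as listed and applies coord = matched terms / total query terms (25 here, as shown in the listing). The helper name `coord_score` is hypothetical.

```python
def coord_score(term_scores, total_clauses):
    # Final score = (sum of matching term scores) * coord,
    # where coord = number of matching terms / total query terms.
    matched = len(term_scores)
    return sum(term_scores) * matched / total_clauses

TOTAL_CLAUSES = 25  # denominator of coord(n/25) in the listing

# Per-term scores copied from the listing.
entry4_terms = [0.041684944, 0.10735764, 0.11075906, 0.10846371, 0.13575867]
entry5_terms = [0.047161132, 0.02730734, 0.10700529, 0.4365044]

entry4 = coord_score(entry4_terms, TOTAL_CLAUSES)
entry5 = coord_score(entry5_terms, TOTAL_CLAUSES)

print(f"entry 4: raw sum {sum(entry4_terms):.4f}, final {entry4:.4f}")
print(f"entry 5: raw sum {sum(entry5_terms):.4f}, final {entry5:.4f}")
```

Entry 5 has the larger raw sum, but entry 4 matches five of the 25 query terms instead of four, so its coord factor (0.2 versus 0.16) lifts it above entry 5 in the final ranking.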