Document (#31854)

Author
Kok, Y.H.
Holaday, D.A.
Goh, A.
Title
Using cluster analysis to determine the media agenda
Source
Aslib proceedings. 51(1999) no.10, pp. 361-371
Year
1999
Abstract
This paper describes a software tool that aids researchers in the study of agenda setting. Agenda-setting theory claims that the mass media influence what the public thinks and talks about. The tool clusters documents into topically coherent groupings that are intended to represent the issues dominating press coverage. The documents are taken from the archives of online newspapers. In addition, the tool enables results to be visualised and displayed. Three clustering methods were investigated, of which the Group-Average-Linkage algorithm was chosen for the final testing. The choice of clustering algorithm was based predominantly on the quality of the clusters produced. Comparisons between the computer-based results and a method involving human readers revealed comparable findings and the potential usefulness of the software.
Field
Communication studies
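
The abstract names group-average linkage as the clustering method chosen, but gives no implementation detail. The following is a minimal sketch of that general technique, not the authors' tool. It assumes plain term-count vectors, cosine similarity, and a fixed target number of clusters; the function names and the toy headlines are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_average_clusters(texts, k):
    """Agglomerative clustering with group-average linkage.

    Starts with one cluster per text and repeatedly merges the two clusters
    whose members have the highest average pairwise similarity, until only
    k clusters remain. Returns lists of indices into `texts`.
    """
    vectors = [Counter(t.lower().split()) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def avg_sim(c1, c2):
        # Average similarity over all pairs with one member in each cluster.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical toy headlines, two per topic.
HEADLINES = [
    "election vote parliament candidate",
    "parliament election candidate debate",
    "flood rain storm damage",
    "storm flood rescue rain",
]

if __name__ == "__main__":
    for members in group_average_clusters(HEADLINES, k=2):
        print([HEADLINES[i] for i in members])
```

A real system would more likely weight terms (for example with tf-idf) and stop merging at a similarity threshold rather than at a fixed k; both choices are left out here to keep the sketch short.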

Similar documents (content)

  1. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.16
    0.160477 = sum of:
      0.160477 = product of:
        0.57313216 = sum of:
          0.02092983 = weight(abstract_txt:results in 450) [ClassicSimilarity], result of:
            0.02092983 = score(doc=450,freq=2.0), product of:
              0.06807075 = queryWeight, product of:
                1.0074742 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.019423004 = queryNorm
              0.30747172 = fieldWeight in 450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.009864707 = weight(abstract_txt:that in 450) [ClassicSimilarity], result of:
            0.009864707 = score(doc=450,freq=2.0), product of:
              0.047192167 = queryWeight, product of:
                1.0273875 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019423004 = queryNorm
              0.20903271 = fieldWeight in 450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.042689648 = weight(abstract_txt:documents in 450) [ClassicSimilarity], result of:
            0.042689648 = score(doc=450,freq=3.0), product of:
              0.09563892 = queryWeight, product of:
                1.1941833 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.019423004 = queryNorm
              0.4463627 = fieldWeight in 450, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.11307103 = weight(abstract_txt:algorithm in 450) [ClassicSimilarity], result of:
            0.11307103 = score(doc=450,freq=3.0), product of:
              0.18308546 = queryWeight, product of:
                1.6522684 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.019423004 = queryNorm
              0.6175861 = fieldWeight in 450, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.22393082 = weight(abstract_txt:clustering in 450) [ClassicSimilarity], result of:
            0.22393082 = score(doc=450,freq=7.0), product of:
              0.21768907 = queryWeight, product of:
                1.8016565 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.019423004 = queryNorm
              1.0286728 = fieldWeight in 450, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.09871455 = weight(abstract_txt:cluster in 450) [ClassicSimilarity], result of:
            0.09871455 = score(doc=450,freq=1.0), product of:
              0.24120195 = queryWeight, product of:
                1.8964617 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019423004 = queryNorm
              0.409261 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.063931614 = weight(abstract_txt:tool in 450) [ClassicSimilarity], result of:
            0.063931614 = score(doc=450,freq=1.0), product of:
              0.20668116 = queryWeight, product of:
                2.150056 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.019423004 = queryNorm
              0.30932483 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
        0.28 = coord(7/25)
    
  2. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.14
    0.13558736 = sum of:
      0.13558736 = product of:
        0.56494737 = sum of:
          0.018499533 = weight(abstract_txt:results in 207) [ClassicSimilarity], result of:
            0.018499533 = score(doc=207,freq=1.0), product of:
              0.06807075 = queryWeight, product of:
                1.0074742 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.019423004 = queryNorm
              0.2717692 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.0087192515 = weight(abstract_txt:that in 207) [ClassicSimilarity], result of:
            0.0087192515 = score(doc=207,freq=1.0), product of:
              0.047192167 = queryWeight, product of:
                1.0273875 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019423004 = queryNorm
              0.18476056 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.0308086 = weight(abstract_txt:documents in 207) [ClassicSimilarity], result of:
            0.0308086 = score(doc=207,freq=1.0), product of:
              0.09563892 = queryWeight, product of:
                1.1941833 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.019423004 = queryNorm
              0.32213452 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.081601985 = weight(abstract_txt:algorithm in 207) [ClassicSimilarity], result of:
            0.081601985 = score(doc=207,freq=1.0), product of:
              0.18308546 = queryWeight, product of:
                1.6522684 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.019423004 = queryNorm
              0.44570434 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.21159475 = weight(abstract_txt:clustering in 207) [ClassicSimilarity], result of:
            0.21159475 = score(doc=207,freq=4.0), product of:
              0.21768907 = queryWeight, product of:
                1.8016565 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.019423004 = queryNorm
              0.9720045 = fieldWeight in 207, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.21372327 = weight(abstract_txt:cluster in 207) [ClassicSimilarity], result of:
            0.21372327 = score(doc=207,freq=3.0), product of:
              0.24120195 = queryWeight, product of:
                1.8964617 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019423004 = queryNorm
              0.88607603 = fieldWeight in 207, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
        0.24 = coord(6/25)
    
  3. Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.12
    0.11758907 = sum of:
      0.11758907 = product of:
        0.58794534 = sum of:
          0.010463102 = weight(abstract_txt:that in 3113) [ClassicSimilarity], result of:
            0.010463102 = score(doc=3113,freq=1.0), product of:
              0.047192167 = queryWeight, product of:
                1.0273875 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019423004 = queryNorm
              0.22171268 = fieldWeight in 3113, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=3113)
          0.05228393 = weight(abstract_txt:documents in 3113) [ClassicSimilarity], result of:
            0.05228393 = score(doc=3113,freq=2.0), product of:
              0.09563892 = queryWeight, product of:
                1.1941833 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.019423004 = queryNorm
              0.54668045 = fieldWeight in 3113, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.09375 = fieldNorm(doc=3113)
          0.21989569 = weight(abstract_txt:clustering in 3113) [ClassicSimilarity], result of:
            0.21989569 = score(doc=3113,freq=3.0), product of:
              0.21768907 = queryWeight, product of:
                1.8016565 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.019423004 = queryNorm
              1.0101366 = fieldWeight in 3113, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.09375 = fieldNorm(doc=3113)
          0.20940518 = weight(abstract_txt:cluster in 3113) [ClassicSimilarity], result of:
            0.20940518 = score(doc=3113,freq=2.0), product of:
              0.24120195 = queryWeight, product of:
                1.8964617 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019423004 = queryNorm
              0.86817366 = fieldWeight in 3113, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.09375 = fieldNorm(doc=3113)
          0.09589742 = weight(abstract_txt:tool in 3113) [ClassicSimilarity], result of:
            0.09589742 = score(doc=3113,freq=1.0), product of:
              0.20668116 = queryWeight, product of:
                2.150056 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.019423004 = queryNorm
              0.46398723 = fieldWeight in 3113, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.09375 = fieldNorm(doc=3113)
        0.2 = coord(5/25)
    
  4. Robertson, A.M.; Willett, P.: Use of genetic algorithms in information retrieval (1995) 0.10
    0.09940906 = sum of:
      0.09940906 = product of:
        0.49704528 = sum of:
          0.10130875 = weight(abstract_txt:comparable in 2486) [ClassicSimilarity], result of:
            0.10130875 = score(doc=2486,freq=1.0), product of:
              0.13412899 = queryWeight, product of:
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.019423004 = queryNorm
              0.7553084 = fieldWeight in 2486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.905677 = idf(docFreq=120, maxDocs=44421)
                0.109375 = fieldNorm(doc=2486)
          0.025899345 = weight(abstract_txt:results in 2486) [ClassicSimilarity], result of:
            0.025899345 = score(doc=2486,freq=1.0), product of:
              0.06807075 = queryWeight, product of:
                1.0074742 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.019423004 = queryNorm
              0.38047686 = fieldWeight in 2486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.109375 = fieldNorm(doc=2486)
          0.012206952 = weight(abstract_txt:that in 2486) [ClassicSimilarity], result of:
            0.012206952 = score(doc=2486,freq=1.0), product of:
              0.047192167 = queryWeight, product of:
                1.0273875 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019423004 = queryNorm
              0.2586648 = fieldWeight in 2486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.109375 = fieldNorm(doc=2486)
          0.15975593 = weight(abstract_txt:groupings in 2486) [ClassicSimilarity], result of:
            0.15975593 = score(doc=2486,freq=1.0), product of:
              0.18171719 = queryWeight, product of:
                1.1639563 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.019423004 = queryNorm
              0.8791459 = fieldWeight in 2486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.109375 = fieldNorm(doc=2486)
          0.1978743 = weight(abstract_txt:algorithm in 2486) [ClassicSimilarity], result of:
            0.1978743 = score(doc=2486,freq=3.0), product of:
              0.18308546 = queryWeight, product of:
                1.6522684 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.019423004 = queryNorm
              1.0807756 = fieldWeight in 2486, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.109375 = fieldNorm(doc=2486)
        0.2 = coord(5/25)
    
  5. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.10
    0.09656057 = sum of:
      0.09656057 = product of:
        0.48280284 = sum of:
          0.02092983 = weight(abstract_txt:results in 1448) [ClassicSimilarity], result of:
            0.02092983 = score(doc=1448,freq=2.0), product of:
              0.06807075 = queryWeight, product of:
                1.0074742 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.019423004 = queryNorm
              0.30747172 = fieldWeight in 1448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.009864707 = weight(abstract_txt:that in 1448) [ClassicSimilarity], result of:
            0.009864707 = score(doc=1448,freq=2.0), product of:
              0.047192167 = queryWeight, product of:
                1.0273875 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.019423004 = queryNorm
              0.20903271 = fieldWeight in 1448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.14597407 = weight(abstract_txt:algorithm in 1448) [ClassicSimilarity], result of:
            0.14597407 = score(doc=1448,freq=5.0), product of:
              0.18308546 = queryWeight, product of:
                1.6522684 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.019423004 = queryNorm
              0.79730016 = fieldWeight in 1448, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.20731966 = weight(abstract_txt:clustering in 1448) [ClassicSimilarity], result of:
            0.20731966 = score(doc=1448,freq=6.0), product of:
              0.21768907 = queryWeight, product of:
                1.8016565 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.019423004 = queryNorm
              0.952366 = fieldWeight in 1448, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.09871455 = weight(abstract_txt:cluster in 1448) [ClassicSimilarity], result of:
            0.09871455 = score(doc=1448,freq=1.0), product of:
              0.24120195 = queryWeight, product of:
                1.8964617 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019423004 = queryNorm
              0.409261 = fieldWeight in 1448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
        0.2 = coord(5/25)
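
The score trees above can be checked by hand. The sketch below recomputes the score of the first result (doc 450) from the values printed in its tree. It assumes the classic Lucene TF-IDF formulas that are consistent with those numbers: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function and constant names are illustrative, not part of any library.

```python
import math

MAX_DOCS = 44421          # maxDocs shown in every idf line
QUERY_NORM = 0.019423004  # queryNorm shown in every queryWeight block
QUERY_TERMS = 25          # denominator of coord(7/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, in the form that matches the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms=QUERY_TERMS):
    """Sum of the term scores, scaled by the fraction of query terms matched."""
    total = sum(term_score(f, df, b, field_norm) for _, f, df, b in terms)
    coord = len(terms) / total_query_terms
    return total * coord


# (term, freq, docFreq, boost) copied from the tree for doc 450.
DOC_450 = [
    ("results", 2.0, 3724, 1.0074742),
    ("that", 2.0, 11344, 1.0273875),
    ("documents", 3.0, 1954, 1.1941833),
    ("algorithm", 3.0, 401, 1.6522684),
    ("clustering", 7.0, 239, 1.8016565),
    ("cluster", 1.0, 172, 1.8964617),
    ("tool", 1.0, 855, 2.150056),
]

if __name__ == "__main__":
    # fieldNorm(doc=450) is 0.0625 in the listing.
    print(round(document_score(DOC_450, field_norm=0.0625), 6))
```

Small differences in the last digits are to be expected, since the listing was produced with single-precision floats and the sketch uses double precision.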