Document (#6627)

Author
Can, F.
Title
Incremental clustering for dynamic information processing
Source
ACM transactions on information systems. 11(1993) no.2, S.143-164
Year
1993
Abstract
Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. Introduces an algorithm for incremental clustering and discusses the complexity and cost analysis of the algorithm together with an investigation of its expected behaviour. Shows through empirical testing that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
Theme
Automatic indexing
Retrieval algorithms

Similar documents (content)

  1. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.20
    0.19862863 = sum of:
      0.19862863 = product of:
        0.8276193 = sum of:
          0.041950524 = weight(abstract_txt:complexity in 1448) [ClassicSimilarity], result of:
            0.041950524 = score(doc=1448,freq=1.0), product of:
              0.11365991 = queryWeight, product of:
                1.082881 = boost
                5.90541 = idf(docFreq=328, maxDocs=44421)
                0.017773647 = queryNorm
              0.3690881 = fieldWeight in 1448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.90541 = idf(docFreq=328, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.04725812 = weight(abstract_txt:expected in 1448) [ClassicSimilarity], result of:
            0.04725812 = score(doc=1448,freq=1.0), product of:
              0.12305521 = queryWeight, product of:
                1.1267487 = boost
                6.1446395 = idf(docFreq=258, maxDocs=44421)
                0.017773647 = queryNorm
              0.38403997 = fieldWeight in 1448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1446395 = idf(docFreq=258, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.011430921 = weight(abstract_txt:that in 1448) [ClassicSimilarity], result of:
            0.011430921 = score(doc=1448,freq=2.0), product of:
              0.05468484 = queryWeight, product of:
                1.3009816 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017773647 = queryNorm
              0.20903271 = fieldWeight in 1448, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.11290075 = weight(abstract_txt:clusters in 1448) [ClassicSimilarity], result of:
            0.11290075 = score(doc=1448,freq=1.0), product of:
              0.2770705 = queryWeight, product of:
                2.391042 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017773647 = queryNorm
              0.40748024 = fieldWeight in 1448, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.25372544 = weight(abstract_txt:algorithm in 1448) [ClassicSimilarity], result of:
            0.25372544 = score(doc=1448,freq=5.0), product of:
              0.31823075 = queryWeight, product of:
                3.1384034 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.017773647 = queryNorm
              0.79730016 = fieldWeight in 1448, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
          0.36035356 = weight(abstract_txt:clustering in 1448) [ClassicSimilarity], result of:
            0.36035356 = score(doc=1448,freq=6.0), product of:
              0.37837717 = queryWeight, product of:
                3.4221587 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017773647 = queryNorm
              0.952366 = fieldWeight in 1448, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.0625 = fieldNorm(doc=1448)
        0.24 = coord(6/25)
    
  2. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.16
    0.16168566 = sum of:
      0.16168566 = product of:
        0.8084283 = sum of:
          0.044266276 = weight(abstract_txt:browsing in 207) [ClassicSimilarity], result of:
            0.044266276 = score(doc=207,freq=1.0), product of:
              0.101521425 = queryWeight, product of:
                1.0234247 = boost
                5.58117 = idf(docFreq=454, maxDocs=44421)
                0.017773647 = queryNorm
              0.4360289 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58117 = idf(docFreq=454, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.010103601 = weight(abstract_txt:that in 207) [ClassicSimilarity], result of:
            0.010103601 = score(doc=207,freq=1.0), product of:
              0.05468484 = queryWeight, product of:
                1.3009816 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017773647 = queryNorm
              0.18476056 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.24443729 = weight(abstract_txt:clusters in 207) [ClassicSimilarity], result of:
            0.24443729 = score(doc=207,freq=3.0), product of:
              0.2770705 = queryWeight, product of:
                2.391042 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017773647 = queryNorm
              0.88222057 = fieldWeight in 207, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.14183682 = weight(abstract_txt:algorithm in 207) [ClassicSimilarity], result of:
            0.14183682 = score(doc=207,freq=1.0), product of:
              0.31823075 = queryWeight, product of:
                3.1384034 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.017773647 = queryNorm
              0.44570434 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
          0.3677843 = weight(abstract_txt:clustering in 207) [ClassicSimilarity], result of:
            0.3677843 = score(doc=207,freq=4.0), product of:
              0.37837717 = queryWeight, product of:
                3.4221587 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017773647 = queryNorm
              0.9720045 = fieldWeight in 207, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.078125 = fieldNorm(doc=207)
        0.2 = coord(5/25)
    
  3. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.15
    0.15040867 = sum of:
      0.15040867 = product of:
        0.75204337 = sum of:
          0.041950524 = weight(abstract_txt:complexity in 450) [ClassicSimilarity], result of:
            0.041950524 = score(doc=450,freq=1.0), product of:
              0.11365991 = queryWeight, product of:
                1.082881 = boost
                5.90541 = idf(docFreq=328, maxDocs=44421)
                0.017773647 = queryNorm
              0.3690881 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.90541 = idf(docFreq=328, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.011430921 = weight(abstract_txt:that in 450) [ClassicSimilarity], result of:
            0.011430921 = score(doc=450,freq=2.0), product of:
              0.05468484 = queryWeight, product of:
                1.3009816 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017773647 = queryNorm
              0.20903271 = fieldWeight in 450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.11290075 = weight(abstract_txt:clusters in 450) [ClassicSimilarity], result of:
            0.11290075 = score(doc=450,freq=1.0), product of:
              0.2770705 = queryWeight, product of:
                2.391042 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017773647 = queryNorm
              0.40748024 = fieldWeight in 450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.19653487 = weight(abstract_txt:algorithm in 450) [ClassicSimilarity], result of:
            0.19653487 = score(doc=450,freq=3.0), product of:
              0.31823075 = queryWeight, product of:
                3.1384034 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.017773647 = queryNorm
              0.6175861 = fieldWeight in 450, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
          0.38922632 = weight(abstract_txt:clustering in 450) [ClassicSimilarity], result of:
            0.38922632 = score(doc=450,freq=7.0), product of:
              0.37837717 = queryWeight, product of:
                3.4221587 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017773647 = queryNorm
              1.0286728 = fieldWeight in 450, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.0625 = fieldNorm(doc=450)
        0.2 = coord(5/25)
    
  4. Kostoff, R.N.; Block, J.A.: Factor matrix text filtering and clustering (2005) 0.14
    0.14240785 = sum of:
      0.14240785 = product of:
        0.71203923 = sum of:
          0.01749995 = weight(abstract_txt:that in 4683) [ClassicSimilarity], result of:
            0.01749995 = score(doc=4683,freq=3.0), product of:
              0.05468484 = queryWeight, product of:
                1.3009816 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017773647 = queryNorm
              0.32001466 = fieldWeight in 4683, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=4683)
          0.0437922 = weight(abstract_txt:databases in 4683) [ClassicSimilarity], result of:
            0.0437922 = score(doc=4683,freq=1.0), product of:
              0.1269941 = queryWeight, product of:
                1.6187651 = boost
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.017773647 = queryNorm
              0.34483647 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.413907 = idf(docFreq=1461, maxDocs=44421)
                0.078125 = fieldNorm(doc=4683)
          0.14112593 = weight(abstract_txt:clusters in 4683) [ClassicSimilarity], result of:
            0.14112593 = score(doc=4683,freq=1.0), product of:
              0.2770705 = queryWeight, product of:
                2.391042 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017773647 = queryNorm
              0.5093503 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=4683)
          0.14183682 = weight(abstract_txt:algorithm in 4683) [ClassicSimilarity], result of:
            0.14183682 = score(doc=4683,freq=1.0), product of:
              0.31823075 = queryWeight, product of:
                3.1384034 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.017773647 = queryNorm
              0.44570434 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=4683)
          0.3677843 = weight(abstract_txt:clustering in 4683) [ClassicSimilarity], result of:
            0.3677843 = score(doc=4683,freq=4.0), product of:
              0.37837717 = queryWeight, product of:
                3.4221587 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017773647 = queryNorm
              0.9720045 = fieldWeight in 4683, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.078125 = fieldNorm(doc=4683)
        0.2 = coord(5/25)
    
  5. Frants, V.I.; Kamenoff, N.I.; Shapiro, J.: One approach to classification of users and automatic clustering of documents (1993) 0.14
    0.13759293 = sum of:
      0.13759293 = product of:
        0.85995585 = sum of:
          0.016165763 = weight(abstract_txt:that in 4568) [ClassicSimilarity], result of:
            0.016165763 = score(doc=4568,freq=1.0), product of:
              0.05468484 = queryWeight, product of:
                1.3009816 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.017773647 = queryNorm
              0.2956169 = fieldWeight in 4568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.125 = fieldNorm(doc=4568)
          0.10835811 = weight(abstract_txt:shows in 4568) [ClassicSimilarity], result of:
            0.10835811 = score(doc=4568,freq=1.0), product of:
              0.16982958 = queryWeight, product of:
                1.8719692 = boost
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.017773647 = queryNorm
              0.63804024 = fieldWeight in 4568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.125 = fieldNorm(doc=4568)
          0.31933156 = weight(abstract_txt:clusters in 4568) [ClassicSimilarity], result of:
            0.31933156 = score(doc=4568,freq=2.0), product of:
              0.2770705 = queryWeight, product of:
                2.391042 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017773647 = queryNorm
              1.1525282 = fieldWeight in 4568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.125 = fieldNorm(doc=4568)
          0.4161004 = weight(abstract_txt:clustering in 4568) [ClassicSimilarity], result of:
            0.4161004 = score(doc=4568,freq=2.0), product of:
              0.37837717 = queryWeight, product of:
                3.4221587 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017773647 = queryNorm
              1.0996975 = fieldWeight in 4568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.125 = fieldNorm(doc=4568)
        0.16 = coord(4/25)
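
Note on reading the score breakdowns

The listings above follow the classic Lucene TF-IDF scheme (ClassicSimilarity): each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is multiplied by coord (matched terms / query terms). The sketch below recomputes the score of result 5 from the values printed in its listing. The function names and the tuple layout are illustrative assumptions, not part of Lucene's API; the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are the ClassicSimilarity defaults and agree with the printed numbers.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity default: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, matched, total, max_docs, query_norm, field_norm):
    # terms: iterable of (freq, doc_freq, boost), one tuple per matching term
    coord = matched / total
    return coord * sum(
        term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm)
        for freq, doc_freq, boost in terms
    )

# Values copied from the listing for result 5 (doc=4568).
MAX_DOCS = 44421
QUERY_NORM = 0.017773647
FIELD_NORM = 0.125
terms_doc_4568 = [
    (1.0, 11344, 1.3009816),  # "that"
    (1.0, 732, 1.8719692),    # "shows"
    (2.0, 177, 2.391042),     # "clusters"
    (2.0, 239, 3.4221587),    # "clustering"
]

score = document_score(terms_doc_4568, 4, 25, MAX_DOCS, QUERY_NORM, FIELD_NORM)
print(round(score, 4))  # compare with 0.13759293 in the listing
```

Lucene computes these values in single precision, so a recomputation in Python floats should agree with the listing only to about five or six significant digits.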