Document (#19662)

Author
Ruocco, A.S.
Frieder, O.
Title
Clustering and classification of large document bases in a parallel environment
Source
Journal of the American Society for Information Science. 48(1997) no.10, pp.932-943
Year
1997
Abstract
Proposes using parallel computing systems to reduce the cost of the computationally intensive clustering process. Examines two operations: clustering a document set and classifying it. Uses a subset of the TIPSTER corpus, namely articles from the Wall Street Journal. Document set classification was performed without the large storage requirements of ancillary data matrices. The parallel systems ran faster than the sequential systems while producing the same clustering and classification scheme. Results show near-linear speedup in clustering applications with higher thresholds.
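The abstract does not spell out the algorithm, so the following is only an illustration under stated assumptions, not the authors' code. It assumes a single-pass, threshold-based clustering scheme over sparse term-weight vectors (Python dicts): each document is compared with every existing cluster centroid, the comparison is split across worker threads, and the document joins the best cluster if the cosine similarity reaches the threshold, otherwise it starts a new cluster. Classification then assigns each document to its most similar centroid, so only the centroids are stored, never a document-by-document similarity matrix. The names `single_pass_cluster`, `classify`, `best_match`, and `cosine` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

Vector = dict[str, float]  # sparse term -> weight (assumed representation)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_match(doc: Vector, centroids: list[Vector], pool, n_workers: int):
    """Return (similarity, index) of the most similar centroid, or (0.0, -1)."""
    if not centroids:
        return 0.0, -1
    chunk = math.ceil(len(centroids) / n_workers)

    def scan(start: int) -> tuple[float, int]:
        # Each worker scores one contiguous slice of the centroid list.
        # Using -i makes ties go to the lowest index, so the result does
        # not depend on how the centroids are partitioned.
        stop = min(start + chunk, len(centroids))
        return max((cosine(doc, centroids[i]), -i) for i in range(start, stop))

    sim, neg_idx = max(pool.map(scan, range(0, len(centroids), chunk)))
    return sim, -neg_idx


def single_pass_cluster(docs: list[Vector], threshold: float, n_workers: int = 4):
    """Hypothetical single-pass clustering with a parallel centroid scan."""
    centroids: list[Vector] = []
    members: list[list[int]] = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for d, doc in enumerate(docs):
            sim, idx = best_match(doc, centroids, pool, n_workers)
            if idx >= 0 and sim >= threshold:
                members[idx].append(d)
                # Serial step: incremental mean update c += (x - c) / n.
                n, c = len(members[idx]), centroids[idx]
                for t in set(c) | set(doc):
                    c[t] = c.get(t, 0.0) + (doc.get(t, 0.0) - c.get(t, 0.0)) / n
            else:
                centroids.append(dict(doc))
                members.append([d])
    return centroids, members


def classify(docs: list[Vector], centroids: list[Vector], n_workers: int = 4) -> list[int]:
    """Assign each document to its nearest centroid; no document-document matrix."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return [best_match(doc, centroids, pool, n_workers)[1] for doc in docs]
```

Python threads only show how the work is partitioned; because of the GIL they give little real speedup for this CPU-bound loop, so measuring one would need processes or a compiled language. The centroid update stays serial. One plausible reading of the abstract's result (our interpretation, not a claim from the paper) is that a higher threshold yields more clusters, hence more parallel comparison work per document relative to that serial step.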
Theme
Automatic classification

Similar documents (author)

  1. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (1998) 4.46
    
  2. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (2004) 4.46
    
  3. Soo, J.; Frieder, O.: On searching misspelled collections (2015) 4.46
    
  4. Aljlayl, M.; Frieder, O.; Grossman, D.: On bidirectional English-Arabic search (2002) 3.35
    
  5. Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 3.35
    
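The scores after each entry (for example 4.46 and 3.35 in the author list) were computed by Lucene's ClassicSimilarity, a TF-IDF model. The sketch below recomputes them, assuming the formulas used by the Lucene version behind this catalog: tf = √freq, idf = 1 + ln(maxDocs / (docFreq + 1)), fieldWeight = tf · idf · fieldNorm, queryWeight = boost · idf · queryNorm, and a coord factor equal to the fraction of query terms matched. fieldNorm, boost, and queryNorm are taken as given inputs; the function names are hypothetical and not part of Lucene's API.

```python
import math

# Hypothetical helpers mirroring the factors of Lucene's ClassicSimilarity
# scoring (assumed formulas); they are not a Lucene API.


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq)."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """Document-side weight: tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


def term_score(freq: float, doc_freq: int, max_docs: int, field_norm: float,
               boost: float, query_norm: float) -> float:
    """queryWeight (boost * idf * queryNorm) times fieldWeight."""
    query_weight = boost * classic_idf(doc_freq, max_docs) * query_norm
    return query_weight * field_weight(freq, doc_freq, max_docs, field_norm)


def coord_score(term_scores: list[float], n_query_terms: int) -> float:
    """Sum of matched-term scores scaled by coord = matched / total query terms."""
    return sum(term_scores) * len(term_scores) / n_query_terms
```

For instance, an author match with freq=1, docFreq=15, maxDocs=44421 and fieldNorm=0.5 gives about 4.4644, the score of the first three author matches; with fieldNorm=0.375 it gives about 3.3483, the score of the last two.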

Similar documents (content)

  1. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 0.30
    
  2. Rooney, N.; Patterson, D.; Galushka, M.; Dobrynin, V.; Smirnova, E.: An investigation into the stability of contextual document clustering (2008) 0.14
    
  3. Mather, L.A.: A linear algebra measure of cluster quality (2000) 0.12
    
  4. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.12
    
  5. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.12