Document (#20953)

Author
Galal, G.M.
Cook, D.J.
Holder, L.B.
Title
Exploiting parallelism in a structural scientific discovery system to improve scalability
Source
Journal of the American Society for Information Science. 50(1999) no.1, pp.65-73
Year
1999
Abstract
The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. In particular, scientific discovery systems focus on the utilization of richer data representations, sometimes without regard for scalability. This research investigates approaches for scaling a particular knowledge discovery in databases (KDD) system, SUBDUE, using parallel and distributed resources. SUBDUE has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate the scalability of parallel versions of the SUBDUE system are performed using CAD circuit databases and artificially generated databases, and potential achievements and obstacles are discussed.
Theme
Data Mining
Object
SUBDUE

Similar documents (author)

  1. Cook, M.: ¬The management of information from archives (1999) 5.76
  2. Cook, M.: New directions in records management (1994) 5.76
  3. Cook, K.: ¬The incredible expanding OPAC (1994) 5.76
  4. Cook, T.: Keeping our electronic memory : approaches for securing computer-generated records (1995) 5.76
  5. Cook, M.: ¬The International Description Standards : new departures (1996) 5.76

Similar documents (content)

  1. Chen, Z.: Knowledge discovery and system-user partnership : on a production 'adversarial partnership' approach (1994) 0.14
  2. Janée, G.; Frew, J.; Hill, L.L.: Issues in georeferenced digital libraries (2004) 0.11
  3. Hsu, C.-N.; Chang, C.-H.; Hsieh, C.-H.; Lu, J.-J.; Chang, C.-C.: Reconfigurable Web wrapper agents for biological information integration (2005) 0.11
  4. Networked knowledge organization systems (2001) 0.10
  5. Barrio, P.; Gravano, L.: Sampling strategies for information extraction over the deep web (2017) 0.09