Document (#27226)

Author
Srinivasan, P.
Title
Text mining : generating hypotheses from MEDLINE
Source
Journal of the American Society for Information Science and Technology. 55(2004) no.5, pp.396-413
Year
2004
Abstract
Hypothesis generation, a crucial initial step for making scientific discoveries, relies on prior knowledge, experience, and intuition. Chance connections made between seemingly distinct subareas sometimes turn out to be fruitful. The goal in text mining is to assist in this process by automatically discovering a small set of interesting hypotheses from a suitable text collection. In this report, we present open and closed text mining algorithms that are built within the discovery framework established by Swanson and Smalheiser. Our algorithms represent topics using metadata profiles. When applied to MEDLINE, these are MeSH-based profiles. We present experiments that demonstrate the effectiveness of our algorithms. Specifically, our algorithms successfully generate ranked term lists in which the key terms representing novel relationships between topics are ranked high.
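The abstract describes closed discovery only at a high level. As a hedged illustration of the general Swanson-Smalheiser "ABC" idea with profile-based topics: given weighted term profiles for a start topic A and a target topic C, candidate linking terms B are the terms both profiles share, ranked by a combined weight. The product scoring rule, the profile weights, and the function name `rank_linking_terms` are assumptions made for this sketch and are not the algorithm from the paper; the terms echo Swanson's fish oil and Raynaud's disease case, but the weights are invented.

```python
from typing import Dict, Iterable, List, Tuple

Profile = Dict[str, float]  # term -> weight (hypothetical MeSH-style profile)


def rank_linking_terms(profile_a: Profile, profile_c: Profile,
                       exclude: Iterable[str] = ()) -> List[Tuple[str, float]]:
    """Rank terms shared by both topic profiles (candidate B terms).

    Score = weight in A * weight in C (an assumed combination rule,
    not necessarily the one used in the paper).
    """
    shared = (profile_a.keys() & profile_c.keys()) - set(exclude)
    scored = [(term, profile_a[term] * profile_c[term]) for term in shared]
    # Highest combined weight first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy profiles with invented weights, for illustration only.
fish_oil = {"Blood Viscosity": 0.8, "Platelet Aggregation": 0.6,
            "Fatty Acids, Omega-3": 0.9, "Humans": 0.5}
raynaud = {"Blood Viscosity": 0.7, "Platelet Aggregation": 0.4,
           "Vasoconstriction": 0.8, "Humans": 0.5}

# "Humans" is dropped as an overly general term (a hypothetical filter).
ranked = rank_linking_terms(fish_oil, raynaud, exclude={"Humans"})
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

Filtering out very general terms before ranking is one plausible step; the paper's actual filtering and weighting may differ.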
Theme
Data Mining
Field
Medicine
Object
Medline

Similar documents (author)

  1. Srinivasan, P.: Expert interface to Library of Congress Subject Headings (1990/91) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:srinivasan in 2208) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 2208, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=2208)
    
  2. Srinivasan, P.: Query expansion and MEDLINE (1996) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:srinivasan in 67) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 67, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=67)
    
  3. Srinivasan, P.: Intelligent information retrieval using rough set approximations (1989) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:srinivasan in 2594) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 2594, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=2594)
    
  4. Srinivasan, P.: On generalizing the Two-Poisson Model (1990) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:srinivasan in 2948) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 2948, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=2948)
    
  5. Srinivasan, P.: Optimal document-indexing vocabulary for MEDLINE (1996) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:srinivasan in 6702) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 6702, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=6702)
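The explanations above follow Lucene's ClassicSimilarity (TF-IDF) arithmetic; for this single-term author query the score equals the fieldWeight, i.e. tf × idf × fieldNorm. A minimal Python sketch reproducing the 5.41 figure, assuming the Lucene 6-era formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the idf values printed here. fieldNorm is taken as given, since Lucene stores it as a lossy one-byte encoding of 1/sqrt(field length). The function names are illustrative, not Lucene's API.

```python
import math


def classic_tf(freq: float) -> float:
    """ClassicSimilarity term frequency: square root of the raw count."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Lucene 6-era ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as printed in the explanation."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# author_txt:srinivasan, values copied from the explanation for doc 2208
idf = classic_idf(20, 44421)
score = field_weight(1.0, 20, 44421, 0.625)
print(f"idf={idf:.6f} score={score:.7f}")
```

Because all five records have tf = 1, the same idf, and the same fieldNorm of 0.625, they tie at 5.41.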
    

Similar documents (content)

  1. Srinivasan, P.: Text mining in biomedicine : challenges and opportunities (2006) 0.30
    0.30467626 = sum of:
      0.30467626 = product of:
        1.0881295 = sum of:
          0.05747819 = weight(abstract_txt:connections in 2497) [ClassicSimilarity], result of:
            0.05747819 = score(doc=2497,freq=1.0), product of:
              0.114629254 = queryWeight, product of:
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.017859854 = queryNorm
              0.5014269 = fieldWeight in 2497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.078125 = fieldNorm(doc=2497)
          0.07548559 = weight(abstract_txt:chance in 2497) [ClassicSimilarity], result of:
            0.07548559 = score(doc=2497,freq=1.0), product of:
              0.13746837 = queryWeight, product of:
                1.0950997 = boost
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.017859854 = queryNorm
              0.54911244 = fieldWeight in 2497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.078125 = fieldNorm(doc=2497)
          0.17373405 = weight(abstract_txt:discoveries in 2497) [ClassicSimilarity], result of:
            0.17373405 = score(doc=2497,freq=2.0), product of:
              0.19019835 = queryWeight, product of:
                1.2881179 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.017859854 = queryNorm
              0.9134362 = fieldWeight in 2497, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.078125 = fieldNorm(doc=2497)
          0.13923995 = weight(abstract_txt:profiles in 2497) [ClassicSimilarity], result of:
            0.13923995 = score(doc=2497,freq=1.0), product of:
              0.26050296 = queryWeight, product of:
                2.1319332 = boost
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.017859854 = queryNorm
              0.5345043 = fieldWeight in 2497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.078125 = fieldNorm(doc=2497)
          0.15876669 = weight(abstract_txt:hypotheses in 2497) [ClassicSimilarity], result of:
            0.15876669 = score(doc=2497,freq=1.0), product of:
              0.2843215 = queryWeight, product of:
                2.227266 = boost
                7.1475906 = idf(docFreq=94, maxDocs=44421)
                0.017859854 = queryNorm
              0.5584055 = fieldWeight in 2497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1475906 = idf(docFreq=94, maxDocs=44421)
                0.078125 = fieldNorm(doc=2497)
          0.14054473 = weight(abstract_txt:text in 2497) [ClassicSimilarity], result of:
            0.14054473 = score(doc=2497,freq=6.0), product of:
              0.18174927 = queryWeight, product of:
                2.5183647 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017859854 = queryNorm
              0.7732891 = fieldWeight in 2497, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=2497)
          0.34288028 = weight(abstract_txt:mining in 2497) [ClassicSimilarity], result of:
            0.34288028 = score(doc=2497,freq=5.0), product of:
              0.31800857 = queryWeight, product of:
                2.88491 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.017859854 = queryNorm
              1.0782108 = fieldWeight in 2497, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.078125 = fieldNorm(doc=2497)
        0.28 = coord(7/25)
    
  2. Menczer, F.: Lexical and semantic clustering by Web links (2004) 0.15
    0.15463878 = sum of:
      0.15463878 = product of:
        0.6443283 = sum of:
          0.05747819 = weight(abstract_txt:connections in 4090) [ClassicSimilarity], result of:
            0.05747819 = score(doc=4090,freq=1.0), product of:
              0.114629254 = queryWeight, product of:
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.017859854 = queryNorm
              0.5014269 = fieldWeight in 4090, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.078125 = fieldNorm(doc=4090)
          0.025412232 = weight(abstract_txt:between in 4090) [ClassicSimilarity], result of:
            0.025412232 = score(doc=4090,freq=2.0), product of:
              0.06652557 = queryWeight, product of:
                1.0773618 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.017859854 = queryNorm
              0.38199192 = fieldWeight in 4090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.078125 = fieldNorm(doc=4090)
          0.035685975 = weight(abstract_txt:present in 4090) [ClassicSimilarity], result of:
            0.035685975 = score(doc=4090,freq=1.0), product of:
              0.105107844 = queryWeight, product of:
                1.3542063 = boost
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.017859854 = queryNorm
              0.3395177 = fieldWeight in 4090, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.078125 = fieldNorm(doc=4090)
          0.08114353 = weight(abstract_txt:text in 4090) [ClassicSimilarity], result of:
            0.08114353 = score(doc=4090,freq=2.0), product of:
              0.18174927 = queryWeight, product of:
                2.5183647 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017859854 = queryNorm
              0.4464586 = fieldWeight in 4090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=4090)
          0.21685651 = weight(abstract_txt:mining in 4090) [ClassicSimilarity], result of:
            0.21685651 = score(doc=4090,freq=2.0), product of:
              0.31800857 = queryWeight, product of:
                2.88491 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.017859854 = queryNorm
              0.68192035 = fieldWeight in 4090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.078125 = fieldNorm(doc=4090)
          0.22775187 = weight(abstract_txt:algorithms in 4090) [ClassicSimilarity], result of:
            0.22775187 = score(doc=4090,freq=2.0), product of:
              0.36164132 = queryWeight, product of:
                3.5523953 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.017859854 = queryNorm
              0.6297728 = fieldWeight in 4090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.078125 = fieldNorm(doc=4090)
        0.24 = coord(6/25)
    
  3. Lindsay, R.K.; Gordon, M.D.: Literature-based discovery by lexical statistics (1999) 0.13
    0.13262041 = sum of:
      0.13262041 = product of:
        0.66310203 = sum of:
          0.113800995 = weight(abstract_txt:connections in 4544) [ClassicSimilarity], result of:
            0.113800995 = score(doc=4544,freq=2.0), product of:
              0.114629254 = queryWeight, product of:
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.017859854 = queryNorm
              0.9927745 = fieldWeight in 4544, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.109375 = fieldNorm(doc=4544)
          0.025156826 = weight(abstract_txt:between in 4544) [ClassicSimilarity], result of:
            0.025156826 = score(doc=4544,freq=1.0), product of:
              0.06652557 = queryWeight, product of:
                1.0773618 = boost
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.017859854 = queryNorm
              0.37815273 = fieldWeight in 4544, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4573963 = idf(docFreq=3804, maxDocs=44421)
                0.109375 = fieldNorm(doc=4544)
          0.2527011 = weight(abstract_txt:swanson in 4544) [ClassicSimilarity], result of:
            0.2527011 = score(doc=4544,freq=1.0), product of:
              0.24581751 = queryWeight, product of:
                1.4643965 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.017859854 = queryNorm
              1.0280029 = fieldWeight in 4544, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.109375 = fieldNorm(doc=4544)
          0.07973959 = weight(abstract_txt:topics in 4544) [ClassicSimilarity], result of:
            0.07973959 = score(doc=4544,freq=1.0), product of:
              0.14354919 = queryWeight, product of:
                1.5825871 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.017859854 = queryNorm
              0.5554862 = fieldWeight in 4544, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.109375 = fieldNorm(doc=4544)
          0.19170351 = weight(abstract_txt:medline in 4544) [ClassicSimilarity], result of:
            0.19170351 = score(doc=4544,freq=1.0), product of:
              0.25761518 = queryWeight, product of:
                2.1200836 = boost
                6.803628 = idf(docFreq=133, maxDocs=44421)
                0.017859854 = queryNorm
              0.7441468 = fieldWeight in 4544, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.803628 = idf(docFreq=133, maxDocs=44421)
                0.109375 = fieldNorm(doc=4544)
        0.2 = coord(5/25)
    
  4. Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.13
    0.12567513 = sum of:
      0.12567513 = product of:
        0.78546953 = sum of:
          0.045565482 = weight(abstract_txt:topics in 1354) [ClassicSimilarity], result of:
            0.045565482 = score(doc=1354,freq=1.0), product of:
              0.14354919 = queryWeight, product of:
                1.5825871 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.017859854 = queryNorm
              0.3174207 = fieldWeight in 1354, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.0625 = fieldNorm(doc=1354)
          0.091803424 = weight(abstract_txt:text in 1354) [ClassicSimilarity], result of:
            0.091803424 = score(doc=1354,freq=4.0), product of:
              0.18174927 = queryWeight, product of:
                2.5183647 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017859854 = queryNorm
              0.50511026 = fieldWeight in 1354, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1354)
          0.42495024 = weight(abstract_txt:mining in 1354) [ClassicSimilarity], result of:
            0.42495024 = score(doc=1354,freq=12.0), product of:
              0.31800857 = queryWeight, product of:
                2.88491 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.017859854 = queryNorm
              1.3362855 = fieldWeight in 1354, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.0625 = fieldNorm(doc=1354)
          0.22315034 = weight(abstract_txt:algorithms in 1354) [ClassicSimilarity], result of:
            0.22315034 = score(doc=1354,freq=3.0), product of:
              0.36164132 = queryWeight, product of:
                3.5523953 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.017859854 = queryNorm
              0.6170488 = fieldWeight in 1354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.0625 = fieldNorm(doc=1354)
        0.16 = coord(4/25)
    
  5. Weeber, M.; Klein, H.; Jong-van den Berg, L.T.W. de; Vos, R.: Using concepts in literature-based discovery : simulating Swanson's Raynaud-Fish Oil and Migraine-Magnesium discoveries (2001) 0.11
    0.113817975 = sum of:
      0.113817975 = product of:
        0.71136236 = sum of:
          0.07790719 = weight(abstract_txt:successfully in 6910) [ClassicSimilarity], result of:
            0.07790719 = score(doc=6910,freq=1.0), product of:
              0.12432476 = queryWeight, product of:
                1.0414324 = boost
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.017859854 = queryNorm
              0.6266426 = fieldWeight in 6910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.09375 = fieldNorm(doc=6910)
          0.14741823 = weight(abstract_txt:discoveries in 6910) [ClassicSimilarity], result of:
            0.14741823 = score(doc=6910,freq=1.0), product of:
              0.19019835 = queryWeight, product of:
                1.2881179 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.017859854 = queryNorm
              0.7750763 = fieldWeight in 6910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.09375 = fieldNorm(doc=6910)
          0.21660092 = weight(abstract_txt:swanson in 6910) [ClassicSimilarity], result of:
            0.21660092 = score(doc=6910,freq=1.0), product of:
              0.24581751 = queryWeight, product of:
                1.4643965 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.017859854 = queryNorm
              0.88114524 = fieldWeight in 6910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.09375 = fieldNorm(doc=6910)
          0.269436 = weight(abstract_txt:hypotheses in 6910) [ClassicSimilarity], result of:
            0.269436 = score(doc=6910,freq=2.0), product of:
              0.2843215 = queryWeight, product of:
                2.227266 = boost
                7.1475906 = idf(docFreq=94, maxDocs=44421)
                0.017859854 = queryNorm
              0.94764555 = fieldWeight in 6910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1475906 = idf(docFreq=94, maxDocs=44421)
                0.09375 = fieldNorm(doc=6910)
        0.16 = coord(4/25)
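For the multi-term content query, each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm (a missing boost line means a boost of 1). The per-term contributions are summed and multiplied by coord, the fraction of the 25 query terms that matched. A minimal Python sketch reproducing the score of item 1 (doc 2497) from the printed boosts, frequencies, queryNorm, and fieldNorm, assuming the same Lucene 6-era tf and idf formulas. queryNorm is copied rather than recomputed because it depends on all 25 query terms, which the page does not list; the tuple layout is a hypothetical convenience.

```python
import math

QUERY_NORM = 0.017859854  # copied from the explanation
MAX_DOCS = 44421
QUERY_TERMS = 25          # denominator of coord(7/25)
FIELD_NORM = 0.078125     # fieldNorm(doc=2497)


def term_score(freq: float, doc_freq: int, boost: float,
               field_norm: float) -> float:
    """queryWeight * fieldWeight for one matching term (ClassicSimilarity)."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 2497, copied from the explanation
matches = [
    ("connections", 1, 196, 1.0),
    ("chance", 1, 106, 1.0950997),
    ("discoveries", 2, 30, 1.2881179),
    ("profiles", 1, 128, 2.1319332),
    ("hypotheses", 1, 94, 2.227266),
    ("text", 6, 2122, 2.5183647),
    ("mining", 5, 251, 2.88491),
]

raw_sum = sum(term_score(f, df, b, FIELD_NORM) for _, f, df, b in matches)
coord = len(matches) / QUERY_TERMS
print(f"sum={raw_sum:.6f} coord={coord} score={raw_sum * coord:.6f}")
```

The coord factor is why a document matching more of the query's terms can outrank one with a few very strong matches: item 4 has the highest single-term contribution in the list ("mining", 0.42) but matches only 4 of 25 terms.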