Document (#40928)

Author
Wang, P.
Hao, T.
Yan, J.
Jin, L.
Title
Large-scale extraction of drug-disease pairs from the medical literature
Source
Journal of the Association for Information Science and Technology. 68(2017) no.11, pp. 2649-2661
Year
2017
Abstract
Automatic, accurate, large-scale extraction of drug-disease pairs from the medical literature plays an important role in drug repurposing. However, most existing extraction methods are supervised, and manually labeling drug-disease pair datasets is costly and time-consuming, while many drug-disease pairs remain buried in free text. In this work, we first leverage a pattern-based method to automatically extract drug-disease pairs with treatment and inducement relationships from free text. Then, to reflect a drug-disease relation, a network embedding algorithm is proposed to calculate the degree of correlation of a drug-disease pair. In the experiments, we use the method to extract treatment and inducement drug-disease pairs from 27 million medical abstracts and titles available on PubMed. We extract 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting the frequent treatment drug-disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting the frequent inducement drug-disease pairs. In addition, our proposed information network embedding algorithm can efficiently reflect the degree of correlation of drug-disease pairs. Our algorithm achieves a precision of 0.802 and a recall of 0.783 in the fine-grained evaluation of extracting frequent pairs.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23876/full.
Footnote
Contribution to a special issue on biomedical information retrieval.
Field
Medicine

Similar documents (author)

  1. Wang, H.; Wang, C.: Ontologies for universal information systems (1995) 4.62
    4.6221313 = sum of:
      4.6221313 = weight(author_txt:wang in 3262) [ClassicSimilarity], result of:
        4.6221313 = score(doc=3262,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.15298282 = queryNorm
          4.622132 = fieldWeight in 3262, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.5 = fieldNorm(doc=3262)
    
  2. Wang, F.; Wang, X.: Tracing theory diffusion : a text mining and citation-based analysis of TAM (2020) 4.62
    4.6221313 = sum of:
      4.6221313 = weight(author_txt:wang in 980) [ClassicSimilarity], result of:
        4.6221313 = score(doc=980,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.15298282 = queryNorm
          4.622132 = fieldWeight in 980, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.5 = fieldNorm(doc=980)
    
  3. Wang, C.: ¬The online catalogue, subject access and user reactions : a review (1985) 4.09
    4.0854254 = sum of:
      4.0854254 = weight(author_txt:wang in 985) [ClassicSimilarity], result of:
        4.0854254 = score(doc=985,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.15298282 = queryNorm
          4.085426 = fieldWeight in 985, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.625 = fieldNorm(doc=985)
    
  4. Wang, C.: Bibliometrics : a textbook (1990) 4.09
    4.0854254 = sum of:
      4.0854254 = weight(author_txt:wang in 5108) [ClassicSimilarity], result of:
        4.0854254 = score(doc=5108,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.15298282 = queryNorm
          4.085426 = fieldWeight in 5108, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.625 = fieldNorm(doc=5108)
    
  5. Wang, P.: Users' information needs at different stages of a research project : a cognitive view (1997) 4.09
    4.0854254 = sum of:
      4.0854254 = weight(author_txt:wang in 1320) [ClassicSimilarity], result of:
        4.0854254 = score(doc=1320,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.15298282 = queryNorm
          4.085426 = fieldWeight in 1320, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            6.5366817 = idf(docFreq=174, maxDocs=44421)
            0.625 = fieldNorm(doc=1320)
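
The score trees above follow one pattern: a term's score is queryWeight times fieldWeight. The sketch below recomputes one entry as an illustration. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), a form that reproduces the idf values listed here; the function names are hypothetical and are not part of any library API.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces 6.5366817 for docFreq=174, maxDocs=44421.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # Term frequency is dampened with a square root (tf(freq=2.0) = 1.4142135).
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


# Values copied from the first author match (doc=3262), listed as 4.6221313.
score = term_score(freq=2.0, doc_freq=174, max_docs=44421,
                   field_norm=0.5, query_norm=0.15298282)
print(score)
```

The listed values come from single-precision arithmetic, so a recomputation in double precision should agree only to a few decimal places.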
    

Similar documents (content)

  1. Song, M.; Kang, K.; An, J.Y.: Investigating drug-disease interactions in drug-symptom-disease triples via citation relations (2018) 0.35
    0.35402545 = sum of:
      0.35402545 = product of:
        1.7701273 = sum of:
          0.0040039658 = weight(abstract_txt:from in 545) [ClassicSimilarity], result of:
            0.0040039658 = score(doc=545,freq=1.0), product of:
              0.02321645 = queryWeight, product of:
                1.2267249 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.006858579 = queryNorm
              0.17246243 = fieldWeight in 545, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=545)
          0.06729301 = weight(abstract_txt:extracting in 545) [ClassicSimilarity], result of:
            0.06729301 = score(doc=545,freq=2.0), product of:
              0.10984813 = queryWeight, product of:
                2.310874 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.006858579 = queryNorm
              0.61260045 = fieldWeight in 545, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=545)
          0.35774994 = weight(abstract_txt:pairs in 545) [ClassicSimilarity], result of:
            0.35774994 = score(doc=545,freq=4.0), product of:
              0.42157584 = queryWeight, product of:
                9.054152 = boost
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.006858579 = queryNorm
              0.8486016 = fieldWeight in 545, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.0625 = fieldNorm(doc=545)
          0.5375852 = weight(abstract_txt:disease in 545) [ClassicSimilarity], result of:
            0.5375852 = score(doc=545,freq=5.0), product of:
              0.49875587 = queryWeight, product of:
                9.428869 = boost
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.006858579 = queryNorm
              1.0778524 = fieldWeight in 545, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.0625 = fieldNorm(doc=545)
          0.80349517 = weight(abstract_txt:drug in 545) [ClassicSimilarity], result of:
            0.80349517 = score(doc=545,freq=5.0), product of:
              0.6711839 = queryWeight, product of:
                11.424328 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.006858579 = queryNorm
              1.1971312 = fieldWeight in 545, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.0625 = fieldNorm(doc=545)
        0.2 = coord(5/25)
    
  2. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.18
    0.18460707 = sum of:
      0.18460707 = product of:
        0.76919615 = sum of:
          0.0040039658 = weight(abstract_txt:from in 2107) [ClassicSimilarity], result of:
            0.0040039658 = score(doc=2107,freq=1.0), product of:
              0.02321645 = queryWeight, product of:
                1.2267249 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.006858579 = queryNorm
              0.17246243 = fieldWeight in 2107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.048546452 = weight(abstract_txt:medical in 2107) [ClassicSimilarity], result of:
            0.048546452 = score(doc=2107,freq=3.0), product of:
              0.07718874 = queryWeight, product of:
                1.9371215 = boost
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.006858579 = queryNorm
              0.6289318 = fieldWeight in 2107, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.033932634 = weight(abstract_txt:extraction in 2107) [ClassicSimilarity], result of:
            0.033932634 = score(doc=2107,freq=1.0), product of:
              0.08768011 = queryWeight, product of:
                2.064574 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.006858579 = queryNorm
              0.38700494 = fieldWeight in 2107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.04100677 = weight(abstract_txt:extract in 2107) [ClassicSimilarity], result of:
            0.04100677 = score(doc=2107,freq=1.0), product of:
              0.09947785 = queryWeight, product of:
                2.1990905 = boost
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.006858579 = queryNorm
              0.41222012 = fieldWeight in 2107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.05281125 = weight(abstract_txt:treatment in 2107) [ClassicSimilarity], result of:
            0.05281125 = score(doc=2107,freq=1.0), product of:
              0.12960444 = queryWeight, product of:
                2.898406 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.006858579 = queryNorm
              0.40748024 = fieldWeight in 2107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.5888951 = weight(abstract_txt:disease in 2107) [ClassicSimilarity], result of:
            0.5888951 = score(doc=2107,freq=6.0), product of:
              0.49875587 = queryWeight, product of:
                9.428869 = boost
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.006858579 = queryNorm
              1.1807281 = fieldWeight in 2107, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
        0.24 = coord(6/25)
    
  3. Lee, C.-H.; Khoo, C.; Na, J.-C.: Automatic identification of treatment relations for medical ontology learning : an exploratory study (2004) 0.18
    0.18325226 = sum of:
      0.18325226 = product of:
        0.7635511 = sum of:
          0.012269413 = weight(abstract_txt:method in 3661) [ClassicSimilarity], result of:
            0.012269413 = score(doc=3661,freq=2.0), product of:
              0.03085542 = queryWeight, product of:
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.006858579 = queryNorm
              0.39764208 = fieldWeight in 3661, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=3661)
          0.0040039658 = weight(abstract_txt:from in 3661) [ClassicSimilarity], result of:
            0.0040039658 = score(doc=3661,freq=1.0), product of:
              0.02321645 = queryWeight, product of:
                1.2267249 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.006858579 = queryNorm
              0.17246243 = fieldWeight in 3661, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=3661)
          0.05605662 = weight(abstract_txt:medical in 3661) [ClassicSimilarity], result of:
            0.05605662 = score(doc=3661,freq=4.0), product of:
              0.07718874 = queryWeight, product of:
                1.9371215 = boost
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.006858579 = queryNorm
              0.72622794 = fieldWeight in 3661, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.0625 = fieldNorm(doc=3661)
          0.09147176 = weight(abstract_txt:treatment in 3661) [ClassicSimilarity], result of:
            0.09147176 = score(doc=3661,freq=3.0), product of:
              0.12960444 = queryWeight, product of:
                2.898406 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.006858579 = queryNorm
              0.70577645 = fieldWeight in 3661, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=3661)
          0.2404154 = weight(abstract_txt:disease in 3661) [ClassicSimilarity], result of:
            0.2404154 = score(doc=3661,freq=1.0), product of:
              0.49875587 = queryWeight, product of:
                9.428869 = boost
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.006858579 = queryNorm
              0.4820302 = fieldWeight in 3661, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7124834 = idf(docFreq=53, maxDocs=44421)
                0.0625 = fieldNorm(doc=3661)
          0.35933396 = weight(abstract_txt:drug in 3661) [ClassicSimilarity], result of:
            0.35933396 = score(doc=3661,freq=1.0), product of:
              0.6711839 = queryWeight, product of:
                11.424328 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.006858579 = queryNorm
              0.53537333 = fieldWeight in 3661, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.0625 = fieldNorm(doc=3661)
        0.24 = coord(6/25)
    
  4. Tsuji, K.; Kageura, K.: Automatic generation of Japanese-English bilingual thesauri based on bilingual corpora (2006) 0.12
    0.1249217 = sum of:
      0.1249217 = product of:
        0.5205071 = sum of:
          0.01735157 = weight(abstract_txt:method in 61) [ClassicSimilarity], result of:
            0.01735157 = score(doc=61,freq=4.0), product of:
              0.03085542 = queryWeight, product of:
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.006858579 = queryNorm
              0.5623508 = fieldWeight in 61, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.013185211 = weight(abstract_txt:proposed in 61) [ClassicSimilarity], result of:
            0.013185211 = score(doc=61,freq=2.0), product of:
              0.032372307 = queryWeight, product of:
                1.0242857 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.006858579 = queryNorm
              0.4072991 = fieldWeight in 61, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.0069350717 = weight(abstract_txt:from in 61) [ClassicSimilarity], result of:
            0.0069350717 = score(doc=61,freq=3.0), product of:
              0.02321645 = queryWeight, product of:
                1.2267249 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.006858579 = queryNorm
              0.29871368 = fieldWeight in 61, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.05799233 = weight(abstract_txt:extract in 61) [ClassicSimilarity], result of:
            0.05799233 = score(doc=61,freq=2.0), product of:
              0.09947785 = queryWeight, product of:
                2.1990905 = boost
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.006858579 = queryNorm
              0.5829673 = fieldWeight in 61, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.06729301 = weight(abstract_txt:extracting in 61) [ClassicSimilarity], result of:
            0.06729301 = score(doc=61,freq=2.0), product of:
              0.10984813 = queryWeight, product of:
                2.310874 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.006858579 = queryNorm
              0.61260045 = fieldWeight in 61, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.35774994 = weight(abstract_txt:pairs in 61) [ClassicSimilarity], result of:
            0.35774994 = score(doc=61,freq=4.0), product of:
              0.42157584 = queryWeight, product of:
                9.054152 = boost
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.006858579 = queryNorm
              0.8486016 = fieldWeight in 61, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
        0.24 = coord(6/25)
    
  5. Naing, M.-M.; Lim, E.-P.; Chiang, R.H.L.: Extracting link chains of relationship instances from a Web site (2006) 0.11
    0.11326195 = sum of:
      0.11326195 = product of:
        0.40450698 = sum of:
          0.015336767 = weight(abstract_txt:method in 111) [ClassicSimilarity], result of:
            0.015336767 = score(doc=111,freq=2.0), product of:
              0.03085542 = queryWeight, product of:
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.006858579 = queryNorm
              0.4970526 = fieldWeight in 111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.01165419 = weight(abstract_txt:proposed in 111) [ClassicSimilarity], result of:
            0.01165419 = score(doc=111,freq=1.0), product of:
              0.032372307 = queryWeight, product of:
                1.0242857 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.006858579 = queryNorm
              0.36000493 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.0050049573 = weight(abstract_txt:from in 111) [ClassicSimilarity], result of:
            0.0050049573 = score(doc=111,freq=1.0), product of:
              0.02321645 = queryWeight, product of:
                1.2267249 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.006858579 = queryNorm
              0.21557805 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.030106945 = weight(abstract_txt:precision in 111) [ClassicSimilarity], result of:
            0.030106945 = score(doc=111,freq=1.0), product of:
              0.069768675 = queryWeight, product of:
                1.8416629 = boost
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.006858579 = queryNorm
              0.43152526 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5235233 = idf(docFreq=481, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.033978842 = weight(abstract_txt:recall in 111) [ClassicSimilarity], result of:
            0.033978842 = score(doc=111,freq=1.0), product of:
              0.07562901 = queryWeight, product of:
                1.9174503 = boost
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.006858579 = queryNorm
              0.44928318 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.750825 = idf(docFreq=383, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.08483159 = weight(abstract_txt:extraction in 111) [ClassicSimilarity], result of:
            0.08483159 = score(doc=111,freq=4.0), product of:
              0.08768011 = queryWeight, product of:
                2.064574 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.006858579 = queryNorm
              0.96751237 = fieldWeight in 111, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
          0.2235937 = weight(abstract_txt:pairs in 111) [ClassicSimilarity], result of:
            0.2235937 = score(doc=111,freq=1.0), product of:
              0.42157584 = queryWeight, product of:
                9.054152 = boost
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.006858579 = queryNorm
              0.53037596 = fieldWeight in 111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.078125 = fieldNorm(doc=111)
        0.28 = coord(7/25)