Document (#40921)

Author
Kholghi, M.
Vine, L.D.
Sitbon, L.
Zuccon, G.
Nguyen, A.
Title
Clinical information extraction using small data : an active learning approach based on sequence representations and word embeddings
Source
Journal of the Association for Information Science and Technology. 68(2017) no.11, pp. 2543-2556
Year
2017
Abstract
This article demonstrates the benefits of using sequence representations based on word embeddings to inform the seed selection and sample selection processes in an active learning pipeline for clinical information extraction. Seed selection refers to choosing an initial sample set to label to form an initial learning model. Sample selection refers to selecting informative samples to update the model at each iteration of the active learning process. Compared to supervised machine learning approaches, active learning offers the opportunity to build statistical classifiers with a reduced amount of training samples that require manual annotation. Reducing the manual annotation effort can support automating the clinical information extraction process. This is particularly beneficial in the clinical domain, where manual annotation is a time-consuming and costly task, as it requires extensive labor from clinical experts. Our empirical findings demonstrate that (a) using sequence representations along with the length of sequence for seed selection shows potential towards more effective initial models, and (b) using sequence representations for sample selection leads to significantly lower manual annotation efforts, with up to 3% and 6% fewer tokens and concepts requiring annotation, respectively, compared to state-of-the-art query strategies.
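The two selection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embeddings in `EMB`, the mean-of-embeddings sequence vector, the farthest-first heuristic, and the function names `select_seed` and `select_sample` are all assumptions made for illustration. Starting the seed set from the longest sequence is only a loose nod to the abstract's remark about sequence length.

```python
import math

# Hypothetical 2-d word embeddings; real ones would be learned from a corpus.
EMB = {
    "chest": [0.9, 0.1], "pain": [0.8, 0.2], "aspirin": [0.1, 0.9],
    "given": [0.2, 0.7], "no": [0.5, 0.5], "fever": [0.7, 0.3],
}

def sequence_vector(tokens):
    """Represent a sequence as the mean of its known word embeddings."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def max_similarity(i, chosen, vectors):
    """Highest similarity between sequence i and any already chosen sequence."""
    return max(cosine(vectors[i], vectors[j]) for j in chosen)

def select_seed(pool, k):
    """Seed selection: longest sequence first, then farthest-first additions."""
    vectors = [sequence_vector(s) for s in pool]
    chosen = [max(range(len(pool)), key=lambda i: len(pool[i]))]
    while len(chosen) < min(k, len(pool)):
        rest = [i for i in range(len(pool)) if i not in chosen]
        chosen.append(min(rest, key=lambda i: max_similarity(i, chosen, vectors)))
    return chosen

def select_sample(pool, labeled):
    """Sample selection: the unlabeled sequence least similar to the labeled set."""
    vectors = [sequence_vector(s) for s in pool]
    rest = [i for i in range(len(pool)) if i not in labeled]
    if not rest:
        return None
    return min(rest, key=lambda i: max_similarity(i, labeled, vectors))

POOL = [
    ["chest", "pain"],
    ["aspirin", "given"],
    ["no", "fever"],
    ["chest", "pain", "no", "fever"],
]
seed = select_seed(POOL, 2)
next_sample = select_sample(POOL, seed)
print(seed, next_sample)
```

In a full active learning loop, the selected sample would be annotated, added to the labeled set, and the model retrained before the next call to `select_sample`. This sketch uses a purely similarity-based criterion and does not model classifier uncertainty.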
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23936/full.
Footnote
Contribution to a special issue on biomedical information retrieval.
Field
Medicine

Similar documents (author)

  1. Koopman, B.; Zuccon, G.; Bruza, P.; Nguyen, A.: What makes an effective clinical query and querier? (2017) 4.23
    4.2263775 = sum of:
      4.2263775 = sum of:
        1.8842409 = weight(author_txt:nguyen in 4922) [ClassicSimilarity], result of:
          1.8842409 = score(doc=4922,freq=1.0), product of:
            0.65421045 = queryWeight, product of:
              9.216561 = idf(docFreq=11, maxDocs=44421)
              0.070982054 = queryNorm
            2.8801754 = fieldWeight in 4922, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.216561 = idf(docFreq=11, maxDocs=44421)
              0.3125 = fieldNorm(doc=4922)
        2.3421366 = weight(author_txt:zuccon in 4922) [ClassicSimilarity], result of:
          2.3421366 = score(doc=4922,freq=1.0), product of:
            0.7563126 = queryWeight, product of:
              1.0752066 = boost
              9.909708 = idf(docFreq=5, maxDocs=44421)
              0.070982054 = queryNorm
            3.0967836 = fieldWeight in 4922, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.909708 = idf(docFreq=5, maxDocs=44421)
              0.3125 = fieldNorm(doc=4922)
    
  2. Nguyen, P.H.P.; Kaneiwa, K.; Nguyen, M.-Q.: Ontology inferencing rules and operations in conceptual structure theory (2010) 1.60
    1.5988312 = sum of:
      1.5988312 = product of:
        3.1976624 = sum of:
          3.1976624 = weight(author_txt:nguyen in 421) [ClassicSimilarity], result of:
            3.1976624 = score(doc=421,freq=2.0), product of:
              0.65421045 = queryWeight, product of:
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.070982054 = queryNorm
              4.8878193 = fieldWeight in 421, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.375 = fieldNorm(doc=421)
        0.5 = coord(1/2)
    
  3. Zhang, J.; Nguyen, T.: WebStar: a visualization model for hyperlink structures (2005) 1.51
    1.5073926 = sum of:
      1.5073926 = product of:
        3.0147853 = sum of:
          3.0147853 = weight(author_txt:nguyen in 2056) [ClassicSimilarity], result of:
            3.0147853 = score(doc=2056,freq=1.0), product of:
              0.65421045 = queryWeight, product of:
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.070982054 = queryNorm
              4.6082807 = fieldWeight in 2056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.5 = fieldNorm(doc=2056)
        0.5 = coord(1/2)
    
  4. Nguyen-Kim, M.T.: ¬Die kleinste gemeinsame Wirklichkeit : wahr, falsch, plausibel? : die größten Streitfragen wissenschaftlich geprüft (2021) 1.51
    1.5073926 = sum of:
      1.5073926 = product of:
        3.0147853 = sum of:
          3.0147853 = weight(author_txt:nguyen in 1236) [ClassicSimilarity], result of:
            3.0147853 = score(doc=1236,freq=1.0), product of:
              0.65421045 = queryWeight, product of:
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.070982054 = queryNorm
              4.6082807 = fieldWeight in 1236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.5 = fieldNorm(doc=1236)
        0.5 = coord(1/2)
    
  5. Nguyen, S.-H.; Chowdhury, G.: Interpreting the knowledge map of digital library research (1990-2010) (2013) 1.32
    1.3189687 = sum of:
      1.3189687 = product of:
        2.6379373 = sum of:
          2.6379373 = weight(author_txt:nguyen in 1958) [ClassicSimilarity], result of:
            2.6379373 = score(doc=1958,freq=1.0), product of:
              0.65421045 = queryWeight, product of:
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.070982054 = queryNorm
              4.0322456 = fieldWeight in 1958, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.4375 = fieldNorm(doc=1958)
        0.5 = coord(1/2)
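
The score trees above follow the TF-IDF arithmetic of Lucene's ClassicSimilarity, so the listed values can be checked by hand. The sketch below recomputes hit 2 (doc 421) from the figures shown. The function names are illustrative, and the idf formula `1 + ln(maxDocs / (docFreq + 1))` is inferred from the listed values rather than stated on the page.

```python
import math

def idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """score = queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm          # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain tree for doc 421 (author_txt:nguyen).
raw = term_score(freq=2.0, doc_freq=11, max_docs=44421,
                 query_norm=0.070982054, field_norm=0.375)
score = raw * (1 / 2)  # coord(1/2): one of two query terms matched
print(raw, score)
```

The results agree with the listed 3.1976624 and 1.5988312 up to the single-precision rounding used in the listing.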
    

Similar documents (content)

  1. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.14
    0.1367674 = sum of:
      0.1367674 = product of:
        0.5698642 = sum of:
          0.015045684 = weight(abstract_txt:process in 2611) [ClassicSimilarity], result of:
            0.015045684 = score(doc=2611,freq=1.0), product of:
              0.04756445 = queryWeight, product of:
                1.0474272 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.011215515 = queryNorm
              0.31632203 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.03645255 = weight(abstract_txt:word in 2611) [ClassicSimilarity], result of:
            0.03645255 = score(doc=2611,freq=1.0), product of:
              0.08580116 = queryWeight, product of:
                1.4067897 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.011215515 = queryNorm
              0.42484915 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.13981542 = weight(abstract_txt:extraction in 2611) [ClassicSimilarity], result of:
            0.13981542 = score(doc=2611,freq=3.0), product of:
              0.16686602 = queryWeight, product of:
                2.4027698 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.011215515 = queryNorm
              0.83789027 = fieldWeight in 2611, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.09574738 = weight(abstract_txt:manual in 2611) [ClassicSimilarity], result of:
            0.09574738 = score(doc=2611,freq=1.0), product of:
              0.20579547 = queryWeight, product of:
                3.081169 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.011215515 = queryNorm
              0.46525505 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.102549516 = weight(abstract_txt:learning in 2611) [ClassicSimilarity], result of:
            0.102549516 = score(doc=2611,freq=2.0), product of:
              0.1957315 = queryWeight, product of:
                3.6802185 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.011215515 = queryNorm
              0.52392954 = fieldWeight in 2611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.18025367 = weight(abstract_txt:sequence in 2611) [ClassicSimilarity], result of:
            0.18025367 = score(doc=2611,freq=1.0), product of:
              0.33799526 = queryWeight, product of:
                4.414768 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.011215515 = queryNorm
              0.53330237 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
        0.24 = coord(6/25)
    
  2. Du, C.; Cohoon, J.; Lopez, P.; Howison, J.: Softcite dataset : a dataset of software mentions in biomedical and economic research publications (2021) 0.12
    0.1176119 = sum of:
      0.1176119 = product of:
        0.5880595 = sum of:
          0.01805482 = weight(abstract_txt:process in 1263) [ClassicSimilarity], result of:
            0.01805482 = score(doc=1263,freq=1.0), product of:
              0.04756445 = queryWeight, product of:
                1.0474272 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.011215515 = queryNorm
              0.37958646 = fieldWeight in 1263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.09375 = fieldNorm(doc=1263)
          0.09686696 = weight(abstract_txt:extraction in 1263) [ClassicSimilarity], result of:
            0.09686696 = score(doc=1263,freq=1.0), product of:
              0.16686602 = queryWeight, product of:
                2.4027698 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.011215515 = queryNorm
              0.5805074 = fieldWeight in 1263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.09375 = fieldNorm(doc=1263)
          0.11489685 = weight(abstract_txt:manual in 1263) [ClassicSimilarity], result of:
            0.11489685 = score(doc=1263,freq=1.0), product of:
              0.20579547 = queryWeight, product of:
                3.081169 = boost
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.011215515 = queryNorm
              0.55830604 = fieldWeight in 1263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9552646 = idf(docFreq=312, maxDocs=44421)
                0.09375 = fieldNorm(doc=1263)
          0.12305942 = weight(abstract_txt:learning in 1263) [ClassicSimilarity], result of:
            0.12305942 = score(doc=1263,freq=2.0), product of:
              0.1957315 = queryWeight, product of:
                3.6802185 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.011215515 = queryNorm
              0.62871546 = fieldWeight in 1263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.09375 = fieldNorm(doc=1263)
          0.23518147 = weight(abstract_txt:annotation in 1263) [ClassicSimilarity], result of:
            0.23518147 = score(doc=1263,freq=1.0), product of:
              0.35738456 = queryWeight, product of:
                4.5396304 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.011215515 = queryNorm
              0.65806276 = fieldWeight in 1263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.09375 = fieldNorm(doc=1263)
        0.2 = coord(5/25)
    
  3. Billal, B.; Fonseca, A.; Sadat, F.; Lounis, H.: Semi-supervised learning and social media text analysis towards multi-labeling categorization (2017) 0.11
    0.11022205 = sum of:
      0.11022205 = product of:
        0.39365017 = sum of:
          0.010024238 = weight(abstract_txt:model in 95) [ClassicSimilarity], result of:
            0.010024238 = score(doc=95,freq=1.0), product of:
              0.046023197 = queryWeight, product of:
                1.0303173 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.011215515 = queryNorm
              0.2178084 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.028207177 = weight(abstract_txt:compared in 95) [ClassicSimilarity], result of:
            0.028207177 = score(doc=95,freq=2.0), product of:
              0.07280686 = queryWeight, product of:
                1.2958918 = boost
                5.0093837 = idf(docFreq=805, maxDocs=44421)
                0.011215515 = queryNorm
              0.38742474 = fieldWeight in 95, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0093837 = idf(docFreq=805, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.025516788 = weight(abstract_txt:word in 95) [ClassicSimilarity], result of:
            0.025516788 = score(doc=95,freq=1.0), product of:
              0.08580116 = queryWeight, product of:
                1.4067897 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.011215515 = queryNorm
              0.29739442 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.022705538 = weight(abstract_txt:using in 95) [ClassicSimilarity], result of:
            0.022705538 = score(doc=95,freq=3.0), product of:
              0.069342576 = queryWeight, product of:
                1.7885356 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.011215515 = queryNorm
              0.32744008 = fieldWeight in 95, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.05650573 = weight(abstract_txt:extraction in 95) [ClassicSimilarity], result of:
            0.05650573 = score(doc=95,freq=1.0), product of:
              0.16686602 = queryWeight, product of:
                2.4027698 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.011215515 = queryNorm
              0.33862934 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.11350152 = weight(abstract_txt:learning in 95) [ClassicSimilarity], result of:
            0.11350152 = score(doc=95,freq=5.0), product of:
              0.1957315 = queryWeight, product of:
                3.6802185 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.011215515 = queryNorm
              0.57988375 = fieldWeight in 95, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
          0.1371892 = weight(abstract_txt:annotation in 95) [ClassicSimilarity], result of:
            0.1371892 = score(doc=95,freq=1.0), product of:
              0.35738456 = queryWeight, product of:
                4.5396304 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.011215515 = queryNorm
              0.38386995 = fieldWeight in 95, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.0546875 = fieldNorm(doc=95)
        0.28 = coord(7/25)
    
  4. Wu, T.; Pottenger, W.M.: ¬A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.10
    0.1033135 = sum of:
      0.1033135 = product of:
        0.6457094 = sum of:
          0.12915595 = weight(abstract_txt:extraction in 4237) [ClassicSimilarity], result of:
            0.12915595 = score(doc=4237,freq=4.0), product of:
              0.16686602 = queryWeight, product of:
                2.4027698 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.011215515 = queryNorm
              0.7740099 = fieldWeight in 4237, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.18404566 = weight(abstract_txt:active in 4237) [ClassicSimilarity], result of:
            0.18404566 = score(doc=4237,freq=4.0), product of:
              0.23257066 = queryWeight, product of:
                3.2754807 = boost
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.011215515 = queryNorm
              0.7913537 = fieldWeight in 4237, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.16842853 = weight(abstract_txt:seed in 4237) [ClassicSimilarity], result of:
            0.16842853 = score(doc=4237,freq=1.0), product of:
              0.31617102 = queryWeight, product of:
                3.3074193 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.011215515 = queryNorm
              0.53271335 = fieldWeight in 4237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.16407923 = weight(abstract_txt:learning in 4237) [ClassicSimilarity], result of:
            0.16407923 = score(doc=4237,freq=8.0), product of:
              0.1957315 = queryWeight, product of:
                3.6802185 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.011215515 = queryNorm
              0.8382873 = fieldWeight in 4237, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
        0.16 = coord(4/25)
    
  5. Jung, H.; Yi, E.; Kim, D.; Lee, G.G.: Information extraction with automatic knowledge expansion (2005) 0.10
    0.101940274 = sum of:
      0.101940274 = product of:
        0.63712674 = sum of:
          0.18265408 = weight(abstract_txt:extraction in 2008) [ClassicSimilarity], result of:
            0.18265408 = score(doc=2008,freq=8.0), product of:
              0.16686602 = queryWeight, product of:
                2.4027698 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.011215515 = queryNorm
              1.0946152 = fieldWeight in 2008, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=2008)
          0.15348206 = weight(abstract_txt:learning in 2008) [ClassicSimilarity], result of:
            0.15348206 = score(doc=2008,freq=7.0), product of:
              0.1957315 = queryWeight, product of:
                3.6802185 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.011215515 = queryNorm
              0.78414595 = fieldWeight in 2008, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=2008)
          0.14420293 = weight(abstract_txt:sequence in 2008) [ClassicSimilarity], result of:
            0.14420293 = score(doc=2008,freq=1.0), product of:
              0.33799526 = queryWeight, product of:
                4.414768 = boost
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.011215515 = queryNorm
              0.42664188 = fieldWeight in 2008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.82627 = idf(docFreq=130, maxDocs=44421)
                0.0625 = fieldNorm(doc=2008)
          0.15678765 = weight(abstract_txt:annotation in 2008) [ClassicSimilarity], result of:
            0.15678765 = score(doc=2008,freq=1.0), product of:
              0.35738456 = queryWeight, product of:
                4.5396304 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.011215515 = queryNorm
              0.4387085 = fieldWeight in 2008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.0625 = fieldNorm(doc=2008)
        0.16 = coord(4/25)
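
For the content-based hits, each matching term carries its own boost, and the per-term scores are summed before the coord factor is applied. As a rough check under the same TF-IDF arithmetic, the sketch below recomputes hit 4 (doc 4237) from the boost, idf, and frequency values listed above. The constant and function names are illustrative only.

```python
import math

QUERY_NORM = 0.011215515   # queryNorm shared by all terms of this query
FIELD_NORM = 0.0625        # fieldNorm(doc=4237)
QUERY_TERMS = 25           # denominator of coord(4/25)

# (term, freq, boost, idf) copied from the explain tree for doc 4237.
TERMS = [
    ("extraction", 4.0, 2.4027698, 6.192079),
    ("active",     4.0, 3.2754807, 6.3308296),
    ("seed",       1.0, 3.3074193, 8.523414),
    ("learning",   8.0, 3.6802185, 4.7420692),
]

def term_score(freq, boost, idf):
    """queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm)."""
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

total = sum(term_score(freq, boost, idf) for _, freq, boost, idf in TERMS)
score = total * len(TERMS) / QUERY_TERMS  # coord = matched terms / query terms
print(total, score)
```

The sum and final score agree with the listed 0.6457094 and 0.1033135 up to single-precision rounding. The coord factor scales the sum by the share of the 25 query terms that the document matches.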