Document (#35423)

Author
Cui, H.
Boufford, D.
Selden, P.
Title
Semantic annotation of biosystematics literature without training examples
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.3, pp. 522-542
Year
2010
Abstract
This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm is able to annotate plain-text descriptions with high accuracy at the clause level by exploiting the corpus itself. In other words, the algorithm does not need lexicons, syntactic parsers, training examples, or annotation templates. The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has several desirable properties. It (a) reduces or eliminates the manual labor required to compile dictionaries and prepare source documents; (b) improves annotation coverage: the algorithm annotates what appears in documents and is not limited by predefined and often incomplete templates; (c) learns clean and reusable concepts: the algorithm learns organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) is insensitive to collection size; and (e) runs in linear time with respect to the number of clauses to be annotated.
Theme
Automatic indexing
Field
Biology

Similar documents (content)

  1. Cui, H.: CharaParser for fine-grained semantic annotation of organism morphological descriptions (2012) 0.16
    0.1620132 = sum of:
      0.1620132 = product of:
        0.675055 = sum of:
          0.030477282 = weight(abstract_txt:semantic in 1045) [ClassicSimilarity], result of:
            0.030477282 = score(doc=1045,freq=2.0), product of:
              0.07706082 = queryWeight, product of:
                1.0598923 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016248912 = queryNorm
              0.39549646 = fieldWeight in 1045, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=1045)
          0.087490864 = weight(abstract_txt:organ in 1045) [ClassicSimilarity], result of:
            0.087490864 = score(doc=1045,freq=1.0), product of:
              0.15565315 = queryWeight, product of:
                1.0651454 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.016248912 = queryNorm
              0.5620886 = fieldWeight in 1045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.0625 = fieldNorm(doc=1045)
          0.11704989 = weight(abstract_txt:annotates in 1045) [ClassicSimilarity], result of:
            0.11704989 = score(doc=1045,freq=1.0), product of:
              0.18898621 = queryWeight, product of:
                1.1736673 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.016248912 = queryNorm
              0.61935675 = fieldWeight in 1045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=1045)
          0.0707193 = weight(abstract_txt:descriptions in 1045) [ClassicSimilarity], result of:
            0.0707193 = score(doc=1045,freq=2.0), product of:
              0.13506457 = queryWeight, product of:
                1.4031873 = boost
                5.9238153 = idf(docFreq=322, maxDocs=44421)
                0.016248912 = queryNorm
              0.5235962 = fieldWeight in 1045, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9238153 = idf(docFreq=322, maxDocs=44421)
                0.0625 = fieldNorm(doc=1045)
          0.23531605 = weight(abstract_txt:annotation in 1045) [ClassicSimilarity], result of:
            0.23531605 = score(doc=1045,freq=2.0), product of:
              0.37928048 = queryWeight, product of:
                3.3253715 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.016248912 = queryNorm
              0.62042755 = fieldWeight in 1045, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.0625 = fieldNorm(doc=1045)
          0.13400169 = weight(abstract_txt:algorithm in 1045) [ClassicSimilarity], result of:
            0.13400169 = score(doc=1045,freq=1.0), product of:
              0.37581438 = queryWeight, product of:
                4.0540795 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.016248912 = queryNorm
              0.35656348 = fieldWeight in 1045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=1045)
        0.24 = coord(6/25)
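Each leaf of the explain output above follows Lucene's ClassicSimilarity (TF-IDF) scoring. The Python sketch below recomputes item 1's score from the printed inputs, assuming idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which is consistent with the idf and tf values shown. queryNorm is taken as printed because it depends on all 25 query terms, which the output does not list. The function and variable names are illustrative only, not Lucene API.

```python
import math

# Inputs copied from the explain output for doc 1045 (item 1).
MAX_DOCS = 44421
QUERY_NORM = 0.016248912  # cannot be recomputed: depends on all 25 query terms
FIELD_NORM = 0.0625       # length norm stored for doc 1045
QUERY_TERMS = 25          # denominator of coord(6/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def term_score(boost: float, doc_freq: int, freq: float,
               field_norm: float = FIELD_NORM,
               query_norm: float = QUERY_NORM) -> float:
    """score = queryWeight * fieldWeight for one matched term."""
    i = idf(doc_freq)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) for the six query terms matched by doc 1045.
matches = [
    ("semantic", 1.0598923, 1375, 2.0),
    ("organ", 1.0651454, 14, 1.0),
    ("annotates", 1.1736673, 5, 1.0),
    ("descriptions", 1.4031873, 322, 2.0),
    ("annotation", 3.3253715, 107, 2.0),
    ("algorithm", 4.0540795, 401, 1.0),
]

raw_sum = sum(term_score(b, df, f) for _, b, df, f in matches)
coord = len(matches) / QUERY_TERMS  # fraction of query terms matched
score = coord * raw_sum
print(f"sum={raw_sum:.6f} coord={coord:.2f} score={score:.7f}")
```

The recomputed sum and score should agree with the explain values (0.675055 and 0.1620132) up to float rounding.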
    
  2. Malo, P.; Sinha, A.; Wallenius, J.; Korhonen, P.: Concept-based document classification using Wikipedia and value function (2011) 0.09
    0.09215348 = sum of:
      0.09215348 = product of:
        0.4607674 = sum of:
          0.026938364 = weight(abstract_txt:semantic in 948) [ClassicSimilarity], result of:
            0.026938364 = score(doc=948,freq=1.0), product of:
              0.07706082 = queryWeight, product of:
                1.0598923 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016248912 = queryNorm
              0.34957278 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=948)
          0.039989196 = weight(abstract_txt:training in 948) [ClassicSimilarity], result of:
            0.039989196 = score(doc=948,freq=1.0), product of:
              0.100280054 = queryWeight, product of:
                1.2090721 = boost
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.016248912 = queryNorm
              0.39877516 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.104322 = idf(docFreq=732, maxDocs=44421)
                0.078125 = fieldNorm(doc=948)
          0.04533878 = weight(abstract_txt:collection in 948) [ClassicSimilarity], result of:
            0.04533878 = score(doc=948,freq=1.0), product of:
              0.12481394 = queryWeight, product of:
                1.6520458 = boost
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.016248912 = queryNorm
              0.36325094 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.078125 = fieldNorm(doc=948)
          0.18099894 = weight(abstract_txt:learns in 948) [ClassicSimilarity], result of:
            0.18099894 = score(doc=948,freq=1.0), product of:
              0.27439117 = queryWeight, product of:
                2.0 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.016248912 = queryNorm
              0.65963835 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.078125 = fieldNorm(doc=948)
          0.1675021 = weight(abstract_txt:algorithm in 948) [ClassicSimilarity], result of:
            0.1675021 = score(doc=948,freq=1.0), product of:
              0.37581438 = queryWeight, product of:
                4.0540795 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.016248912 = queryNorm
              0.44570434 = fieldWeight in 948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=948)
        0.2 = coord(5/25)
    
  3. Robert, C.A.; Davis, A.: Annotation and its application to information research in economic intelligence (2006) 0.09
    0.08774053 = sum of:
      0.08774053 = product of:
        0.73117113 = sum of:
          0.09754218 = weight(abstract_txt:annotate in 3288) [ClassicSimilarity], result of:
            0.09754218 = score(doc=3288,freq=1.0), product of:
              0.14422408 = queryWeight, product of:
                1.0252949 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.016248912 = queryNorm
              0.67632383 = fieldWeight in 3288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.078125 = fieldNorm(doc=3288)
          0.04533878 = weight(abstract_txt:collection in 3288) [ClassicSimilarity], result of:
            0.04533878 = score(doc=3288,freq=1.0), product of:
              0.12481394 = queryWeight, product of:
                1.6520458 = boost
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.016248912 = queryNorm
              0.36325094 = fieldWeight in 3288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.078125 = fieldNorm(doc=3288)
          0.58829015 = weight(abstract_txt:annotation in 3288) [ClassicSimilarity], result of:
            0.58829015 = score(doc=3288,freq=8.0), product of:
              0.37928048 = queryWeight, product of:
                3.3253715 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.016248912 = queryNorm
              1.5510689 = fieldWeight in 3288, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.078125 = fieldNorm(doc=3288)
        0.12 = coord(3/25)
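Item 3 contains "annotation" eight times, yet its fieldWeight for that term is only 2.5 times that of item 1 (freq 2). The sketch below, assuming fieldWeight = sqrt(freq) × idf × fieldNorm as printed above, separates the two effects: the square-root tf damps repetition to a factor of 2, and item 3's larger fieldNorm contributes the remaining factor of 1.25. Assuming the usual ClassicSimilarity length norm of roughly 1/sqrt(field length), stored with coarse precision, a larger fieldNorm indicates a shorter abstract.

```python
import math

# idf(docFreq=107, maxDocs=44421) for "annotation", copied from the output above.
IDF_ANNOTATION = 7.019336


def field_weight(freq: float, field_norm: float, idf: float = IDF_ANNOTATION) -> float:
    """ClassicSimilarity fieldWeight = sqrt(freq) * idf * fieldNorm."""
    return math.sqrt(freq) * idf * field_norm


# "annotation" in doc 1045 (item 1) and doc 3288 (item 3).
w1 = field_weight(2.0, 0.0625)    # explain output shows 0.62042755
w3 = field_weight(8.0, 0.078125)  # explain output shows 1.5510689

# Four times as many occurrences only doubles the tf factor.
print(f"tf ratio: {math.sqrt(8.0) / math.sqrt(2.0):.2f}")
print(f"fieldWeight: item 1 = {w1:.4f}, item 3 = {w3:.4f}, ratio = {w3 / w1:.2f}")
```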
    
  4. Vallet, D.; Fernández, M.; Castells, P.: An ontology-based information retrieval model (2005) 0.08
    0.083186984 = sum of:
      0.083186984 = product of:
        0.6932249 = sum of:
          0.05599034 = weight(abstract_txt:semantic in 708) [ClassicSimilarity], result of:
            0.05599034 = score(doc=708,freq=3.0), product of:
              0.07706082 = queryWeight, product of:
                1.0598923 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016248912 = queryNorm
              0.72657335 = fieldWeight in 708, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.09375 = fieldNorm(doc=708)
          0.3529741 = weight(abstract_txt:annotation in 708) [ClassicSimilarity], result of:
            0.3529741 = score(doc=708,freq=2.0), product of:
              0.37928048 = queryWeight, product of:
                3.3253715 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.016248912 = queryNorm
              0.9306413 = fieldWeight in 708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.09375 = fieldNorm(doc=708)
          0.28426048 = weight(abstract_txt:algorithm in 708) [ClassicSimilarity], result of:
            0.28426048 = score(doc=708,freq=2.0), product of:
              0.37581438 = queryWeight, product of:
                4.0540795 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.016248912 = queryNorm
              0.7563853 = fieldWeight in 708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.09375 = fieldNorm(doc=708)
        0.12 = coord(3/25)
    
  5. Zhao, G.; Wu, J.; Wang, D.; Li, T.: Entity disambiguation to Wikipedia using collective ranking (2016) 0.08
    0.07547874 = sum of:
      0.07547874 = product of:
        0.47174215 = sum of:
          0.09049947 = weight(abstract_txt:plain in 4266) [ClassicSimilarity], result of:
            0.09049947 = score(doc=4266,freq=1.0), product of:
              0.13719559 = queryWeight, product of:
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.016248912 = queryNorm
              0.65963835 = fieldWeight in 4266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.078125 = fieldNorm(doc=4266)
          0.026938364 = weight(abstract_txt:semantic in 4266) [ClassicSimilarity], result of:
            0.026938364 = score(doc=4266,freq=1.0), product of:
              0.07706082 = queryWeight, product of:
                1.0598923 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.016248912 = queryNorm
              0.34957278 = fieldWeight in 4266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=4266)
          0.14631236 = weight(abstract_txt:annotates in 4266) [ClassicSimilarity], result of:
            0.14631236 = score(doc=4266,freq=1.0), product of:
              0.18898621 = queryWeight, product of:
                1.1736673 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.016248912 = queryNorm
              0.7741959 = fieldWeight in 4266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=4266)
          0.20799196 = weight(abstract_txt:annotation in 4266) [ClassicSimilarity], result of:
            0.20799196 = score(doc=4266,freq=1.0), product of:
              0.37928048 = queryWeight, product of:
                3.3253715 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.016248912 = queryNorm
              0.5483856 = fieldWeight in 4266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.078125 = fieldNorm(doc=4266)
        0.16 = coord(4/25)
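The coord factor (matched query terms out of 25) explains why the ranking differs from the raw sums: item 3 has the largest sum of term scores but matches only 3 of the 25 terms, so it ranks below items 1 and 2. This sketch copies each entry's sum and match count from the explain output above and re-ranks the five documents; the labels are shorthand for the listed entries.

```python
# (label, sum of matched term scores, matched terms), copied from the output above.
results = [
    ("Cui 2012", 0.675055, 6),
    ("Malo et al. 2011", 0.4607674, 5),
    ("Robert & Davis 2006", 0.73117113, 3),
    ("Vallet et al. 2005", 0.6932249, 3),
    ("Zhao et al. 2016", 0.47174215, 4),
]
QUERY_TERMS = 25


def final_score(raw_sum: float, matched: int, total: int = QUERY_TERMS) -> float:
    """Final score = sum of term scores * coord(matched/total)."""
    return raw_sum * matched / total


# Sort by final score, highest first.
ranked = sorted(results, key=lambda r: final_score(r[1], r[2]), reverse=True)
for label, raw, m in ranked:
    print(f"{label:<20} raw={raw:.4f} coord={m}/{QUERY_TERMS} "
          f"score={final_score(raw, m):.4f}")
```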