Document (#27561)

Author
Terada, A.
Tokunaga, T.
Tanaka, H.
Title
Automatic expansion of abbreviations by using context and character information
Source
Information Processing and Management. 40(2004) no.1, pp. 31-45
Year
2004
Abstract
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations in particular are difficult to read and process because they are often domain-specific. In this paper, we propose a method for the automatic expansion of abbreviations using context and character information. In previous studies, dictionaries were used to search for abbreviation expansion candidates (candidate words for the original form of an abbreviation). Instead of a dictionary, we use a corpus from the same field that contains few abbreviations. We calculate the adequacy of each expansion candidate from the similarity between the context of the target abbreviation and the context of the candidate. This similarity is computed with a vector space model whose vector elements are the words surrounding the target abbreviation and the words surrounding its expansion candidate. Experiments on approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
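The abstract does not specify the context window size, the term weighting, the exact similarity measure, or how character information is used. The following is a minimal sketch under those gaps, assuming raw co-occurrence counts within a fixed word window, cosine similarity between context vectors, and a hypothetical "letters appear in order" check standing in for the character information. The function names and the toy aviation sentences are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=3):
    """Count the words within `window` positions of every occurrence of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def is_char_compatible(abbr, candidate):
    """Hypothetical character filter: the abbreviation's letters occur in order in the candidate."""
    it = iter(candidate.lower())
    return all(ch in it for ch in abbr.lower())  # `in` consumes the iterator


def rank_candidates(abbr, abbr_tokens, candidates, corpus_tokens, window=3):
    """Score character-compatible candidates by context similarity, best first."""
    target_vec = context_vector(abbr_tokens, abbr, window)
    scored = []
    for cand in candidates:
        if not is_char_compatible(abbr, cand):
            continue
        cand_vec = context_vector(corpus_tokens, cand, window)
        scored.append((cosine(target_vec, cand_vec), cand))
    return sorted(scored, reverse=True)


# Toy example (invented data): "rwy" in a target document, candidates looked up in a corpus.
target_tokens = "the aircraft landed on rwy 27 after approach".split()
corpus_tokens = ("the aircraft landed on runway 27 after approach "
                 "the train stopped at the railway station").split()
ranked = rank_candidates("rwy", target_tokens, ["runway", "railway", "taxiway"], corpus_tokens)
print(ranked)
```

Here "taxiway" is dropped by the character filter, and "runway" outranks "railway" because its corpus context shares words with the context of "rwy". A real system would use a much larger corpus and probably a weighted (for example tf-idf) vector rather than raw counts.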

Similar documents (content)

  1. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: HAADS: a Hebrew Aramaic abbreviation disambiguation system (2010) 0.22
    0.21702571 = sum of:
      0.21702571 = product of:
        1.0851285 = sum of:
          0.018041257 = weight(abstract_txt:method in 977) [ClassicSimilarity], result of:
            0.018041257 = score(doc=977,freq=1.0), product of:
              0.05133097 = queryWeight, product of:
                1.2493808 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0091324495 = queryNorm
              0.35146925 = fieldWeight in 977, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=977)
          0.024180798 = weight(abstract_txt:context in 977) [ClassicSimilarity], result of:
            0.024180798 = score(doc=977,freq=1.0), product of:
              0.071429744 = queryWeight, product of:
                1.8050544 = boost
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0091324495 = queryNorm
              0.33852562 = fieldWeight in 977, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.078125 = fieldNorm(doc=977)
          0.016370209 = weight(abstract_txt:using in 977) [ClassicSimilarity], result of:
            0.016370209 = score(doc=977,freq=1.0), product of:
              0.060615133 = queryWeight, product of:
                1.9200417 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0091324495 = queryNorm
              0.27006802 = fieldWeight in 977, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.078125 = fieldNorm(doc=977)
          0.36792648 = weight(abstract_txt:abbreviation in 977) [ClassicSimilarity], result of:
            0.36792648 = score(doc=977,freq=1.0), product of:
              0.48274627 = queryWeight, product of:
                5.418506 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0091324495 = queryNorm
              0.7621529 = fieldWeight in 977, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.078125 = fieldNorm(doc=977)
          0.65860975 = weight(abstract_txt:abbreviations in 977) [ClassicSimilarity], result of:
            0.65860975 = score(doc=977,freq=2.0), product of:
              0.68071663 = queryWeight, product of:
                8.511818 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0091324495 = queryNorm
              0.9675241 = fieldWeight in 977, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.078125 = fieldNorm(doc=977)
        0.2 = coord(5/25)
    
  2. Beall, J.: Abbreviations, full spellings, and searchers' preferences (2011) 0.21
    0.20900497 = sum of:
      0.20900497 = product of:
        1.3062811 = sum of:
          0.019644251 = weight(abstract_txt:using in 166) [ClassicSimilarity], result of:
            0.019644251 = score(doc=166,freq=1.0), product of:
              0.060615133 = queryWeight, product of:
                1.9200417 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0091324495 = queryNorm
              0.32408163 = fieldWeight in 166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.09375 = fieldNorm(doc=166)
          0.054793365 = weight(abstract_txt:words in 166) [ClassicSimilarity], result of:
            0.054793365 = score(doc=166,freq=1.0), product of:
              0.1091264 = queryWeight, product of:
                2.2310827 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0091324495 = queryNorm
              0.50210917 = fieldWeight in 166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.09375 = fieldNorm(doc=166)
          0.44151175 = weight(abstract_txt:abbreviation in 166) [ClassicSimilarity], result of:
            0.44151175 = score(doc=166,freq=1.0), product of:
              0.48274627 = queryWeight, product of:
                5.418506 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0091324495 = queryNorm
              0.91458344 = fieldWeight in 166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=166)
          0.7903317 = weight(abstract_txt:abbreviations in 166) [ClassicSimilarity], result of:
            0.7903317 = score(doc=166,freq=2.0), product of:
              0.68071663 = queryWeight, product of:
                8.511818 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0091324495 = queryNorm
              1.161029 = fieldWeight in 166, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.09375 = fieldNorm(doc=166)
        0.16 = coord(4/25)
    
  3. Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.16
    0.15518986 = sum of:
      0.15518986 = product of:
        0.64662445 = sum of:
          0.0653674 = weight(abstract_txt:acronyms in 138) [ClassicSimilarity], result of:
            0.0653674 = score(doc=138,freq=1.0), product of:
              0.09610937 = queryWeight, product of:
                1.2088516 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0091324495 = queryNorm
              0.68013555 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.017916776 = weight(abstract_txt:field in 138) [ClassicSimilarity], result of:
            0.017916776 = score(doc=138,freq=1.0), product of:
              0.05109458 = queryWeight, product of:
                1.2465007 = boost
                4.4884357 = idf(docFreq=1356, maxDocs=44421)
                0.0091324495 = queryNorm
              0.35065904 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4884357 = idf(docFreq=1356, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.027790917 = weight(abstract_txt:automatic in 138) [ClassicSimilarity], result of:
            0.027790917 = score(doc=138,freq=1.0), product of:
              0.06846525 = queryWeight, product of:
                1.4429132 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0091324495 = queryNorm
              0.40591276 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.024180798 = weight(abstract_txt:context in 138) [ClassicSimilarity], result of:
            0.024180798 = score(doc=138,freq=1.0), product of:
              0.071429744 = queryWeight, product of:
                1.8050544 = boost
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0091324495 = queryNorm
              0.33852562 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.045661137 = weight(abstract_txt:words in 138) [ClassicSimilarity], result of:
            0.045661137 = score(doc=138,freq=1.0), product of:
              0.1091264 = queryWeight, product of:
                2.2310827 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0091324495 = queryNorm
              0.4184243 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.4657074 = weight(abstract_txt:abbreviations in 138) [ClassicSimilarity], result of:
            0.4657074 = score(doc=138,freq=1.0), product of:
              0.68071663 = queryWeight, product of:
                8.511818 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0091324495 = queryNorm
              0.6841428 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
        0.24 = coord(6/25)
    
  4. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: Initialism disambiguation : man versus machine (2013) 0.13
    0.13137755 = sum of:
      0.13137755 = product of:
        0.6568877 = sum of:
          0.05229392 = weight(abstract_txt:acronyms in 2094) [ClassicSimilarity], result of:
            0.05229392 = score(doc=2094,freq=1.0), product of:
              0.09610937 = queryWeight, product of:
                1.2088516 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0091324495 = queryNorm
              0.54410845 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=2094)
          0.014433006 = weight(abstract_txt:method in 2094) [ClassicSimilarity], result of:
            0.014433006 = score(doc=2094,freq=1.0), product of:
              0.05133097 = queryWeight, product of:
                1.2493808 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0091324495 = queryNorm
              0.2811754 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=2094)
          0.019344639 = weight(abstract_txt:context in 2094) [ClassicSimilarity], result of:
            0.019344639 = score(doc=2094,freq=1.0), product of:
              0.071429744 = queryWeight, product of:
                1.8050544 = boost
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0091324495 = queryNorm
              0.2708205 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0625 = fieldNorm(doc=2094)
          0.0439283 = weight(abstract_txt:vector in 2094) [ClassicSimilarity], result of:
            0.0439283 = score(doc=2094,freq=1.0), product of:
              0.10780474 = queryWeight, product of:
                1.8106064 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0091324495 = queryNorm
              0.40748024 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=2094)
          0.52688783 = weight(abstract_txt:abbreviations in 2094) [ClassicSimilarity], result of:
            0.52688783 = score(doc=2094,freq=2.0), product of:
              0.68071663 = queryWeight, product of:
                8.511818 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0091324495 = queryNorm
              0.7740193 = fieldWeight in 2094, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0625 = fieldNorm(doc=2094)
        0.2 = coord(5/25)
    
  5. Franceschini, F.; Maisano, D.; Mastrogiacomo, L.: A novel approach for estimating the omitted-citation rate of bibliometric databases with an application to the field of bibliometrics (2013) 0.13
    0.13137755 = sum of:
      0.13137755 = product of:
        0.6568877 = sum of:
          0.05229392 = weight(abstract_txt:acronyms in 2097) [ClassicSimilarity], result of:
            0.05229392 = score(doc=2097,freq=1.0), product of:
              0.09610937 = queryWeight, product of:
                1.2088516 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0091324495 = queryNorm
              0.54410845 = fieldWeight in 2097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=2097)
          0.014433006 = weight(abstract_txt:method in 2097) [ClassicSimilarity], result of:
            0.014433006 = score(doc=2097,freq=1.0), product of:
              0.05133097 = queryWeight, product of:
                1.2493808 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0091324495 = queryNorm
              0.2811754 = fieldWeight in 2097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=2097)
          0.019344639 = weight(abstract_txt:context in 2097) [ClassicSimilarity], result of:
            0.019344639 = score(doc=2097,freq=1.0), product of:
              0.071429744 = queryWeight, product of:
                1.8050544 = boost
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0091324495 = queryNorm
              0.2708205 = fieldWeight in 2097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.333128 = idf(docFreq=1584, maxDocs=44421)
                0.0625 = fieldNorm(doc=2097)
          0.0439283 = weight(abstract_txt:vector in 2097) [ClassicSimilarity], result of:
            0.0439283 = score(doc=2097,freq=1.0), product of:
              0.10780474 = queryWeight, product of:
                1.8106064 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0091324495 = queryNorm
              0.40748024 = fieldWeight in 2097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=2097)
          0.52688783 = weight(abstract_txt:abbreviations in 2097) [ClassicSimilarity], result of:
            0.52688783 = score(doc=2097,freq=2.0), product of:
              0.68071663 = queryWeight, product of:
                8.511818 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0091324495 = queryNorm
              0.7740193 = fieldWeight in 2097, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0625 = fieldNorm(doc=2097)
        0.2 = coord(5/25)
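The score trees in the list above are Lucene ClassicSimilarity explain outputs. The following sketch recomputes the first two of them, assuming the pre-7.0 ClassicSimilarity formulas, which the numbers above match: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), per-term score = queryWeight × fieldWeight, and a final multiplication by coord (matched terms / query terms). The boosts, queryNorm, and fieldNorm values are copied from the explain output rather than derived, and the helper names are hypothetical.

```python
import math

# Assumption: pre-7.0 Lucene ClassicSimilarity formulas (consistent with the explain output).


def classic_idf(doc_freq, max_docs):
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm               # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm     # tf * idf * fieldNorm, tf = sqrt(freq)
    return query_weight * field_weight


def doc_score(terms, total_query_terms, max_docs, query_norm, field_norm):
    """Sum the matching term scores, then apply coord(matched / total)."""
    partial = sum(term_score(f, df, max_docs, b, query_norm, field_norm)
                  for f, df, b in terms)
    return partial * len(terms) / total_query_terms


MAX_DOCS = 44421
QUERY_NORM = 0.0091324495

# Item 1 (doc 977): (freq, docFreq, boost) per matching term, copied from the explain tree.
doc_977 = [
    (1, 1342, 1.2493808),  # method
    (1, 1584, 1.8050544),  # context
    (1, 3806, 1.9200417),  # using
    (1, 6, 5.418506),      # abbreviation
    (2, 18, 8.511818),     # abbreviations
]
# Item 2 (doc 166).
doc_166 = [
    (1, 3806, 1.9200417),  # using
    (1, 569, 2.2310827),   # words
    (1, 6, 5.418506),      # abbreviation
    (2, 18, 8.511818),     # abbreviations
]

score_977 = doc_score(doc_977, 25, MAX_DOCS, QUERY_NORM, 0.078125)
score_166 = doc_score(doc_166, 25, MAX_DOCS, QUERY_NORM, 0.09375)
print(score_977, score_166)
```

The printed values should agree with the explain totals (0.21702571 and 0.20900497) up to the float32 rounding that Lucene applies internally. The final step also shows how coord works: doc 977 matches 5 of the 25 query terms and is multiplied by 0.2, while doc 166 matches 4 and is multiplied by 0.16.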