Document (#27542)

Author
Doszkocs, T.E.
Zamora, A.
Title
Dictionary services and spelling aids for Web searching
Source
Online. 28(2004) no.3, pp.22-29
Year
2004
Abstract
The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aids, natural language processing, and information storage and retrieval. The design of the technology allows complex applications to be built in a modular fashion from the components developed in the earlier phases of the work, without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have on-line or Web-based interfaces, the dictionaries and other computer components must respond quickly and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
Theme
Computational linguistics
Field
Chemistry
Object
TOXNET
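
The abstract names three constituents: a vocabulary list, word attributes, and programs that retrieve words and measure similarity between them. The following is only a minimal sketch of how such pieces might fit together. It is not the AZdict or ChemSpell implementation, which the record does not describe. The `Entry` class, the sample vocabulary, the `suggest` function, and the choice of Levenshtein edit distance as the similarity measure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical dictionary entry: a word plus its attributes."""
    word: str
    part_of_speech: str
    related: list = field(default_factory=list)  # morphologically related words


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed similarity measure, not ChemSpell's)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(query: str, dictionary: dict, max_distance: int = 2) -> list:
    """Return the query itself if known, else nearby words, closest first."""
    q = query.lower()
    if q in dictionary:
        return [q]
    scored = [(edit_distance(q, w), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]


# Tiny illustrative vocabulary; a real list would be far larger.
DICTIONARY = {
    "benzene": Entry("benzene", "noun"),
    "toluene": Entry("toluene", "noun"),
    "toxic": Entry("toxic", "adjective", ["toxicity"]),
    "toxicity": Entry("toxicity", "noun", ["toxic"]),
}

print(suggest("benzine", DICTIONARY))  # ['benzene']
```

A linear scan over the vocabulary is fine for a toy list, but the fast response the abstract calls for would need an index (for example by length or character n-grams) in front of the distance computation.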

Similar documents (author)

  1. Doszkocs, T.E.: CITE NLM: Natural language searching in an online catalog (1983) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:doszkocs in 783) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 783, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=783)
    
  2. Doszkocs, T.E.: Natural language processing in information retrieval (1986) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:doszkocs in 2695) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 2695, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=2695)
    
  3. Doszkocs, T.E.: Simultaneous searching of distributed information and subject repositories on the World Wide Web (1998) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:doszkocs in 3334) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 3334, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=3334)
    
  4. Doszkocs, T.E.: Virtual hypertext searching of online databases via the World Wide Web (1996) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:doszkocs in 3416) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 3416, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=3416)
    
  5. Doszkocs, T.E.; Weinberg, B.H.: Natural language interfaces for information retrieval (1988) 4.81
    4.811013 = sum of:
      4.811013 = weight(author_txt:doszkocs in 2696) [ClassicSimilarity], result of:
        4.811013 = fieldWeight in 2696, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.5 = fieldNorm(doc=2696)
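
The author scores above can be reproduced from the printed factors. The sketch below assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values shown here; other Lucene versions compute idf slightly differently. The function names are illustrative, not part of any library.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, in the form that matches the printed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation trees."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Entries 1 to 4: fieldNorm 0.625
print(round(field_weight(1.0, 7, 44421, 0.625), 4))  # 6.0138
# Entry 5: fieldNorm 0.5
print(round(field_weight(1.0, 7, 44421, 0.5), 4))    # 4.811
```

The only factor that differs between the first four entries and the fifth is fieldNorm (0.625 versus 0.5), presumably because the fifth record has two authors and therefore a longer author field.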
    

Similar documents (content)

  1. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.23
    0.23051007 = sum of:
      0.23051007 = product of:
        0.82325023 = sum of:
          0.024438871 = weight(abstract_txt:other in 3372) [ClassicSimilarity], result of:
            0.024438871 = score(doc=3372,freq=2.0), product of:
              0.07858012 = queryWeight, product of:
                1.1346056 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.019683136 = queryNorm
              0.31100577 = fieldWeight in 3372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.0625 = fieldNorm(doc=3372)
          0.04074913 = weight(abstract_txt:vocabulary in 3372) [ClassicSimilarity], result of:
            0.04074913 = score(doc=3372,freq=1.0), product of:
              0.12161404 = queryWeight, product of:
                1.1524838 = boost
                5.3611083 = idf(docFreq=566, maxDocs=44421)
                0.019683136 = queryNorm
              0.33506927 = fieldWeight in 3372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3611083 = idf(docFreq=566, maxDocs=44421)
                0.0625 = fieldNorm(doc=3372)
          0.040707786 = weight(abstract_txt:language in 3372) [ClassicSimilarity], result of:
            0.040707786 = score(doc=3372,freq=2.0), product of:
              0.11041894 = queryWeight, product of:
                1.3449632 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.019683136 = queryNorm
              0.3686667 = fieldWeight in 3372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=3372)
          0.085866295 = weight(abstract_txt:speech in 3372) [ClassicSimilarity], result of:
            0.085866295 = score(doc=3372,freq=1.0), product of:
              0.19988796 = queryWeight, product of:
                1.47753 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.019683136 = queryNorm
              0.4295721 = fieldWeight in 3372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.0625 = fieldNorm(doc=3372)
          0.14074266 = weight(abstract_txt:words in 3372) [ClassicSimilarity], result of:
            0.14074266 = score(doc=3372,freq=3.0), product of:
              0.24274945 = queryWeight, product of:
                2.3026986 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.019683136 = queryNorm
              0.5797857 = fieldWeight in 3372, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0625 = fieldNorm(doc=3372)
          0.15476315 = weight(abstract_txt:dictionary in 3372) [ClassicSimilarity], result of:
            0.15476315 = score(doc=3372,freq=1.0), product of:
              0.37298658 = queryWeight, product of:
                2.8543327 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.019683136 = queryNorm
              0.41492954 = fieldWeight in 3372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.0625 = fieldNorm(doc=3372)
          0.33598232 = weight(abstract_txt:spelling in 3372) [ClassicSimilarity], result of:
            0.33598232 = score(doc=3372,freq=2.0), product of:
              0.49634364 = queryWeight, product of:
                3.2926776 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.019683136 = queryNorm
              0.67691475 = fieldWeight in 3372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.0625 = fieldNorm(doc=3372)
        0.28 = coord(7/25)
    
  2. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.20
    0.19882075 = sum of:
      0.19882075 = product of:
        0.8284198 = sum of:
          0.019852666 = weight(abstract_txt:such in 2052) [ClassicSimilarity], result of:
            0.019852666 = score(doc=2052,freq=1.0), product of:
              0.07428044 = queryWeight, product of:
                1.1031278 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.019683136 = queryNorm
              0.2672664 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.078125 = fieldNorm(doc=2052)
          0.021601114 = weight(abstract_txt:other in 2052) [ClassicSimilarity], result of:
            0.021601114 = score(doc=2052,freq=1.0), product of:
              0.07858012 = queryWeight, product of:
                1.1346056 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.019683136 = queryNorm
              0.27489287 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.078125 = fieldNorm(doc=2052)
          0.07196188 = weight(abstract_txt:language in 2052) [ClassicSimilarity], result of:
            0.07196188 = score(doc=2052,freq=4.0), product of:
              0.11041894 = queryWeight, product of:
                1.3449632 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.019683136 = queryNorm
              0.6517168 = fieldWeight in 2052, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=2052)
          0.101572275 = weight(abstract_txt:words in 2052) [ClassicSimilarity], result of:
            0.101572275 = score(doc=2052,freq=1.0), product of:
              0.24274945 = queryWeight, product of:
                2.3026986 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.019683136 = queryNorm
              0.4184243 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=2052)
          0.19345394 = weight(abstract_txt:dictionary in 2052) [ClassicSimilarity], result of:
            0.19345394 = score(doc=2052,freq=1.0), product of:
              0.37298658 = queryWeight, product of:
                2.8543327 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.019683136 = queryNorm
              0.5186619 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.078125 = fieldNorm(doc=2052)
          0.4199779 = weight(abstract_txt:spelling in 2052) [ClassicSimilarity], result of:
            0.4199779 = score(doc=2052,freq=2.0), product of:
              0.49634364 = queryWeight, product of:
                3.2926776 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.019683136 = queryNorm
              0.8461434 = fieldWeight in 2052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.078125 = fieldNorm(doc=2052)
        0.24 = coord(6/25)
    
  3. Zimmermann, H.H.: Language and language technology (1991) 0.19
    0.18585491 = sum of:
      0.18585491 = product of:
        1.1615932 = sum of:
          0.078977406 = weight(abstract_txt:processing in 3568) [ClassicSimilarity], result of:
            0.078977406 = score(doc=3568,freq=1.0), product of:
              0.10263135 = queryWeight, product of:
                1.0587246 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.019683136 = queryNorm
              0.7695251 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.15625 = fieldNorm(doc=3568)
          0.10176946 = weight(abstract_txt:language in 3568) [ClassicSimilarity], result of:
            0.10176946 = score(doc=3568,freq=2.0), product of:
              0.11041894 = queryWeight, product of:
                1.3449632 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.019683136 = queryNorm
              0.92166674 = fieldWeight in 3568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.15625 = fieldNorm(doc=3568)
          0.38690788 = weight(abstract_txt:dictionary in 3568) [ClassicSimilarity], result of:
            0.38690788 = score(doc=3568,freq=1.0), product of:
              0.37298658 = queryWeight, product of:
                2.8543327 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.019683136 = queryNorm
              1.0373238 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.15625 = fieldNorm(doc=3568)
          0.59393847 = weight(abstract_txt:spelling in 3568) [ClassicSimilarity], result of:
            0.59393847 = score(doc=3568,freq=1.0), product of:
              0.49634364 = queryWeight, product of:
                3.2926776 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.019683136 = queryNorm
              1.1966275 = fieldWeight in 3568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.15625 = fieldNorm(doc=3568)
        0.16 = coord(4/25)
    
  4. Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.18
    0.17939702 = sum of:
      0.17939702 = product of:
        0.6407036 = sum of:
          0.055845454 = weight(abstract_txt:processing in 138) [ClassicSimilarity], result of:
            0.055845454 = score(doc=138,freq=2.0), product of:
              0.10263135 = queryWeight, product of:
                1.0587246 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.019683136 = queryNorm
              0.5441364 = fieldWeight in 138, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.060906846 = weight(abstract_txt:natural in 138) [ClassicSimilarity], result of:
            0.060906846 = score(doc=138,freq=2.0), product of:
              0.1087424 = queryWeight, product of:
                1.089789 = boost
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.019683136 = queryNorm
              0.5601021 = fieldWeight in 138, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.043236315 = weight(abstract_txt:must in 138) [ClassicSimilarity], result of:
            0.043236315 = score(doc=138,freq=1.0), product of:
              0.109026134 = queryWeight, product of:
                1.0912099 = boost
                5.076075 = idf(docFreq=753, maxDocs=44421)
                0.019683136 = queryNorm
              0.39656836 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.076075 = idf(docFreq=753, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.019852666 = weight(abstract_txt:such in 138) [ClassicSimilarity], result of:
            0.019852666 = score(doc=138,freq=1.0), product of:
              0.07428044 = queryWeight, product of:
                1.1031278 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.019683136 = queryNorm
              0.2672664 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.062320814 = weight(abstract_txt:language in 138) [ClassicSimilarity], result of:
            0.062320814 = score(doc=138,freq=3.0), product of:
              0.11041894 = queryWeight, product of:
                1.3449632 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.019683136 = queryNorm
              0.5644033 = fieldWeight in 138, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.101572275 = weight(abstract_txt:words in 138) [ClassicSimilarity], result of:
            0.101572275 = score(doc=138,freq=1.0), product of:
              0.24274945 = queryWeight, product of:
                2.3026986 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.019683136 = queryNorm
              0.4184243 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
          0.29696923 = weight(abstract_txt:spelling in 138) [ClassicSimilarity], result of:
            0.29696923 = score(doc=138,freq=1.0), product of:
              0.49634364 = queryWeight, product of:
                3.2926776 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.019683136 = queryNorm
              0.59831375 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.078125 = fieldNorm(doc=138)
        0.28 = coord(7/25)
    
  5. Ballard, T.; Lifshin, A.: Prediction of OPAC spelling errors through a keyword inventory (1992) 0.18
    0.17679447 = sum of:
      0.17679447 = product of:
        0.7366436 = sum of:
          0.14724284 = weight(abstract_txt:misspelled in 1498) [ClassicSimilarity], result of:
            0.14724284 = score(doc=1498,freq=1.0), product of:
              0.19587438 = queryWeight, product of:
                1.0342292 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.019683136 = queryNorm
              0.7517208 = fieldWeight in 1498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.078125 = fieldNorm(doc=1498)
          0.019852666 = weight(abstract_txt:such in 1498) [ClassicSimilarity], result of:
            0.019852666 = score(doc=1498,freq=1.0), product of:
              0.07428044 = queryWeight, product of:
                1.1031278 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.019683136 = queryNorm
              0.2672664 = fieldWeight in 1498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.078125 = fieldNorm(doc=1498)
          0.021601114 = weight(abstract_txt:other in 1498) [ClassicSimilarity], result of:
            0.021601114 = score(doc=1498,freq=1.0), product of:
              0.07858012 = queryWeight, product of:
                1.1346056 = boost
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.019683136 = queryNorm
              0.27489287 = fieldWeight in 1498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5186288 = idf(docFreq=3578, maxDocs=44421)
                0.078125 = fieldNorm(doc=1498)
          0.10733286 = weight(abstract_txt:speech in 1498) [ClassicSimilarity], result of:
            0.10733286 = score(doc=1498,freq=1.0), product of:
              0.19988796 = queryWeight, product of:
                1.47753 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.019683136 = queryNorm
              0.53696513 = fieldWeight in 1498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.078125 = fieldNorm(doc=1498)
          0.14364488 = weight(abstract_txt:words in 1498) [ClassicSimilarity], result of:
            0.14364488 = score(doc=1498,freq=2.0), product of:
              0.24274945 = queryWeight, product of:
                2.3026986 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.019683136 = queryNorm
              0.5917413 = fieldWeight in 1498, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=1498)
          0.29696923 = weight(abstract_txt:spelling in 1498) [ClassicSimilarity], result of:
            0.29696923 = score(doc=1498,freq=1.0), product of:
              0.49634364 = queryWeight, product of:
                3.2926776 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.019683136 = queryNorm
              0.59831375 = fieldWeight in 1498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.078125 = fieldNorm(doc=1498)
        0.24 = coord(6/25)
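
The content scores combine more factors than the author scores: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is multiplied by coord (matched terms divided by the 25 query terms). The sketch below recomputes entry 3 (Zimmermann) from the boosts, document frequencies, and term frequencies printed above. The constant and function names are illustrative, and the idf formula is the one that agrees with the printed values; it is an assumption about the Lucene version in use.

```python
import math

MAX_DOCS = 44421          # maxDocs as printed in the explanations
QUERY_NORM = 0.019683136  # queryNorm as printed in the explanations
QUERY_TERMS = 25          # denominator of coord(n/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm: float, query_terms: int = QUERY_TERMS) -> float:
    """Sum of term scores, scaled by coord = matched terms / query terms."""
    total = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return total * (len(terms) / query_terms)


# (boost, docFreq, termFreq) for entry 3, copied from the explanation tree
ZIMMERMANN_TERMS = [
    (1.0587246, 876, 1.0),   # processing
    (1.3449632, 1863, 2.0),  # language
    (2.8543327, 157, 1.0),   # dictionary
    (3.2926776, 56, 1.0),    # spelling
]

print(round(document_score(ZIMMERMANN_TERMS, field_norm=0.15625), 4))  # 0.1859
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "spelling" (docFreq=56) dominate the sum, while coord penalizes documents that match only a few of the query terms.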