Document (#23264)

Author
Young, C.W.
Eastman, C.M.
Oakman, R.L.
Title
¬An analysis of ill-formed input in natural language queries to document retrieval systems
Source
Information processing and management. 27(1991) no.6, pp.615-622
Year
1991
Abstract
Natural language document retrieval queries from the Thomas Cooper Library, South Carolina Univ. were analysed in order to investigate the frequency of various types of ill-formed input, such as spelling errors, cooccurrence violations, conjunctions, ellipsis, and missing or incorrect punctuation. Users were requested to write out their requests for information in complete sentences on the form normally used by the library. The primary reason for analysing ill-formed inputs was to determine whether there is a significant need to study ill-formed inputs in detail. Results indicated that most of the queries were sentence fragments and that many of them contained some type of ill-formed input. Conjunctions caused the most problems. The next most serious problem was caused by punctuation errors. Spelling errors occurred in a small number of queries. The remaining types of ill-formed input considered, ellipsis and cooccurrence violations, were not found in the queries.
Theme
User studies
Language retrieval

Similar documents (author)

  1. Eastman, C.M.: Overlaps in postings to thesaurus terms : a preliminary study (1988) 2.27
    2.2665296 = sum of:
      2.2665296 = product of:
        4.533059 = sum of:
          4.533059 = weight(author_txt:eastman in 3555) [ClassicSimilarity], result of:
            4.533059 = score(doc=3555,freq=1.0), product of:
              0.77996564 = queryWeight, product of:
                1.1163803 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07513242 = queryNorm
              5.81187 = fieldWeight in 3555, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.625 = fieldNorm(doc=3555)
        0.5 = coord(1/2)
    
  2. Eastman, C.M.: 30,000 hits may be better than 300 : precision anomalies in Internet searches (2002) 2.27
    2.2665296 = sum of:
      2.2665296 = product of:
        4.533059 = sum of:
          4.533059 = weight(author_txt:eastman in 5231) [ClassicSimilarity], result of:
            4.533059 = score(doc=5231,freq=1.0), product of:
              0.77996564 = queryWeight, product of:
                1.1163803 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07513242 = queryNorm
              5.81187 = fieldWeight in 5231, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.625 = fieldNorm(doc=5231)
        0.5 = coord(1/2)
    
  3. Chang, Y.F.; Eastman, C.M.: ¬An information retrieval system for reusable software (1993) 1.81
    1.8132236 = sum of:
      1.8132236 = product of:
        3.6264472 = sum of:
          3.6264472 = weight(author_txt:eastman in 6348) [ClassicSimilarity], result of:
            3.6264472 = score(doc=6348,freq=1.0), product of:
              0.77996564 = queryWeight, product of:
                1.1163803 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07513242 = queryNorm
              4.649496 = fieldWeight in 6348, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.5 = fieldNorm(doc=6348)
        0.5 = coord(1/2)
    
  4. Eastman, C.M.; Carter, R.M.: Anthropological perspectives on classification schemes (1994) 1.81
    1.8132236 = sum of:
      1.8132236 = product of:
        3.6264472 = sum of:
          3.6264472 = weight(author_txt:eastman in 8888) [ClassicSimilarity], result of:
            3.6264472 = score(doc=8888,freq=1.0), product of:
              0.77996564 = queryWeight, product of:
                1.1163803 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07513242 = queryNorm
              4.649496 = fieldWeight in 8888, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.5 = fieldNorm(doc=8888)
        0.5 = coord(1/2)
    
  5. Rose, J.R.; Eastman, C.M.: Hierarchical classification as an aid to browsing (1994) 1.81
    1.8132236 = sum of:
      1.8132236 = product of:
        3.6264472 = sum of:
          3.6264472 = weight(author_txt:eastman in 8894) [ClassicSimilarity], result of:
            3.6264472 = score(doc=8894,freq=1.0), product of:
              0.77996564 = queryWeight, product of:
                1.1163803 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07513242 = queryNorm
              4.649496 = fieldWeight in 8894, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.5 = fieldNorm(doc=8894)
        0.5 = coord(1/2)
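
The author score trees above can be checked by hand. The following is a minimal sketch in Python, assuming the listing follows the ClassicSimilarity layout named in the output: a term score is queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and the result is scaled by coord. The helper name term_score is hypothetical and not part of any library; the numeric inputs are copied from entries 1 and 3.

```python
import math

def term_score(boost, idf, query_norm, freq, field_norm):
    """Score of one term as laid out in the explain tree (hypothetical helper)."""
    # "queryWeight, product of": boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # "fieldWeight, product of": tf * idf * fieldNorm, with tf = sqrt(freq)
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the listing for author_txt:eastman
BOOST, IDF, QUERY_NORM = 1.1163803, 9.298992, 0.07513242
COORD = 1 / 2  # coord(1/2): one of two query clauses matched

# Entry 1 has fieldNorm 0.625, entry 3 has fieldNorm 0.5
score_entry_1 = term_score(BOOST, IDF, QUERY_NORM, 1.0, 0.625) * COORD
score_entry_3 = term_score(BOOST, IDF, QUERY_NORM, 1.0, 0.5) * COORD

print(round(score_entry_1, 4))
print(round(score_entry_3, 4))
```

The only input that differs between the five entries is fieldNorm (0.625 versus 0.5), which accounts for the two score levels shown, 2.27 and 1.81.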
    

Similar documents (content)

  1. Lee, Y.-H.; Evens, M.W.: Natural language interface for an expert system (1998) 0.19
    0.19251916 = sum of:
      0.19251916 = product of:
        0.6875684 = sum of:
          0.08610539 = weight(abstract_txt:fragments in 5108) [ClassicSimilarity], result of:
            0.08610539 = score(doc=5108,freq=1.0), product of:
              0.113203 = queryWeight, product of:
                1.0954496 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.0127369175 = queryNorm
              0.7606282 = fieldWeight in 5108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.09375 = fieldNorm(doc=5108)
          0.03335425 = weight(abstract_txt:language in 5108) [ClassicSimilarity], result of:
            0.03335425 = score(doc=5108,freq=2.0), product of:
              0.060155004 = queryWeight, product of:
                1.129313 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0127369175 = queryNorm
              0.55447173 = fieldWeight in 5108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.09375 = fieldNorm(doc=5108)
          0.028770123 = weight(abstract_txt:types in 5108) [ClassicSimilarity], result of:
            0.028770123 = score(doc=5108,freq=1.0), product of:
              0.0686766 = queryWeight, product of:
                1.2066542 = boost
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.0127369175 = queryNorm
              0.41892177 = fieldWeight in 5108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.09375 = fieldNorm(doc=5108)
          0.05138282 = weight(abstract_txt:most in 5108) [ClassicSimilarity], result of:
            0.05138282 = score(doc=5108,freq=3.0), product of:
              0.08023853 = queryWeight, product of:
                1.5974069 = boost
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.0127369175 = queryNorm
              0.6403759 = fieldWeight in 5108, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.09375 = fieldNorm(doc=5108)
          0.2044604 = weight(abstract_txt:spelling in 5108) [ClassicSimilarity], result of:
            0.2044604 = score(doc=5108,freq=2.0), product of:
              0.2014855 = queryWeight, product of:
                2.066809 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.0127369175 = queryNorm
              1.0147648 = fieldWeight in 5108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.09375 = fieldNorm(doc=5108)
          0.13587864 = weight(abstract_txt:errors in 5108) [ClassicSimilarity], result of:
            0.13587864 = score(doc=5108,freq=1.0), product of:
              0.2212987 = queryWeight, product of:
                2.652855 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0127369175 = queryNorm
              0.61400557 = fieldWeight in 5108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.09375 = fieldNorm(doc=5108)
          0.1476168 = weight(abstract_txt:input in 5108) [ClassicSimilarity], result of:
            0.1476168 = score(doc=5108,freq=1.0), product of:
              0.25740373 = queryWeight, product of:
                3.3037019 = boost
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.0127369175 = queryNorm
              0.5734835 = fieldWeight in 5108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.09375 = fieldNorm(doc=5108)
        0.28 = coord(7/25)
    
  2. Drabenstott, K.M.; Weller, M.S.: Handling spelling errors in online catalog searches (1996) 0.12
    0.1213096 = sum of:
      0.1213096 = product of:
        0.606548 = sum of:
          0.019777257 = weight(abstract_txt:most in 5973) [ClassicSimilarity], result of:
            0.019777257 = score(doc=5973,freq=1.0), product of:
              0.08023853 = queryWeight, product of:
                1.5974069 = boost
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.0127369175 = queryNorm
              0.24648081 = fieldWeight in 5973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.0625 = fieldNorm(doc=5973)
          0.030056013 = weight(abstract_txt:were in 5973) [ClassicSimilarity], result of:
            0.030056013 = score(doc=5973,freq=2.0), product of:
              0.0926537 = queryWeight, product of:
                1.9820966 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0127369175 = queryNorm
              0.32439086 = fieldWeight in 5973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=5973)
          0.2155202 = weight(abstract_txt:spelling in 5973) [ClassicSimilarity], result of:
            0.2155202 = score(doc=5973,freq=5.0), product of:
              0.2014855 = queryWeight, product of:
                2.066809 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.0127369175 = queryNorm
              1.0696561 = fieldWeight in 5973, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.0625 = fieldNorm(doc=5973)
          0.1811715 = weight(abstract_txt:errors in 5973) [ClassicSimilarity], result of:
            0.1811715 = score(doc=5973,freq=4.0), product of:
              0.2212987 = queryWeight, product of:
                2.652855 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0127369175 = queryNorm
              0.8186741 = fieldWeight in 5973, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=5973)
          0.16002305 = weight(abstract_txt:queries in 5973) [ClassicSimilarity], result of:
            0.16002305 = score(doc=5973,freq=5.0), product of:
              0.22422646 = queryWeight, product of:
                3.4474015 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0127369175 = queryNorm
              0.7136671 = fieldWeight in 5973, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=5973)
        0.2 = coord(5/25)
    
  3. Soricut, R.; Marcu, D.: Abstractive headline generation using WIDL-expressions (2007) 0.10
    0.10490558 = sum of:
      0.10490558 = product of:
        0.5245279 = sum of:
          0.034042038 = weight(abstract_txt:language in 943) [ClassicSimilarity], result of:
            0.034042038 = score(doc=943,freq=3.0), product of:
              0.060155004 = queryWeight, product of:
                1.129313 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0127369175 = queryNorm
              0.56590533 = fieldWeight in 943, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.04250787 = weight(abstract_txt:document in 943) [ClassicSimilarity], result of:
            0.04250787 = score(doc=943,freq=4.0), product of:
              0.06337647 = queryWeight, product of:
                1.1591575 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0127369175 = queryNorm
              0.67072004 = fieldWeight in 943, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.02472157 = weight(abstract_txt:most in 943) [ClassicSimilarity], result of:
            0.02472157 = score(doc=943,freq=1.0), product of:
              0.08023853 = queryWeight, product of:
                1.5974069 = boost
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.0127369175 = queryNorm
              0.308101 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.17396805 = weight(abstract_txt:input in 943) [ClassicSimilarity], result of:
            0.17396805 = score(doc=943,freq=2.0), product of:
              0.25740373 = queryWeight, product of:
                3.3037019 = boost
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.0127369175 = queryNorm
              0.67585677 = fieldWeight in 943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
          0.24928837 = weight(abstract_txt:formed in 943) [ClassicSimilarity], result of:
            0.24928837 = score(doc=943,freq=1.0), product of:
              0.47185683 = queryWeight, product of:
                5.4782796 = boost
                6.7624135 = idf(docFreq=138, maxDocs=44218)
                0.0127369175 = queryNorm
              0.5283136 = fieldWeight in 943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7624135 = idf(docFreq=138, maxDocs=44218)
                0.078125 = fieldNorm(doc=943)
        0.2 = coord(5/25)
    
  4. Spink, A.; Wolfram, D.; Jansen, B.J.; Saracevic, T.: Searching the Web : the public and their queries (2001) 0.09
    0.09086858 = sum of:
      0.09086858 = product of:
        0.37861907 = sum of:
          0.011792508 = weight(abstract_txt:language in 6980) [ClassicSimilarity], result of:
            0.011792508 = score(doc=6980,freq=1.0), product of:
              0.060155004 = queryWeight, product of:
                1.129313 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0127369175 = queryNorm
              0.19603536 = fieldWeight in 6980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.046875 = fieldNorm(doc=6980)
          0.020976948 = weight(abstract_txt:most in 6980) [ClassicSimilarity], result of:
            0.020976948 = score(doc=6980,freq=2.0), product of:
              0.08023853 = queryWeight, product of:
                1.5974069 = boost
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.0127369175 = queryNorm
              0.26143235 = fieldWeight in 6980, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.046875 = fieldNorm(doc=6980)
          0.02760821 = weight(abstract_txt:were in 6980) [ClassicSimilarity], result of:
            0.02760821 = score(doc=6980,freq=3.0), product of:
              0.0926537 = queryWeight, product of:
                1.9820966 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0127369175 = queryNorm
              0.29797202 = fieldWeight in 6980, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.046875 = fieldNorm(doc=6980)
          0.07228767 = weight(abstract_txt:spelling in 6980) [ClassicSimilarity], result of:
            0.07228767 = score(doc=6980,freq=1.0), product of:
              0.2014855 = queryWeight, product of:
                2.066809 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.0127369175 = queryNorm
              0.35877356 = fieldWeight in 6980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.046875 = fieldNorm(doc=6980)
          0.06793932 = weight(abstract_txt:errors in 6980) [ClassicSimilarity], result of:
            0.06793932 = score(doc=6980,freq=1.0), product of:
              0.2212987 = queryWeight, product of:
                2.652855 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0127369175 = queryNorm
              0.30700278 = fieldWeight in 6980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.046875 = fieldNorm(doc=6980)
          0.17801441 = weight(abstract_txt:queries in 6980) [ClassicSimilarity], result of:
            0.17801441 = score(doc=6980,freq=11.0), product of:
              0.22422646 = queryWeight, product of:
                3.4474015 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0127369175 = queryNorm
              0.79390454 = fieldWeight in 6980, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.046875 = fieldNorm(doc=6980)
        0.24 = coord(6/25)
    
  5. Dewdney, P.; Michell, G.: Oranges and peaches : understanding communication accidents in the reference interview (1996) 0.09
    0.086857215 = sum of:
      0.086857215 = product of:
        0.5428576 = sum of:
          0.104485616 = weight(abstract_txt:caused in 6423) [ClassicSimilarity], result of:
            0.104485616 = score(doc=6423,freq=1.0), product of:
              0.1622627 = queryWeight, product of:
                1.8547602 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0127369175 = queryNorm
              0.64392877 = fieldWeight in 6423, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.09375 = fieldNorm(doc=6423)
          0.031879216 = weight(abstract_txt:were in 6423) [ClassicSimilarity], result of:
            0.031879216 = score(doc=6423,freq=1.0), product of:
              0.0926537 = queryWeight, product of:
                1.9820966 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0127369175 = queryNorm
              0.34406847 = fieldWeight in 6423, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.09375 = fieldNorm(doc=6423)
          0.10734672 = weight(abstract_txt:queries in 6423) [ClassicSimilarity], result of:
            0.10734672 = score(doc=6423,freq=1.0), product of:
              0.22422646 = queryWeight, product of:
                3.4474015 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0127369175 = queryNorm
              0.47874242 = fieldWeight in 6423, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.09375 = fieldNorm(doc=6423)
          0.29914603 = weight(abstract_txt:formed in 6423) [ClassicSimilarity], result of:
            0.29914603 = score(doc=6423,freq=1.0), product of:
              0.47185683 = queryWeight, product of:
                5.4782796 = boost
                6.7624135 = idf(docFreq=138, maxDocs=44218)
                0.0127369175 = queryNorm
              0.6339763 = fieldWeight in 6423, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7624135 = idf(docFreq=138, maxDocs=44218)
                0.09375 = fieldNorm(doc=6423)
        0.16 = coord(4/25)
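
For the content matches, several term scores are summed and the sum is multiplied by coord(matched/25). The sketch below, again in Python, is one reading of the listing under stated assumptions, not the system's actual code. It recomputes idf and tf instead of copying them. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is inferred from the printed values (it reproduces 9.298992 for docFreq=10 and 8.113368 for docFreq=35), and tf = sqrt(freq) matches lines such as "1.4142135 = tf(freq=2.0)". All helper names are hypothetical.

```python
import math

MAX_DOCS = 44218            # maxDocs as printed in the listing
QUERY_NORM = 0.0127369175   # queryNorm for the content query

def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, formula inferred from the printed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def term_score(boost, doc_freq, freq, field_norm):
    """queryWeight * fieldWeight for a single term (hypothetical helper)."""
    w = idf(doc_freq)
    return (boost * w * QUERY_NORM) * (tf(freq) * w * field_norm)

# Entry 5 (doc 6423): term -> (boost, docFreq, freq, fieldNorm), copied from the listing
terms = {
    "caused":  (1.8547602, 124,  1.0, 0.09375),
    "were":    (1.9820966, 3061, 1.0, 0.09375),
    "queries": (3.4474015, 727,  1.0, 0.09375),
    "formed":  (5.4782796, 138,  1.0, 0.09375),
}

total = sum(term_score(*args) for args in terms.values())
score = total * (4 / 25)  # coord(4/25): 4 of 25 query terms matched

print(round(score, 4))
```

Recomputing idf from docFreq rather than copying it shows that boost, docFreq, freq and fieldNorm are the only per-term inputs; every other number in the tree is derived from them.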