Document (#14021)

Author
Taghva, K.
Borsack, J.
Condit, A.
Title
Effects of OCR errors on ranking and feedback using the vector space model
Source
Information processing and management. 32(1996) no.3, p.317-327
Year
1996
Abstract
Reports on the performance of the vector space model in the presence of optical character recognition (OCR) errors. Average precision and recall are not affected for full-text document rankings of the OCR and corrected collections with different weighting combinations. Cosine normalization plays a considerable role in the disparity seen between the collections. Even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.

Similar documents (content)

  1. Taghva, K.: ¬The effects of noisy data on text retrieval (1994) 0.17
    0.16535042 = sum of:
      0.16535042 = product of:
        0.8267521 = sum of:
          0.07881893 = weight(abstract_txt:recognition in 7227) [ClassicSimilarity], result of:
            0.07881893 = score(doc=7227,freq=1.0), product of:
              0.11773198 = queryWeight, product of:
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.019234303 = queryNorm
              0.66947764 = fieldWeight in 7227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.109375 = fieldNorm(doc=7227)
          0.092220604 = weight(abstract_txt:presence in 7227) [ClassicSimilarity], result of:
            0.092220604 = score(doc=7227,freq=1.0), product of:
              0.13072523 = queryWeight, product of:
                1.0537376 = boost
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.019234303 = queryNorm
              0.70545375 = fieldWeight in 7227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.109375 = fieldNorm(doc=7227)
          0.10430715 = weight(abstract_txt:improves in 7227) [ClassicSimilarity], result of:
            0.10430715 = score(doc=7227,freq=1.0), product of:
              0.14191125 = queryWeight, product of:
                1.097896 = boost
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.019234303 = queryNorm
              0.73501676 = fieldWeight in 7227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.109375 = fieldNorm(doc=7227)
          0.26173717 = weight(abstract_txt:degraded in 7227) [ClassicSimilarity], result of:
            0.26173717 = score(doc=7227,freq=1.0), product of:
              0.26205012 = queryWeight, product of:
                1.491918 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.019234303 = queryNorm
              0.9988057 = fieldWeight in 7227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.109375 = fieldNorm(doc=7227)
          0.28966826 = weight(abstract_txt:errors in 7227) [ClassicSimilarity], result of:
            0.28966826 = score(doc=7227,freq=1.0), product of:
              0.4043727 = queryWeight, product of:
                3.2099946 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.019234303 = queryNorm
              0.7163398 = fieldWeight in 7227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.109375 = fieldNorm(doc=7227)
        0.2 = coord(5/25)
    
  2. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.12
    0.1217047 = sum of:
      0.1217047 = product of:
        0.6085235 = sum of:
          0.035186157 = weight(abstract_txt:model in 3321) [ClassicSimilarity], result of:
            0.035186157 = score(doc=3321,freq=2.0), product of:
              0.09986517 = queryWeight, product of:
                1.3024912 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.019234303 = queryNorm
              0.35233662 = fieldWeight in 3321, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=3321)
          0.14956409 = weight(abstract_txt:degraded in 3321) [ClassicSimilarity], result of:
            0.14956409 = score(doc=3321,freq=1.0), product of:
              0.26205012 = queryWeight, product of:
                1.491918 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.019234303 = queryNorm
              0.5707461 = fieldWeight in 3321, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=3321)
          0.17420839 = weight(abstract_txt:space in 3321) [ClassicSimilarity], result of:
            0.17420839 = score(doc=3321,freq=8.0), product of:
              0.18275061 = queryWeight, product of:
                1.7619647 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.019234303 = queryNorm
              0.95325744 = fieldWeight in 3321, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.0625 = fieldNorm(doc=3321)
          0.1886337 = weight(abstract_txt:vector in 3321) [ClassicSimilarity], result of:
            0.1886337 = score(doc=3321,freq=3.0), product of:
              0.26722798 = queryWeight, product of:
                2.1306334 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.019234303 = queryNorm
              0.70589054 = fieldWeight in 3321, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0625 = fieldNorm(doc=3321)
          0.060931176 = weight(abstract_txt:collections in 3321) [ClassicSimilarity], result of:
            0.060931176 = score(doc=3321,freq=1.0), product of:
              0.2076983 = queryWeight, product of:
                2.3005404 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.019234303 = queryNorm
              0.29336387 = fieldWeight in 3321, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.0625 = fieldNorm(doc=3321)
        0.2 = coord(5/25)
    
  3. Taghva, K.; Borsack, J.; Condit, A.: Evaluation of model-based retrieval effectiveness with OCR text (1996) 0.12
    0.11636909 = sum of:
      0.11636909 = product of:
        0.72730684 = sum of:
          0.082108565 = weight(abstract_txt:affected in 4485) [ClassicSimilarity], result of:
            0.082108565 = score(doc=4485,freq=1.0), product of:
              0.13408008 = queryWeight, product of:
                1.0671731 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.019234303 = queryNorm
              0.6123845 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.123755656 = weight(abstract_txt:feedback in 4485) [ClassicSimilarity], result of:
            0.123755656 = score(doc=4485,freq=1.0), product of:
              0.22207108 = queryWeight, product of:
                1.9422886 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.019234303 = queryNorm
              0.55727947 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.091396764 = weight(abstract_txt:collections in 4485) [ClassicSimilarity], result of:
            0.091396764 = score(doc=4485,freq=1.0), product of:
              0.2076983 = queryWeight, product of:
                2.3005404 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.019234303 = queryNorm
              0.4400458 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.43004584 = weight(abstract_txt:errors in 4485) [ClassicSimilarity], result of:
            0.43004584 = score(doc=4485,freq=3.0), product of:
              0.4043727 = queryWeight, product of:
                3.2099946 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.019234303 = queryNorm
              1.0634888 = fieldWeight in 4485, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
        0.16 = coord(4/25)
    
  4. Alexander, M.: Retrieving digital data with fuzzy matching (1996) 0.12
    0.11528947 = sum of:
      0.11528947 = product of:
        0.72055924 = sum of:
          0.07881893 = weight(abstract_txt:recognition in 6961) [ClassicSimilarity], result of:
            0.07881893 = score(doc=6961,freq=1.0), product of:
              0.11773198 = queryWeight, product of:
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.019234303 = queryNorm
              0.66947764 = fieldWeight in 6961, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.109375 = fieldNorm(doc=6961)
          0.24428612 = weight(abstract_txt:compensate in 6961) [ClassicSimilarity], result of:
            0.24428612 = score(doc=6961,freq=1.0), product of:
              0.25026876 = queryWeight, product of:
                1.4579952 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.019234303 = queryNorm
              0.97609514 = fieldWeight in 6961, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.109375 = fieldNorm(doc=6961)
          0.10778594 = weight(abstract_txt:space in 6961) [ClassicSimilarity], result of:
            0.10778594 = score(doc=6961,freq=1.0), product of:
              0.18275061 = queryWeight, product of:
                1.7619647 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.019234303 = queryNorm
              0.589798 = fieldWeight in 6961, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.109375 = fieldNorm(doc=6961)
          0.28966826 = weight(abstract_txt:errors in 6961) [ClassicSimilarity], result of:
            0.28966826 = score(doc=6961,freq=1.0), product of:
              0.4043727 = queryWeight, product of:
                3.2099946 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.019234303 = queryNorm
              0.7163398 = fieldWeight in 6961, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.109375 = fieldNorm(doc=6961)
        0.16 = coord(4/25)
    
  5. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Genetic algorithms in relevance feedback : a second test and new contributions (2003) 0.11
    0.1149885 = sum of:
      0.1149885 = product of:
        0.57494247 = sum of:
          0.05277923 = weight(abstract_txt:model in 1076) [ClassicSimilarity], result of:
            0.05277923 = score(doc=1076,freq=2.0), product of:
              0.09986517 = queryWeight, product of:
                1.3024912 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.019234303 = queryNorm
              0.5285049 = fieldWeight in 1076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.09375 = fieldNorm(doc=1076)
          0.092387944 = weight(abstract_txt:space in 1076) [ClassicSimilarity], result of:
            0.092387944 = score(doc=1076,freq=1.0), product of:
              0.18275061 = queryWeight, product of:
                1.7619647 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.019234303 = queryNorm
              0.5055411 = fieldWeight in 1076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.09375 = fieldNorm(doc=1076)
          0.17501694 = weight(abstract_txt:feedback in 1076) [ClassicSimilarity], result of:
            0.17501694 = score(doc=1076,freq=2.0), product of:
              0.22207108 = queryWeight, product of:
                1.9422886 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.019234303 = queryNorm
              0.7881122 = fieldWeight in 1076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.09375 = fieldNorm(doc=1076)
          0.1633616 = weight(abstract_txt:vector in 1076) [ClassicSimilarity], result of:
            0.1633616 = score(doc=1076,freq=1.0), product of:
              0.26722798 = queryWeight, product of:
                2.1306334 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.019234303 = queryNorm
              0.6113192 = fieldWeight in 1076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.09375 = fieldNorm(doc=1076)
          0.091396764 = weight(abstract_txt:collections in 1076) [ClassicSimilarity], result of:
            0.091396764 = score(doc=1076,freq=1.0), product of:
              0.2076983 = queryWeight, product of:
                2.3005404 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.019234303 = queryNorm
              0.4400458 = fieldWeight in 1076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.09375 = fieldNorm(doc=1076)
        0.2 = coord(5/25)
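
The score breakdowns above follow a regular pattern: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coord factor. The following is a minimal sketch that recomputes the score for doc 7227 from the figures in the listing. It assumes the usual ClassicSimilarity definitions, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which agree with the values shown; the function and variable names are illustrative and do not come from the catalog software.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf; reproduces e.g. 6.1209383 for docFreq=263.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm             # "queryWeight" in the listing
    field_weight = math.sqrt(freq) * term_idf * field_norm   # "fieldWeight" in the listing
    return query_weight * field_weight

# Values copied from the listing for doc 7227.
MAX_DOCS = 44218
QUERY_NORM = 0.019234303
FIELD_NORM_7227 = 0.109375

# (term, freq, docFreq, boost); "recognition" has no boost line, so 1.0 is assumed.
terms_7227 = [
    ("recognition", 1.0, 263, 1.0),
    ("presence", 1.0, 189, 1.0537376),
    ("improves", 1.0, 144, 1.097896),
    ("degraded", 1.0, 12, 1.491918),
    ("errors", 1.0, 171, 3.2099946),
]

term_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM_7227, boost)
    for _, freq, doc_freq, boost in terms_7227
)
coord = 5 / 25  # 5 of the 25 query terms matched
score_7227 = coord * term_sum

# Should land close to the 0.16535042 reported for doc 7227,
# up to single-precision rounding in the original figures.
print(round(score_7227, 4))
```

The same function can be applied to the other entries by substituting their freq, docFreq, boost and fieldNorm values; rare terms such as "degraded" (docFreq=12) dominate because idf enters the product twice, once through queryWeight and once through fieldWeight.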