Document (#14020)

Author
Taghva, K.
Borsack, J.
Condit, A.
Title
Effects of OCR errors on ranking and feedback using the vector space model
Source
Information processing and management. 32(1996) no.3, pp.317-327
Year
1996
Abstract
Reports on the performance of the vector space model in the presence of optical character recognition (OCR) errors. Average precision and recall are not affected for full-text document rankings of the OCR and corrected collections with different weighting combinations. Cosine normalization plays a considerable role in the disparity seen between the collections. Even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.

Similar documents (content)

  1. Taghva, K.: The effects of noisy data on text retrieval (1994) 0.17
    0.16522053 = sum of:
      0.16522053 = product of:
        0.8261026 = sum of:
          0.07855571 = weight(abstract_txt:recognition in 7226) [ClassicSimilarity], result of:
            0.07855571 = score(doc=7226,freq=1.0), product of:
              0.11746776 = queryWeight, product of:
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.019212225 = queryNorm
              0.6687427 = fieldWeight in 7226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.109375 = fieldNorm(doc=7226)
          0.09174132 = weight(abstract_txt:presence in 7226) [ClassicSimilarity], result of:
            0.09174132 = score(doc=7226,freq=1.0), product of:
              0.13026974 = queryWeight, product of:
                1.0530826 = boost
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.019212225 = queryNorm
              0.7042413 = fieldWeight in 7226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.109375 = fieldNorm(doc=7226)
          0.10419523 = weight(abstract_txt:improves in 7226) [ClassicSimilarity], result of:
            0.10419523 = score(doc=7226,freq=1.0), product of:
              0.14180735 = queryWeight, product of:
                1.0987276 = boost
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.019212225 = queryNorm
              0.73476607 = fieldWeight in 7226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.109375 = fieldNorm(doc=7226)
          0.26211807 = weight(abstract_txt:degraded in 7226) [ClassicSimilarity], result of:
            0.26211807 = score(doc=7226,freq=1.0), product of:
              0.26229993 = queryWeight, product of:
                1.4943067 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.019212225 = queryNorm
              0.9993067 = fieldWeight in 7226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.109375 = fieldNorm(doc=7226)
          0.28949228 = weight(abstract_txt:errors in 7226) [ClassicSimilarity], result of:
            0.28949228 = score(doc=7226,freq=1.0), product of:
              0.4042021 = queryWeight, product of:
                3.212925 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019212225 = queryNorm
              0.7162067 = fieldWeight in 7226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.109375 = fieldNorm(doc=7226)
        0.2 = coord(5/25)
    
  2. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.12
    0.12175375 = sum of:
      0.12175375 = product of:
        0.60876876 = sum of:
          0.03509307 = weight(abstract_txt:model in 308) [ClassicSimilarity], result of:
            0.03509307 = score(doc=308,freq=2.0), product of:
              0.099687286 = queryWeight, product of:
                1.3027934 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.019212225 = queryNorm
              0.35203153 = fieldWeight in 308, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.14978176 = weight(abstract_txt:degraded in 308) [ClassicSimilarity], result of:
            0.14978176 = score(doc=308,freq=1.0), product of:
              0.26229993 = queryWeight, product of:
                1.4943067 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.019212225 = queryNorm
              0.5710324 = fieldWeight in 308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.17428987 = weight(abstract_txt:space in 308) [ClassicSimilarity], result of:
            0.17428987 = score(doc=308,freq=8.0), product of:
              0.18280454 = queryWeight, product of:
                1.7642053 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.019212225 = queryNorm
              0.95342195 = fieldWeight in 308, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.1885328 = weight(abstract_txt:vector in 308) [ClassicSimilarity], result of:
            0.1885328 = score(doc=308,freq=3.0), product of:
              0.2671282 = queryWeight, product of:
                2.13263 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.019212225 = queryNorm
              0.70577645 = fieldWeight in 308, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
          0.06107121 = weight(abstract_txt:collections in 308) [ClassicSimilarity], result of:
            0.06107121 = score(doc=308,freq=1.0), product of:
              0.20801292 = queryWeight, product of:
                2.3048701 = boost
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.019212225 = queryNorm
              0.29359335 = fieldWeight in 308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.0625 = fieldNorm(doc=308)
        0.2 = coord(5/25)
    
  3. Taghva, K.; Borsack, J.; Condit, A.: Evaluation of model-based retrieval effectiveness with OCR text (1996) 0.12
    0.11630359 = sum of:
      0.11630359 = product of:
        0.7268975 = sum of:
          0.082062304 = weight(abstract_txt:affected in 4553) [ClassicSimilarity], result of:
            0.082062304 = score(doc=4553,freq=1.0), product of:
              0.13402748 = queryWeight, product of:
                1.0681632 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.019212225 = queryNorm
              0.6122797 = fieldWeight in 4553, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.09375 = fieldNorm(doc=4553)
          0.12344379 = weight(abstract_txt:feedback in 4553) [ClassicSimilarity], result of:
            0.12344379 = score(doc=4553,freq=1.0), product of:
              0.22169414 = queryWeight, product of:
                1.9428209 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.019212225 = queryNorm
              0.5568203 = fieldWeight in 4553, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.09375 = fieldNorm(doc=4553)
          0.09160682 = weight(abstract_txt:collections in 4553) [ClassicSimilarity], result of:
            0.09160682 = score(doc=4553,freq=1.0), product of:
              0.20801292 = queryWeight, product of:
                2.3048701 = boost
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.019212225 = queryNorm
              0.44039002 = fieldWeight in 4553, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.09375 = fieldNorm(doc=4553)
          0.42978454 = weight(abstract_txt:errors in 4553) [ClassicSimilarity], result of:
            0.42978454 = score(doc=4553,freq=3.0), product of:
              0.4042021 = queryWeight, product of:
                3.212925 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019212225 = queryNorm
              1.0632912 = fieldWeight in 4553, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.09375 = fieldNorm(doc=4553)
        0.16 = coord(4/25)
    
  4. Alexander, M.: Retrieving digital data with fuzzy matching (1996) 0.12
    0.11528553 = sum of:
      0.11528553 = product of:
        0.72053456 = sum of:
          0.07855571 = weight(abstract_txt:recognition in 30) [ClassicSimilarity], result of:
            0.07855571 = score(doc=30,freq=1.0), product of:
              0.11746776 = queryWeight, product of:
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.019212225 = queryNorm
              0.6687427 = fieldWeight in 30, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.114219 = idf(docFreq=266, maxDocs=44421)
                0.109375 = fieldNorm(doc=30)
          0.24465019 = weight(abstract_txt:compensate in 30) [ClassicSimilarity], result of:
            0.24465019 = score(doc=30,freq=1.0), product of:
              0.25051317 = queryWeight, product of:
                1.4603466 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.019212225 = queryNorm
              0.9765961 = fieldWeight in 30, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.109375 = fieldNorm(doc=30)
          0.10783636 = weight(abstract_txt:space in 30) [ClassicSimilarity], result of:
            0.10783636 = score(doc=30,freq=1.0), product of:
              0.18280454 = queryWeight, product of:
                1.7642053 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.019212225 = queryNorm
              0.5898998 = fieldWeight in 30, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.109375 = fieldNorm(doc=30)
          0.28949228 = weight(abstract_txt:errors in 30) [ClassicSimilarity], result of:
            0.28949228 = score(doc=30,freq=1.0), product of:
              0.4042021 = queryWeight, product of:
                3.212925 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019212225 = queryNorm
              0.7162067 = fieldWeight in 30, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.109375 = fieldNorm(doc=30)
        0.16 = coord(4/25)
    
  5. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Genetic algorithms in relevance feedback : a second test and new contributions (2003) 0.11
    0.114905536 = sum of:
      0.114905536 = product of:
        0.5745277 = sum of:
          0.052639604 = weight(abstract_txt:model in 2076) [ClassicSimilarity], result of:
            0.052639604 = score(doc=2076,freq=2.0), product of:
              0.099687286 = queryWeight, product of:
                1.3027934 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.019212225 = queryNorm
              0.5280473 = fieldWeight in 2076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.09375 = fieldNorm(doc=2076)
          0.09243116 = weight(abstract_txt:space in 2076) [ClassicSimilarity], result of:
            0.09243116 = score(doc=2076,freq=1.0), product of:
              0.18280454 = queryWeight, product of:
                1.7642053 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.019212225 = queryNorm
              0.50562835 = fieldWeight in 2076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.09375 = fieldNorm(doc=2076)
          0.17457588 = weight(abstract_txt:feedback in 2076) [ClassicSimilarity], result of:
            0.17457588 = score(doc=2076,freq=2.0), product of:
              0.22169414 = queryWeight, product of:
                1.9428209 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.019212225 = queryNorm
              0.7874628 = fieldWeight in 2076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.09375 = fieldNorm(doc=2076)
          0.1632742 = weight(abstract_txt:vector in 2076) [ClassicSimilarity], result of:
            0.1632742 = score(doc=2076,freq=1.0), product of:
              0.2671282 = queryWeight, product of:
                2.13263 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.019212225 = queryNorm
              0.61122036 = fieldWeight in 2076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.09375 = fieldNorm(doc=2076)
          0.09160682 = weight(abstract_txt:collections in 2076) [ClassicSimilarity], result of:
            0.09160682 = score(doc=2076,freq=1.0), product of:
              0.20801292 = queryWeight, product of:
                2.3048701 = boost
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.019212225 = queryNorm
              0.44039002 = fieldWeight in 2076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.09375 = fieldNorm(doc=2076)
        0.2 = coord(5/25)
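
The score breakdowns above can be re-derived by hand. The following is a minimal sketch, not the catalog's actual implementation: it assumes the usual ClassicSimilarity (TF-IDF) formulas, namely idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), and it copies queryNorm, fieldNorm, boost and docFreq from the listing for entry 1 (doc 7226). The names `idf`, `term_score`, `document_score` and `TERMS_DOC_7226` are illustrative only.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44421
QUERY_NORM = 0.019212225
FIELD_NORM = 0.109375  # fieldNorm(doc=7226)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming the classic TF-IDF form."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost=1.0,
               field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) for entry 1; "recognition" shows no
# boost line in the listing, so a boost of 1.0 is assumed for it.
TERMS_DOC_7226 = [
    ("recognition", 1.0, 266, 1.0),
    ("presence", 1.0, 192, 1.0530826),
    ("improves", 1.0, 145, 1.0987276),
    ("degraded", 1.0, 12, 1.4943067),
    ("errors", 1.0, 172, 3.212925),
]


def document_score(terms, matched, query_terms):
    """Sum of the term scores, scaled by coord = matched / query_terms."""
    coord = matched / query_terms
    return coord * sum(term_score(f, df, b) for _, f, df, b in terms)


score = document_score(TERMS_DOC_7226, matched=5, query_terms=25)
print(f"{score:.5f}")
```

The result should agree with the listed 0.16522053 to about five decimal places. Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic while Python uses double precision. Passing `field_norm=0.0625` and a frequency above 1 reproduces the entries for doc 308 in the same way.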