Document (#18028)

Author
Leppanen, E.
Title
Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset
Source
Informaatiotutkimus. 15(1996) no.4, pp.133-144
Year
1996
Abstract
Homonymy is known to often cause false drops in free-text searching of full-text databases. The problem is quite common and difficult to avoid in Finnish, but it has not been examined before. Reports on a study that examined the frequency of, and solutions to, the homonymy problem, based on searches made in a Finnish full-text database containing about 55,000 newspaper articles. The results indicate that homonymy is not a very serious problem in full-text searching: only about 1 search result set in 4 contained false drops caused by homonymy. Several other causes of nonrelevance were much more common. However, some result sets contained a considerable number of homonymy errors, so the number appears to vary unpredictably from search to search. A study was also made into whether homonyms can be disambiguated by syntactic analysis; 75.2% of homonyms were disambiguated by this method. Verb homonyms were considerably easier to disambiguate than substantives. Although homonymy is not a very big problem, it could perhaps be eliminated easily if the IR system included a suitable syntactic analyzer.
Footnote
Translated title: The homonymy problem in free text searching and the results of homonymy disambiguation
Theme
Full-text retrieval
Retrieval studies

Similar documents (content)

  1. Gillaspie, L.: The role of linguistic phenomena in retrieval performance (1995) 0.17
    0.16674577 = sum of:
      0.16674577 = product of:
        0.8337288 = sum of:
          0.037650455 = weight(abstract_txt:number in 3929) [ClassicSimilarity], result of:
            0.037650455 = score(doc=3929,freq=1.0), product of:
              0.072830595 = queryWeight, product of:
                1.015373 = boost
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.017343706 = queryNorm
              0.5169593 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.125 = fieldNorm(doc=3929)
          0.095374145 = weight(abstract_txt:full in 3929) [ClassicSimilarity], result of:
            0.095374145 = score(doc=3929,freq=1.0), product of:
              0.1549237 = queryWeight, product of:
                1.8137325 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.017343706 = queryNorm
              0.6156201 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.125 = fieldNorm(doc=3929)
          0.2343109 = weight(abstract_txt:false in 3929) [ClassicSimilarity], result of:
            0.2343109 = score(doc=3929,freq=1.0), product of:
              0.2464121 = queryWeight, product of:
                1.867668 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.017343706 = queryNorm
              0.95089036 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.125 = fieldNorm(doc=3929)
          0.07024084 = weight(abstract_txt:text in 3929) [ClassicSimilarity], result of:
            0.07024084 = score(doc=3929,freq=1.0), product of:
              0.13906041 = queryWeight, product of:
                1.9841999 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017343706 = queryNorm
              0.50511026 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.125 = fieldNorm(doc=3929)
          0.39615244 = weight(abstract_txt:drops in 3929) [ClassicSimilarity], result of:
            0.39615244 = score(doc=3929,freq=1.0), product of:
              0.34971043 = queryWeight, product of:
                2.224964 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.017343706 = queryNorm
              1.1328013 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.125 = fieldNorm(doc=3929)
        0.2 = coord(5/25)
    
  2. Vaughan, L.; Shaw, D.: Bibliographic and Web citations : what is the difference? (2003) 0.12
    0.1182575 = sum of:
      0.1182575 = product of:
        0.4223482 = sum of:
          0.015980124 = weight(abstract_txt:there in 176) [ClassicSimilarity], result of:
            0.015980124 = score(doc=176,freq=1.0), product of:
              0.07137319 = queryWeight, product of:
                1.0051624 = boost
                4.094086 = idf(docFreq=2012, maxDocs=44421)
                0.017343706 = queryNorm
              0.22389534 = fieldWeight in 176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.094086 = idf(docFreq=2012, maxDocs=44421)
                0.0546875 = fieldNorm(doc=176)
          0.016472075 = weight(abstract_txt:number in 176) [ClassicSimilarity], result of:
            0.016472075 = score(doc=176,freq=1.0), product of:
              0.072830595 = queryWeight, product of:
                1.015373 = boost
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.017343706 = queryNorm
              0.2261697 = fieldWeight in 176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.0546875 = fieldNorm(doc=176)
          0.032549288 = weight(abstract_txt:examined in 176) [ClassicSimilarity], result of:
            0.032549288 = score(doc=176,freq=1.0), product of:
              0.114685714 = queryWeight, product of:
                1.2741581 = boost
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.017343706 = queryNorm
              0.28381294 = fieldWeight in 176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.189722 = idf(docFreq=672, maxDocs=44421)
                0.0546875 = fieldNorm(doc=176)
          0.039792825 = weight(abstract_txt:were in 176) [ClassicSimilarity], result of:
            0.039792825 = score(doc=176,freq=3.0), product of:
              0.114548065 = queryWeight, product of:
                1.80085 = boost
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.017343706 = queryNorm
              0.34738976 = fieldWeight in 176, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.0546875 = fieldNorm(doc=176)
          0.041726187 = weight(abstract_txt:full in 176) [ClassicSimilarity], result of:
            0.041726187 = score(doc=176,freq=1.0), product of:
              0.1549237 = queryWeight, product of:
                1.8137325 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.017343706 = queryNorm
              0.26933378 = fieldWeight in 176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0546875 = fieldNorm(doc=176)
          0.10251101 = weight(abstract_txt:false in 176) [ClassicSimilarity], result of:
            0.10251101 = score(doc=176,freq=1.0), product of:
              0.2464121 = queryWeight, product of:
                1.867668 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.017343706 = queryNorm
              0.41601452 = fieldWeight in 176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.0546875 = fieldNorm(doc=176)
          0.17331669 = weight(abstract_txt:drops in 176) [ClassicSimilarity], result of:
            0.17331669 = score(doc=176,freq=1.0), product of:
              0.34971043 = queryWeight, product of:
                2.224964 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.017343706 = queryNorm
              0.49560058 = fieldWeight in 176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0546875 = fieldNorm(doc=176)
        0.28 = coord(7/25)
    
  3. Shuman, B.A.: One false drop deserves another : file selection as a means of increasing precision in online searches (1992) 0.12
    0.11814647 = sum of:
      0.11814647 = product of:
        0.59073234 = sum of:
          0.037048582 = weight(abstract_txt:searching in 4030) [ClassicSimilarity], result of:
            0.037048582 = score(doc=4030,freq=2.0), product of:
              0.07823206 = queryWeight, product of:
                1.0523521 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.017343706 = queryNorm
              0.47357288 = fieldWeight in 4030, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.078125 = fieldNorm(doc=4030)
          0.036900297 = weight(abstract_txt:common in 4030) [ClassicSimilarity], result of:
            0.036900297 = score(doc=4030,freq=1.0), product of:
              0.098303035 = queryWeight, product of:
                1.1796472 = boost
                4.8047733 = idf(docFreq=988, maxDocs=44421)
                0.017343706 = queryNorm
              0.37537292 = fieldWeight in 4030, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8047733 = idf(docFreq=988, maxDocs=44421)
                0.078125 = fieldNorm(doc=4030)
          0.2071035 = weight(abstract_txt:false in 4030) [ClassicSimilarity], result of:
            0.2071035 = score(doc=4030,freq=2.0), product of:
              0.2464121 = queryWeight, product of:
                1.867668 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.017343706 = queryNorm
              0.8404762 = fieldWeight in 4030, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.078125 = fieldNorm(doc=4030)
          0.062084716 = weight(abstract_txt:text in 4030) [ClassicSimilarity], result of:
            0.062084716 = score(doc=4030,freq=2.0), product of:
              0.13906041 = queryWeight, product of:
                1.9841999 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017343706 = queryNorm
              0.4464586 = fieldWeight in 4030, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=4030)
          0.24759527 = weight(abstract_txt:drops in 4030) [ClassicSimilarity], result of:
            0.24759527 = score(doc=4030,freq=1.0), product of:
              0.34971043 = queryWeight, product of:
                2.224964 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.017343706 = queryNorm
              0.7080008 = fieldWeight in 4030, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=4030)
        0.2 = coord(5/25)
    
  4. McBride, J.L.: Faceted subject access for music through USMARC : a case for linked fields (2000) 0.11
    0.10879903 = sum of:
      0.10879903 = product of:
        0.54399514 = sum of:
          0.023531532 = weight(abstract_txt:number in 403) [ClassicSimilarity], result of:
            0.023531532 = score(doc=403,freq=1.0), product of:
              0.072830595 = queryWeight, product of:
                1.015373 = boost
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.017343706 = queryNorm
              0.32309955 = fieldWeight in 403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.078125 = fieldNorm(doc=403)
          0.043994367 = weight(abstract_txt:result in 403) [ClassicSimilarity], result of:
            0.043994367 = score(doc=403,freq=1.0), product of:
              0.11052955 = queryWeight, product of:
                1.2508576 = boost
                5.0948176 = idf(docFreq=739, maxDocs=44421)
                0.017343706 = queryNorm
              0.39803264 = fieldWeight in 403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0948176 = idf(docFreq=739, maxDocs=44421)
                0.078125 = fieldNorm(doc=403)
          0.08242966 = weight(abstract_txt:containing in 403) [ClassicSimilarity], result of:
            0.08242966 = score(doc=403,freq=1.0), product of:
              0.16798456 = queryWeight, product of:
                1.5420674 = boost
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.017343706 = queryNorm
              0.49069786 = fieldWeight in 403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2809324 = idf(docFreq=225, maxDocs=44421)
                0.078125 = fieldNorm(doc=403)
          0.1464443 = weight(abstract_txt:false in 403) [ClassicSimilarity], result of:
            0.1464443 = score(doc=403,freq=1.0), product of:
              0.2464121 = queryWeight, product of:
                1.867668 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.017343706 = queryNorm
              0.59430647 = fieldWeight in 403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.078125 = fieldNorm(doc=403)
          0.24759527 = weight(abstract_txt:drops in 403) [ClassicSimilarity], result of:
            0.24759527 = score(doc=403,freq=1.0), product of:
              0.34971043 = queryWeight, product of:
                2.224964 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.017343706 = queryNorm
              0.7080008 = fieldWeight in 403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=403)
        0.2 = coord(5/25)
    
  5. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.10
    0.10114395 = sum of:
      0.10114395 = product of:
        0.36122838 = sum of:
          0.029539296 = weight(abstract_txt:database in 1896) [ClassicSimilarity], result of:
            0.029539296 = score(doc=1896,freq=2.0), product of:
              0.07805675 = queryWeight, product of:
                1.0511724 = boost
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.017343706 = queryNorm
              0.3784336 = fieldWeight in 1896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2814875 = idf(docFreq=1668, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.025437145 = weight(abstract_txt:made in 1896) [ClassicSimilarity], result of:
            0.025437145 = score(doc=1896,freq=1.0), product of:
              0.089015566 = queryWeight, product of:
                1.1225395 = boost
                4.5721703 = idf(docFreq=1247, maxDocs=44421)
                0.017343706 = queryNorm
              0.28576064 = fieldWeight in 1896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5721703 = idf(docFreq=1247, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.10417845 = weight(abstract_txt:analyzer in 1896) [ClassicSimilarity], result of:
            0.10417845 = score(doc=1896,freq=1.0), product of:
              0.18085435 = queryWeight, product of:
                1.1314051 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.017343706 = queryNorm
              0.5760351 = fieldWeight in 1896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.05251291 = weight(abstract_txt:were in 1896) [ClassicSimilarity], result of:
            0.05251291 = score(doc=1896,freq=4.0), product of:
              0.114548065 = queryWeight, product of:
                1.80085 = boost
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.017343706 = queryNorm
              0.4584356 = fieldWeight in 1896, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6674848 = idf(docFreq=3083, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.047687072 = weight(abstract_txt:full in 1896) [ClassicSimilarity], result of:
            0.047687072 = score(doc=1896,freq=1.0), product of:
              0.1549237 = queryWeight, product of:
                1.8137325 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.017343706 = queryNorm
              0.30781004 = fieldWeight in 1896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.03512042 = weight(abstract_txt:text in 1896) [ClassicSimilarity], result of:
            0.03512042 = score(doc=1896,freq=1.0), product of:
              0.13906041 = queryWeight, product of:
                1.9841999 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017343706 = queryNorm
              0.25255513 = fieldWeight in 1896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.06675306 = weight(abstract_txt:problem in 1896) [ClassicSimilarity], result of:
            0.06675306 = score(doc=1896,freq=2.0), product of:
              0.16935623 = queryWeight, product of:
                2.1896982 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.017343706 = queryNorm
              0.3941577 = fieldWeight in 1896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
        0.28 = coord(7/25)
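
The score trees above all follow the same pattern, which can be sketched in a few lines. The sketch below is a minimal illustration, not the catalog's own code: it assumes the standard Lucene ClassicSimilarity formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), which agree with every idf and tf value printed in the trees. The function names and the TERMS table are hypothetical; the numbers are copied from entry 1 (doc 3929).

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity is assumed to compute it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one query term: queryWeight * fieldWeight, as in the trees."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm        # "queryWeight, product of"
    field_weight = tf(freq) * term_idf * field_norm     # "fieldWeight, product of"
    return query_weight * field_weight

def document_score(terms, max_docs, query_norm, field_norm, total_clauses):
    """Sum the matching term scores, then scale by coord(matched/total)."""
    raw = sum(
        term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm)
        for _term, freq, doc_freq, boost in terms
    )
    coord = len(terms) / total_clauses
    return raw * coord

# Values copied from the explanation for entry 1 (doc 3929).
MAX_DOCS = 44421
QUERY_NORM = 0.017343706
FIELD_NORM = 0.125
TOTAL_CLAUSES = 25  # denominator of coord(5/25)

# (term, termFreq, docFreq, boost)
TERMS = [
    ("number", 1.0, 1930, 1.015373),
    ("full", 1.0, 876, 1.8137325),
    ("false", 1.0, 59, 1.867668),
    ("text", 1.0, 2122, 1.9841999),
    ("drops", 1.0, 13, 2.224964),
]

score = document_score(TERMS, MAX_DOCS, QUERY_NORM, FIELD_NORM, TOTAL_CLAUSES)
print(round(score, 2))  # the listing shows entry 1 with 0.17
```

Rare terms dominate the result: "drops" (docFreq=13) contributes roughly ten times as much as "number" (docFreq=1930), because idf enters both the query weight and the field weight. Small differences in the last digits against the listing are expected, since this sketch uses double precision while the printed values look like single-precision floats.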