Document (#29534)

Title
¬An introduction to information retrieval
Source
http://www.xapian.org/docs/intro_ir.html
Year
n.d.
Abstract
In the beginning IR was dominated by Boolean retrieval, described in the next section. This could be called the antediluvian period, or generation zero. The first generation of IR research dates from the early sixties, and was dominated by model building, experimentation, and heuristics. The big names were Gerry Salton and Karen Sparck Jones. The second period, which began in the mid-seventies, saw a big shift towards mathematics, and a rise of the IR model based upon probability theory - probabilistic IR. The big name here was, and continues to be, Stephen Robertson. More recently Keith van Rijsbergen has led a group that has developed underlying logical models of IR, but interesting as this new work is, it has not as yet led to results that offer improvements for the IR system builder. Xapian is firmly placed as a system that implements, or tries to implement, the probabilistic IR model. (We say 'tries' because sometimes implementation efficiency and theoretical complexity demand certain short-cuts.)
Theme
Retrieval algorithms
Object
Xapian

Similar documents (content)

  1. Robertson, S.E.: OKAPI at TREC-1 (1994) 0.17
    0.17182021 = sum of:
      0.17182021 = product of:
        0.859101 = sum of:
          0.027534109 = weight(abstract_txt:retrieval in 7952) [ClassicSimilarity], result of:
            0.027534109 = score(doc=7952,freq=1.0), product of:
              0.06336053 = queryWeight, product of:
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018225377 = queryNorm
              0.4345625 = fieldWeight in 7952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.125 = fieldNorm(doc=7952)
          0.15857777 = weight(abstract_txt:jones in 7952) [ClassicSimilarity], result of:
            0.15857777 = score(doc=7952,freq=1.0), product of:
              0.16158076 = queryWeight, product of:
                1.1291989 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.018225377 = queryNorm
              0.981415 = fieldWeight in 7952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.125 = fieldNorm(doc=7952)
          0.22003138 = weight(abstract_txt:robertson in 7952) [ClassicSimilarity], result of:
            0.22003138 = score(doc=7952,freq=1.0), product of:
              0.20101008 = queryWeight, product of:
                1.2594604 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.018225377 = queryNorm
              1.0946286 = fieldWeight in 7952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.125 = fieldNorm(doc=7952)
          0.2498944 = weight(abstract_txt:sparck in 7952) [ClassicSimilarity], result of:
            0.2498944 = score(doc=7952,freq=1.0), product of:
              0.21880929 = queryWeight, product of:
                1.3140397 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.018225377 = queryNorm
              1.1420648 = fieldWeight in 7952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.125 = fieldNorm(doc=7952)
          0.20306335 = weight(abstract_txt:probabilistic in 7952) [ClassicSimilarity], result of:
            0.20306335 = score(doc=7952,freq=1.0), product of:
              0.24006331 = queryWeight, product of:
                1.946496 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.018225377 = queryNorm
              0.8458742 = fieldWeight in 7952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.125 = fieldNorm(doc=7952)
        0.2 = coord(5/25)
    
  2. Robertson, M.; Willett, P.: ¬An upperbound to the performance of ranked output searching : optimal weighting of query terms using a genetic algorithm (1996) 0.13
    0.13380782 = sum of:
      0.13380782 = product of:
        0.66903913 = sum of:
          0.027534109 = weight(abstract_txt:retrieval in 46) [ClassicSimilarity], result of:
            0.027534109 = score(doc=46,freq=1.0), product of:
              0.06336053 = queryWeight, product of:
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018225377 = queryNorm
              0.4345625 = fieldWeight in 46, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.125 = fieldNorm(doc=46)
          0.013001495 = weight(abstract_txt:that in 46) [ClassicSimilarity], result of:
            0.013001495 = score(doc=46,freq=1.0), product of:
              0.043980893 = queryWeight, product of:
                1.0203949 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018225377 = queryNorm
              0.2956169 = fieldWeight in 46, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.125 = fieldNorm(doc=46)
          0.15857777 = weight(abstract_txt:jones in 46) [ClassicSimilarity], result of:
            0.15857777 = score(doc=46,freq=1.0), product of:
              0.16158076 = queryWeight, product of:
                1.1291989 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.018225377 = queryNorm
              0.981415 = fieldWeight in 46, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.125 = fieldNorm(doc=46)
          0.22003138 = weight(abstract_txt:robertson in 46) [ClassicSimilarity], result of:
            0.22003138 = score(doc=46,freq=1.0), product of:
              0.20101008 = queryWeight, product of:
                1.2594604 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.018225377 = queryNorm
              1.0946286 = fieldWeight in 46, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.125 = fieldNorm(doc=46)
          0.2498944 = weight(abstract_txt:sparck in 46) [ClassicSimilarity], result of:
            0.2498944 = score(doc=46,freq=1.0), product of:
              0.21880929 = queryWeight, product of:
                1.3140397 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.018225377 = queryNorm
              1.1420648 = fieldWeight in 46, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.125 = fieldNorm(doc=46)
        0.2 = coord(5/25)
    
  3. Lioma, C.; Ounis, I.: ¬A syntactically-based query reformulation technique for information retrieval (2008) 0.11
    0.11495281 = sum of:
      0.11495281 = product of:
        0.47897005 = sum of:
          0.034071725 = weight(abstract_txt:retrieval in 3031) [ClassicSimilarity], result of:
            0.034071725 = score(doc=3031,freq=8.0), product of:
              0.06336053 = queryWeight, product of:
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018225377 = queryNorm
              0.5377437 = fieldWeight in 3031, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.008044264 = weight(abstract_txt:that in 3031) [ClassicSimilarity], result of:
            0.008044264 = score(doc=3031,freq=2.0), product of:
              0.043980893 = queryWeight, product of:
                1.0203949 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018225377 = queryNorm
              0.18290362 = fieldWeight in 3031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.06937778 = weight(abstract_txt:jones in 3031) [ClassicSimilarity], result of:
            0.06937778 = score(doc=3031,freq=1.0), product of:
              0.16158076 = queryWeight, product of:
                1.1291989 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.018225377 = queryNorm
              0.42936906 = fieldWeight in 3031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.10427175 = weight(abstract_txt:salton in 3031) [ClassicSimilarity], result of:
            0.10427175 = score(doc=3031,freq=1.0), product of:
              0.21200876 = queryWeight, product of:
                1.2934586 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.018225377 = queryNorm
              0.49182755 = fieldWeight in 3031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.10932879 = weight(abstract_txt:sparck in 3031) [ClassicSimilarity], result of:
            0.10932879 = score(doc=3031,freq=1.0), product of:
              0.21880929 = queryWeight, product of:
                1.3140397 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.018225377 = queryNorm
              0.49965334 = fieldWeight in 3031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.15387577 = weight(abstract_txt:probabilistic in 3031) [ClassicSimilarity], result of:
            0.15387577 = score(doc=3031,freq=3.0), product of:
              0.24006331 = queryWeight, product of:
                1.946496 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.018225377 = queryNorm
              0.64097995 = fieldWeight in 3031, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
        0.24 = coord(6/25)
    
  4. Lassalle, E.; Lassalle, E.: Semantic models in information retrieval (2012) 0.11
    0.1139367 = sum of:
      0.1139367 = product of:
        0.47473627 = sum of:
          0.013767054 = weight(abstract_txt:retrieval in 1097) [ClassicSimilarity], result of:
            0.013767054 = score(doc=1097,freq=1.0), product of:
              0.06336053 = queryWeight, product of:
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018225377 = queryNorm
              0.21728125 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=1097)
          0.0065007475 = weight(abstract_txt:that in 1097) [ClassicSimilarity], result of:
            0.0065007475 = score(doc=1097,freq=1.0), product of:
              0.043980893 = queryWeight, product of:
                1.0203949 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018225377 = queryNorm
              0.14780845 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1097)
          0.079288885 = weight(abstract_txt:jones in 1097) [ClassicSimilarity], result of:
            0.079288885 = score(doc=1097,freq=1.0), product of:
              0.16158076 = queryWeight, product of:
                1.1291989 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.018225377 = queryNorm
              0.4907075 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.0625 = fieldNorm(doc=1097)
          0.11001569 = weight(abstract_txt:robertson in 1097) [ClassicSimilarity], result of:
            0.11001569 = score(doc=1097,freq=1.0), product of:
              0.20101008 = queryWeight, product of:
                1.2594604 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.018225377 = queryNorm
              0.5473143 = fieldWeight in 1097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0625 = fieldNorm(doc=1097)
          0.062100515 = weight(abstract_txt:model in 1097) [ClassicSimilarity], result of:
            0.062100515 = score(doc=1097,freq=4.0), product of:
              0.12473796 = queryWeight, product of:
                1.7184447 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.018225377 = queryNorm
              0.49784777 = fieldWeight in 1097, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=1097)
          0.20306335 = weight(abstract_txt:probabilistic in 1097) [ClassicSimilarity], result of:
            0.20306335 = score(doc=1097,freq=4.0), product of:
              0.24006331 = queryWeight, product of:
                1.946496 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.018225377 = queryNorm
              0.8458742 = fieldWeight in 1097, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.0625 = fieldNorm(doc=1097)
        0.24 = coord(6/25)
    
  5. Bodoff, D.; Robertson, S.: ¬A new unified probabilistic model (2004) 0.10
    0.10183191 = sum of:
      0.10183191 = product of:
        0.50915956 = sum of:
          0.017208818 = weight(abstract_txt:retrieval in 3129) [ClassicSimilarity], result of:
            0.017208818 = score(doc=3129,freq=1.0), product of:
              0.06336053 = queryWeight, product of:
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.018225377 = queryNorm
              0.27160156 = fieldWeight in 3129, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=3129)
          0.018170143 = weight(abstract_txt:that in 3129) [ClassicSimilarity], result of:
            0.018170143 = score(doc=3129,freq=5.0), product of:
              0.043980893 = queryWeight, product of:
                1.0203949 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018225377 = queryNorm
              0.4131372 = fieldWeight in 3129, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3129)
          0.1375196 = weight(abstract_txt:robertson in 3129) [ClassicSimilarity], result of:
            0.1375196 = score(doc=3129,freq=1.0), product of:
              0.20101008 = queryWeight, product of:
                1.2594604 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.018225377 = queryNorm
              0.6841428 = fieldWeight in 3129, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.078125 = fieldNorm(doc=3129)
          0.11643846 = weight(abstract_txt:model in 3129) [ClassicSimilarity], result of:
            0.11643846 = score(doc=3129,freq=9.0), product of:
              0.12473796 = queryWeight, product of:
                1.7184447 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.018225377 = queryNorm
              0.9334645 = fieldWeight in 3129, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.078125 = fieldNorm(doc=3129)
          0.21982253 = weight(abstract_txt:probabilistic in 3129) [ClassicSimilarity], result of:
            0.21982253 = score(doc=3129,freq=3.0), product of:
              0.24006331 = queryWeight, product of:
                1.946496 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.018225377 = queryNorm
              0.91568565 = fieldWeight in 3129, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.078125 = fieldNorm(doc=3129)
        0.2 = coord(5/25)
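
The score trees above can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF formulas that reproduce the printed values, namely tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord (matching terms divided by query terms). The function names are hypothetical; the numbers are copied from the tree for doc 7952.

```python
import math


def tf(freq):
    # Term-frequency factor: square root of the raw count (assumed formula).
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Inverse document frequency (assumed formula that matches the listing).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # One "weight(...)" node: queryWeight * fieldWeight.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, total_query_terms):
    # Top of the tree: sum of the matching term scores times coord.
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


MAX_DOCS = 44421
QUERY_NORM = 0.018225377

# (term, freq, docFreq, boost) for doc 7952; fieldNorm is 0.125 for every term.
TERMS_7952 = [
    ("retrieval", 1.0, 3732, 1.0),
    ("jones", 1.0, 46, 1.1291989),
    ("robertson", 1.0, 18, 1.2594604),
    ("sparck", 1.0, 12, 1.3140397),
    ("probabilistic", 1.0, 138, 1.946496),
]

scores_7952 = [
    term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, 0.125, boost)
    for _term, freq, doc_freq, boost in TERMS_7952
]
score_7952 = document_score(scores_7952, total_query_terms=25)

if __name__ == "__main__":
    # Agrees with the listed 0.17182021 up to floating-point rounding.
    print(round(score_7952, 4))
```

Because fieldNorm shrinks as a field gets longer, a short abstract with a single occurrence of a rare name such as "sparck" can outscore a longer abstract that uses common terms many times, which is why the first two hits rank above the later ones.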