LIR – Help

Help contents

Quick tips: What are a few quick tips?
Ranked query: What's a ranked query?
Algorithms: Which ranking/weighting algorithms are supported?
Regular expressions: What are regular expressions?
Options: What do these options mean?

Quick tips


Ranked query

A ranked query (or best-match query), as opposed to an exact-match or Boolean query, improves query results by assigning each result a weight according to its relevance to the query and displaying the results in descending order of relevance.
Thus, the most relevant documents appear at the top of the result list, while less relevant documents are not omitted entirely, as they are with Boolean queries, where documents that do not contain all of the query terms are simply dropped. [1]
This greatly increases recall, while a degree of precision is still achieved through the relevance ranking.
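
The difference can be illustrated with a minimal Python sketch. The toy collection, the function names, and the scoring by number of matching query terms are assumptions made for illustration only; they are not taken from lir.pl.

```python
# Toy collection: document id -> set of index terms (hypothetical data).
DOCS = {
    "d1": {"automatic", "indexing", "retrieval"},
    "d2": {"automatic", "indexing"},
    "d3": {"retrieval"},
    "d4": {"library", "catalogue"},
}


def boolean_and_query(terms, docs):
    """Exact match: keep only documents containing ALL query terms."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, words in docs.items() if wanted <= words)


def ranked_query(terms, docs):
    """Best match: keep documents containing ANY query term, best first.

    The score is simply the number of matching query terms, a stand-in
    for a real term weight.
    """
    wanted = set(terms)
    scored = []
    for doc_id, words in docs.items():
        score = len(wanted & words)
        if score > 0:
            scored.append((doc_id, score))
    # Sort by descending score; ties are broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


query = ["automatic", "indexing", "retrieval"]
print(boolean_and_query(query, DOCS))  # ['d1']
print(ranked_query(query, DOCS))       # [('d1', 3), ('d2', 2), ('d3', 1)]
```

The Boolean AND-query returns only d1, whereas the ranked query also returns d2 and d3, placed below d1 because they match fewer query terms.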

What remains problematic is the concept of relevance [2] of a given document to a particular query. The usual approach is to assign weights to the index terms and to derive a document's weight (which serves as a measure of its relevance) from the weights of the matching terms.
lir.pl offers various term weights, calculated by different algorithms [3], so that their effects and their suitability as relevance measures can be compared.


Algorithms

Salton
Kascade einfach (simple)
Kascade komplex (complex)
Robertson
IDF
Custom
None

Formulas and annotations are based on Lepsky [3] and Lohmann [4].

Legend:

docNum (N): Number of documents in the collection
docNum(W) (df): Number of documents containing basic form W ("document frequency")
colLen: Number of detected word forms in the collection ("collection length")
docLen(D): Number of detected word forms in document D
formLen(D): Number of distinct basic forms in document D
freq(W, D) (tf): Frequency of basic form W in document D ("term frequency")
freq(W): Frequency of basic form W in the collection
len(W): Character length of basic form W
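
As a rough illustration of these quantities, the following Python sketch computes them for a small, made-up collection. The data, the function names, and the assumption that every list element counts as one detected word form already reduced to its basic form are hypothetical; lir.pl's actual indexing is not shown here.

```python
# Hypothetical toy collection: document id -> list of basic forms,
# one entry per detected word form.
COLLECTION = {
    "d1": ["index", "term", "index", "weight"],
    "d2": ["term", "query"],
    "d3": ["query", "query", "weight", "rank", "rank"],
}


def doc_num(collection, w=None):
    """docNum (N) if w is None, otherwise docNum(W) (df)."""
    if w is None:
        return len(collection)
    return sum(1 for forms in collection.values() if w in forms)


def col_len(collection):
    """colLen: number of word forms in the whole collection."""
    return sum(len(forms) for forms in collection.values())


def doc_len(collection, d):
    """docLen(D): number of word forms in document d."""
    return len(collection[d])


def form_len(collection, d):
    """formLen(D): number of distinct basic forms in document d."""
    return len(set(collection[d]))


def freq(collection, w, d=None):
    """freq(W, D) (tf) if d is given, otherwise freq(W)."""
    if d is not None:
        return collection[d].count(w)
    return sum(forms.count(w) for forms in collection.values())


print(doc_num(COLLECTION))              # 3
print(doc_num(COLLECTION, "query"))     # 2
print(col_len(COLLECTION))              # 11
print(doc_len(COLLECTION, "d3"))        # 5
print(form_len(COLLECTION, "d3"))       # 3
print(freq(COLLECTION, "query", "d3"))  # 2
print(freq(COLLECTION, "query"))        # 3
print(len("query"))                     # 5, i.e. len(W)
```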

w(W, D): Weight of document D resulting from basic form W
(The overall weight of the document - expressing the relevance of this document for the query - is calculated as the sum of the individual weights resulting from each of the query's terms.)
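
The summation can be sketched in Python as follows. The term weight is passed in as a function, so different weightings can be compared on the same query. The two example weightings (raw term frequency, and a textbook tf·idf using log(N/df)) are illustrative assumptions and are not the formulas lir.pl uses for any of the algorithms listed above; the toy collection and all function names are likewise hypothetical.

```python
import math

# Hypothetical toy collection: document id -> list of basic forms.
COLLECTION = {
    "d1": ["index", "term", "index", "weight"],
    "d2": ["term", "query"],
    "d3": ["query", "query", "weight", "rank", "rank"],
}


def tf_weight(w, d, collection):
    """Illustrative w(W, D): the raw term frequency freq(W, D)."""
    return collection[d].count(w)


def tf_idf_weight(w, d, collection):
    """Illustrative w(W, D): textbook tf * log(N / df), NOT lir.pl's formula."""
    tf = collection[d].count(w)
    df = sum(1 for forms in collection.values() if w in forms)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(collection) / df)


def document_weight(query, d, collection, term_weight):
    """Overall weight of document d: sum of w(W, D) over the query terms."""
    return sum(term_weight(w, d, collection) for w in query)


def rank(query, collection, term_weight):
    """Return (document, weight) pairs with weight > 0, heaviest first."""
    hits = []
    for d in collection:
        weight = document_weight(query, d, collection, term_weight)
        if weight > 0:
            hits.append((d, weight))
    # Descending weight; ties are broken by document id.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


query = ["query", "weight"]
print(rank(query, COLLECTION, tf_weight))  # [('d3', 3), ('d1', 1), ('d2', 1)]
print(rank(query, COLLECTION, tf_idf_weight))
```

Swapping the `term_weight` argument is all that is needed to compare two weightings on the same query, which mirrors the comparison of algorithms that lir.pl is meant to support.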


Regular expressions

In short:
For more information see:


Options

Footnotes

[1] Strictly speaking, this applies only to Boolean AND-queries. OR-queries, however, carry no information about the relevance of their results, so the relevant results are simply scattered throughout the result list. In fact, a ranked query, in its simplest form, is a Boolean OR-query with relevance ranking.
[2] For a comprehensive survey see: Saracevic: RELEVANCE: A Review of and a Framework for the Thinking on the Notion in Information Science [pdf].
Probabilistic IR: Crestani et al.: "Is This Document Relevant? ... Probably": A Survey of Probabilistic Models in Information Retrieval [pdf].
[3] Lepsky: Automatisches Indexieren [pdf].
Larson & Hearst: Term Weighting and Ranking Algorithms [pdf].
Salton & Buckley: Term-weighting approaches in automatic text retrieval. In: Information Processing & Management vol. 24, no. 5, pp. 513-523, 1988.
[4] Lohmann: KASCADE: Dokumentanreicherung und automatische Inhaltserschließung - Projektbericht und Ergebnisse des Retrievaltests. Düsseldorf, 2000. (Schriften der Universitäts- und Landesbibliothek Düsseldorf; 31).
* The lir.pl databases "LIR" and "Literatur zur Inhaltserschließung" also offer many references on these and related topics.