Document (#23765)

Author
Kaszkiel, M.
Zobel, J.
Title
Effective ranking with arbitrary passages
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.4, pp.344-364
Year
2001
Abstract
Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
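The idea of overlapping fixed-length passages can be illustrated with a minimal sketch. The abstract does not say how passages are scored or how passage scores become a document ranking, so the term-overlap score, the `length` and `step` parameters, and the choice to report only the best passage are all assumptions made for illustration; the function names are hypothetical.

```python
def sliding_passages(tokens, length, step):
    """Yield (start, window) pairs of overlapping fixed-length passages.

    Windows overlap whenever step < length. Tokens after the last
    full step are not covered in this simplified sketch.
    """
    last_start = max(len(tokens) - length, 0)
    for start in range(0, last_start + 1, step):
        yield start, tokens[start:start + length]


def passage_score(passage, query_terms):
    """Toy score (an assumption): count of passage tokens that are query terms."""
    return sum(1 for tok in passage if tok in query_terms)


def best_passage(text, query, length=8, step=4):
    """Return (score, start, passage_text) for the highest-scoring passage.

    Ties are resolved in favor of the earliest passage.
    """
    tokens = text.lower().split()
    query_terms = set(query.lower().split())
    candidates = (
        (passage_score(window, query_terms), start, " ".join(window))
        for start, window in sliding_passages(tokens, length, step)
    )
    return max(candidates, key=lambda item: item[0])


best = best_passage("a b c d passage ranking e f g h i j",
                    "passage ranking", length=4, step=2)
print(best)
```

Because the unit being scored is a short window rather than the whole document, a long document with one relevant stretch is not penalized for the unrelated text around it.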
Theme
Retrieval algorithms

Similar documents (author)

  1. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:zobel in 2678) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 2678, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=2678)
    
  2. Uitdenbogerd, A.L.; Zobel, J.: An architecture for effective music information retrieval (2004) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:zobel in 4055) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 4055, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=4055)
    
  3. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:zobel in 159) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 159, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=159)
    
  4. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:zobel in 1009) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 1009, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=1009)
    
  5. Hawking, D.; Zobel, J.: Does topic metadata help with Web search? (2007) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:zobel in 1204) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 1204, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=1204)
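
All five author matches above receive the same score because each is a single-term fieldWeight, the product of tf, idf, and fieldNorm. The following minimal sketch reproduces those numbers. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are the formulas consistent with the values in this listing. The function names are illustrative and not part of any library, and fieldNorm is simply copied from the listing rather than derived.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:zobel explanations above.
author_score = field_weight(freq=1.0, doc_freq=9, max_docs=44421, field_norm=0.5)
print(round(author_score, 4))  # approximately 4.6994, as in the listing
```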
    

Similar documents (content)

  1. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.57
    0.57215476 = sum of:
      0.57215476 = product of:
        1.7879837 = sum of:
          0.041486587 = weight(abstract_txt:blocks in 3765) [ClassicSimilarity], result of:
            0.041486587 = score(doc=3765,freq=1.0), product of:
              0.09859613 = queryWeight, product of:
                1.0179671 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.012588279 = queryNorm
              0.42077297 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.024983346 = weight(abstract_txt:document in 3765) [ClassicSimilarity], result of:
            0.024983346 = score(doc=3765,freq=3.0), product of:
              0.061422113 = queryWeight, product of:
                1.1362691 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012588279 = queryNorm
              0.4067484 = fieldWeight in 3765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.03605855 = weight(abstract_txt:text in 3765) [ClassicSimilarity], result of:
            0.03605855 = score(doc=3765,freq=4.0), product of:
              0.08158569 = queryWeight, product of:
                1.6038784 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012588279 = queryNorm
              0.44197148 = fieldWeight in 3765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.014165605 = weight(abstract_txt:with in 3765) [ClassicSimilarity], result of:
            0.014165605 = score(doc=3765,freq=4.0), product of:
              0.051885758 = queryWeight, product of:
                1.6512501 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012588279 = queryNorm
              0.2730153 = fieldWeight in 3765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.051011972 = weight(abstract_txt:length in 3765) [ClassicSimilarity], result of:
            0.051011972 = score(doc=3765,freq=1.0), product of:
              0.1425759 = queryWeight, product of:
                1.7311786 = boost
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.012588279 = queryNorm
              0.35778818 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.761481 = weight(abstract_txt:passage in 3765) [ClassicSimilarity], result of:
            0.761481 = score(doc=3765,freq=14.0), product of:
              0.4518609 = queryWeight, product of:
                4.3584914 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.012588279 = queryNorm
              1.6852111 = fieldWeight in 3765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.08847504 = weight(abstract_txt:documents in 3765) [ClassicSimilarity], result of:
            0.08847504 = score(doc=3765,freq=3.0), product of:
              0.22652952 = queryWeight, product of:
                4.364266 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.012588279 = queryNorm
              0.39056736 = fieldWeight in 3765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.77032155 = weight(abstract_txt:passages in 3765) [ClassicSimilarity], result of:
            0.77032155 = score(doc=3765,freq=14.0), product of:
              0.45535147 = queryWeight, product of:
                4.3752933 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012588279 = queryNorm
              1.6917076 = fieldWeight in 3765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
        0.32 = coord(8/25)
    
  2. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 0.21
    0.20999396 = sum of:
      0.20999396 = product of:
        0.87497485 = sum of:
          0.020605918 = weight(abstract_txt:document in 955) [ClassicSimilarity], result of:
            0.020605918 = score(doc=955,freq=1.0), product of:
              0.061422113 = queryWeight, product of:
                1.1362691 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012588279 = queryNorm
              0.33548045 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
          0.026974848 = weight(abstract_txt:collections in 955) [ClassicSimilarity], result of:
            0.026974848 = score(doc=955,freq=1.0), product of:
              0.07350261 = queryWeight, product of:
                1.2429973 = boost
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.012588279 = queryNorm
              0.3669917 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
          0.017525392 = weight(abstract_txt:with in 955) [ClassicSimilarity], result of:
            0.017525392 = score(doc=955,freq=3.0), product of:
              0.051885758 = queryWeight, product of:
                1.6512501 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012588279 = queryNorm
              0.33776882 = fieldWeight in 955, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
          0.2907348 = weight(abstract_txt:passage in 955) [ClassicSimilarity], result of:
            0.2907348 = score(doc=955,freq=1.0), product of:
              0.4518609 = queryWeight, product of:
                4.3584914 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.012588279 = queryNorm
              0.6434166 = fieldWeight in 955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
          0.10319938 = weight(abstract_txt:documents in 955) [ClassicSimilarity], result of:
            0.10319938 = score(doc=955,freq=2.0), product of:
              0.22652952 = queryWeight, product of:
                4.364266 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.012588279 = queryNorm
              0.455567 = fieldWeight in 955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
          0.4159345 = weight(abstract_txt:passages in 955) [ClassicSimilarity], result of:
            0.4159345 = score(doc=955,freq=2.0), product of:
              0.45535147 = queryWeight, product of:
                4.3752933 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012588279 = queryNorm
              0.9134362 = fieldWeight in 955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.078125 = fieldNorm(doc=955)
        0.24 = coord(6/25)
    
  3. Wan, X.; Yang, J.; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks (2008) 0.20
    0.19983841 = sum of:
      0.19983841 = product of:
        0.71370864 = sum of:
          0.106019236 = weight(abstract_txt:blocks in 3081) [ClassicSimilarity], result of:
            0.106019236 = score(doc=3081,freq=5.0), product of:
              0.09859613 = queryWeight, product of:
                1.0179671 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.012588279 = queryNorm
              1.0752879 = fieldWeight in 3081, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.0625 = fieldNorm(doc=3081)
          0.04662587 = weight(abstract_txt:document in 3081) [ClassicSimilarity], result of:
            0.04662587 = score(doc=3081,freq=8.0), product of:
              0.061422113 = queryWeight, product of:
                1.1362691 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012588279 = queryNorm
              0.7591056 = fieldWeight in 3081, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3081)
          0.042940978 = weight(abstract_txt:whole in 3081) [ClassicSimilarity], result of:
            0.042940978 = score(doc=3081,freq=1.0), product of:
              0.11628349 = queryWeight, product of:
                1.5634278 = boost
                5.908454 = idf(docFreq=327, maxDocs=44421)
                0.012588279 = queryNorm
              0.36927837 = fieldWeight in 3081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.908454 = idf(docFreq=327, maxDocs=44421)
                0.0625 = fieldNorm(doc=3081)
          0.035688706 = weight(abstract_txt:text in 3081) [ClassicSimilarity], result of:
            0.035688706 = score(doc=3081,freq=3.0), product of:
              0.08158569 = queryWeight, product of:
                1.6038784 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012588279 = queryNorm
              0.4374383 = fieldWeight in 3081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3081)
          0.008094631 = weight(abstract_txt:with in 3081) [ClassicSimilarity], result of:
            0.008094631 = score(doc=3081,freq=1.0), product of:
              0.051885758 = queryWeight, product of:
                1.6512501 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012588279 = queryNorm
              0.15600874 = fieldWeight in 3081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=3081)
          0.116756774 = weight(abstract_txt:documents in 3081) [ClassicSimilarity], result of:
            0.116756774 = score(doc=3081,freq=4.0), product of:
              0.22652952 = queryWeight, product of:
                4.364266 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.012588279 = queryNorm
              0.51541525 = fieldWeight in 3081, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=3081)
          0.35758242 = weight(abstract_txt:ranking in 3081) [ClassicSimilarity], result of:
            0.35758242 = score(doc=3081,freq=6.0), product of:
              0.41734043 = queryWeight, product of:
                5.923713 = boost
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.012588279 = queryNorm
              0.8568123 = fieldWeight in 3081, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.0625 = fieldNorm(doc=3081)
        0.28 = coord(7/25)
    
  4. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.16
    0.157808 = sum of:
      0.157808 = product of:
        0.65753335 = sum of:
          0.08381556 = weight(abstract_txt:blocks in 119) [ClassicSimilarity], result of:
            0.08381556 = score(doc=119,freq=2.0), product of:
              0.09859613 = queryWeight, product of:
                1.0179671 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.012588279 = queryNorm
              0.8500897 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.078125 = fieldNorm(doc=119)
          0.020605918 = weight(abstract_txt:document in 119) [ClassicSimilarity], result of:
            0.020605918 = score(doc=119,freq=1.0), product of:
              0.061422113 = queryWeight, product of:
                1.1362691 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012588279 = queryNorm
              0.33548045 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=119)
          0.03814819 = weight(abstract_txt:collections in 119) [ClassicSimilarity], result of:
            0.03814819 = score(doc=119,freq=2.0), product of:
              0.07350261 = queryWeight, product of:
                1.2429973 = boost
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.012588279 = queryNorm
              0.5190046 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.078125 = fieldNorm(doc=119)
          0.053676225 = weight(abstract_txt:whole in 119) [ClassicSimilarity], result of:
            0.053676225 = score(doc=119,freq=1.0), product of:
              0.11628349 = queryWeight, product of:
                1.5634278 = boost
                5.908454 = idf(docFreq=327, maxDocs=44421)
                0.012588279 = queryNorm
              0.46159798 = fieldWeight in 119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.908454 = idf(docFreq=327, maxDocs=44421)
                0.078125 = fieldNorm(doc=119)
          0.014309422 = weight(abstract_txt:with in 119) [ClassicSimilarity], result of:
            0.014309422 = score(doc=119,freq=2.0), product of:
              0.051885758 = queryWeight, product of:
                1.6512501 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012588279 = queryNorm
              0.2757871 = fieldWeight in 119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=119)
          0.446978 = weight(abstract_txt:ranking in 119) [ClassicSimilarity], result of:
            0.446978 = score(doc=119,freq=6.0), product of:
              0.41734043 = queryWeight, product of:
                5.923713 = boost
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.012588279 = queryNorm
              1.0710154 = fieldWeight in 119, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.5966744 = idf(docFreq=447, maxDocs=44421)
                0.078125 = fieldNorm(doc=119)
        0.24 = coord(6/25)
    
  5. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank : passage retrieval using random walks with question-based priors (2009) 0.15
    0.14915605 = sum of:
      0.14915605 = product of:
        1.2429671 = sum of:
          0.04370956 = weight(abstract_txt:text in 3450) [ClassicSimilarity], result of:
            0.04370956 = score(doc=3450,freq=2.0), product of:
              0.08158569 = queryWeight, product of:
                1.6038784 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012588279 = queryNorm
              0.5357503 = fieldWeight in 3450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.49339333 = weight(abstract_txt:passage in 3450) [ClassicSimilarity], result of:
            0.49339333 = score(doc=3450,freq=2.0), product of:
              0.4518609 = queryWeight, product of:
                4.3584914 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.012588279 = queryNorm
              1.0919142 = fieldWeight in 3450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.7058643 = weight(abstract_txt:passages in 3450) [ClassicSimilarity], result of:
            0.7058643 = score(doc=3450,freq=4.0), product of:
              0.45535147 = queryWeight, product of:
                4.3752933 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012588279 = queryNorm
              1.5501527 = fieldWeight in 3450, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
        0.12 = coord(3/25)
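
The content-based scores combine more factors than the author matches: each matched term contributes queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and the sum is scaled by coord, the fraction of the 25 query terms that matched. The sketch below recomputes the score of the fifth entry (doc 3450) from the values printed in its explanation. The boost, queryNorm, and fieldNorm values are copied from the listing, the idf formula is the one consistent with the printed numbers, and the helper names are illustrative only.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.012588279  # copied from the listing
QUERY_TERMS = 25          # denominator of coord(3/25)


def idf(doc_freq):
    """Inverse document frequency consistent with the listing."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matched term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (boost, docFreq, termFreq, fieldNorm) for the terms matched in doc 3450.
matched = [
    (1.6038784, 2122, 2.0, 0.09375),  # abstract_txt:text
    (4.3584914, 31, 2.0, 0.09375),    # abstract_txt:passage
    (4.3752933, 30, 4.0, 0.09375),    # abstract_txt:passages
]

coord = len(matched) / QUERY_TERMS
score = coord * sum(term_score(*term) for term in matched)
print(round(score, 3))  # approximately 0.149, as in the listing
```

The coord factor explains why the fifth entry ranks last despite having the second-largest raw sum (1.2429671): only 3 of the 25 query terms match, so the sum is multiplied by 0.12.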