Document (#19151)

Author
Melucci, M.
Title
Passage retrieval : a probabilistic technique
Source
Information processing and management. 34(1998) no.1, pp.43-68
Year
1998
Abstract
This paper presents a probabilistic technique to retrieve passages from texts that are large or have heterogeneous semantic content. The proposed technique is independent of any supporting auxiliary data, such as text structure, topic organization, or pre-defined text segments. A Bayesian framework implements the probabilistic technique. We carried out experiments to compare the probabilistic technique to one based on a text segmentation algorithm. In particular, the probabilistic technique is more effective than, or as effective as, the one based on text segmentation at retrieving small passages. Results show that passage size affects passage retrieval performance. Results also suggest that text organization and query generality may have an impact on the difference in effectiveness between the two techniques.
Theme
Volltextretrieval

Similar documents (author)

  1. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 3226) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 3226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=3226)
    
  2. Melucci, M.: Contextual search : a computational framework (2012) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 913) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 913, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=913)
    
  3. Agosti, M.; Melucci, M.: Information retrieval techniques for the automatic construction of hypertext (2000) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:melucci in 5671) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 5671, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=5671)
    
  4. Melucci, M.; Orio, N.: Combining melody processing and information retrieval techniques : methodology, evaluation, and system implementation (2004) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:melucci in 4087) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 4087, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=4087)
    
  5. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:melucci in 1268) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 1268, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=1268)
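
Note on the author scores: each tree above multiplies three factors, tf, idf and fieldNorm. The sketch below reproduces that arithmetic. It assumes the classic TF-IDF definitions commonly documented for Lucene's ClassicSimilarity, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and are not part of any library API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Assumed formula: square root of the raw term frequency.
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: docFreq=10, maxDocs=44421, freq=1.0.
print(round(field_weight(1.0, 10, 44421, 0.625), 4))  # entries 1 and 2
print(round(field_weight(1.0, 10, 44421, 0.5), 4))    # entries 3 to 5
```

Under these assumptions the only factor that differs between entries is fieldNorm (0.625 versus 0.5), which is why single-author records score higher than co-authored ones.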
    

Similar documents (content)

  1. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank : passage retrieval using random walks with question-based priors (2009) 0.35
    0.35237235 = sum of:
      0.35237235 = product of:
        1.2584727 = sum of:
          0.013996452 = weight(abstract_txt:based in 3450) [ClassicSimilarity], result of:
            0.013996452 = score(doc=3450,freq=1.0), product of:
              0.04690291 = queryWeight, product of:
                1.1109811 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013263136 = queryNorm
              0.2984133 = fieldWeight in 3450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.02578808 = weight(abstract_txt:retrieval in 3450) [ClassicSimilarity], result of:
            0.02578808 = score(doc=3450,freq=2.0), product of:
              0.055948764 = queryWeight, product of:
                1.2133945 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.013263136 = queryNorm
              0.46092314 = fieldWeight in 3450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.01826871 = weight(abstract_txt:results in 3450) [ClassicSimilarity], result of:
            0.01826871 = score(doc=3450,freq=1.0), product of:
              0.056017846 = queryWeight, product of:
                1.2141434 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.013263136 = queryNorm
              0.32612303 = fieldWeight in 3450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.09442221 = weight(abstract_txt:retrieve in 3450) [ClassicSimilarity], result of:
            0.09442221 = score(doc=3450,freq=1.0), product of:
              0.16745724 = queryWeight, product of:
                2.0992239 = boost
                6.014492 = idf(docFreq=294, maxDocs=44421)
                0.013263136 = queryNorm
              0.5638586 = fieldWeight in 3450, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.014492 = idf(docFreq=294, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.4904863 = weight(abstract_txt:passages in 3450) [ClassicSimilarity], result of:
            0.4904863 = score(doc=3450,freq=4.0), product of:
              0.3164116 = queryWeight, product of:
                2.885579 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.013263136 = queryNorm
              1.5501527 = fieldWeight in 3450, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.101242036 = weight(abstract_txt:text in 3450) [ClassicSimilarity], result of:
            0.101242036 = score(doc=3450,freq=2.0), product of:
              0.18897241 = queryWeight, product of:
                3.5259488 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013263136 = queryNorm
              0.5357503 = fieldWeight in 3450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
          0.5142688 = weight(abstract_txt:passage in 3450) [ClassicSimilarity], result of:
            0.5142688 = score(doc=3450,freq=2.0), product of:
              0.47097915 = queryWeight, product of:
                4.3117466 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.013263136 = queryNorm
              1.0919142 = fieldWeight in 3450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.09375 = fieldNorm(doc=3450)
        0.28 = coord(7/25)
    
  2. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.29
    0.28888908 = sum of:
      0.28888908 = product of:
        1.4444454 = sum of:
          0.008164597 = weight(abstract_txt:based in 3765) [ClassicSimilarity], result of:
            0.008164597 = score(doc=3765,freq=1.0), product of:
              0.04690291 = queryWeight, product of:
                1.1109811 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013263136 = queryNorm
              0.17407443 = fieldWeight in 3765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.023785146 = weight(abstract_txt:retrieval in 3765) [ClassicSimilarity], result of:
            0.023785146 = score(doc=3765,freq=5.0), product of:
              0.055948764 = queryWeight, product of:
                1.2133945 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.013263136 = queryNorm
              0.42512372 = fieldWeight in 3765, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.53527594 = weight(abstract_txt:passages in 3765) [ClassicSimilarity], result of:
            0.53527594 = score(doc=3765,freq=14.0), product of:
              0.3164116 = queryWeight, product of:
                2.885579 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.013263136 = queryNorm
              1.6917076 = fieldWeight in 3765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.08352042 = weight(abstract_txt:text in 3765) [ClassicSimilarity], result of:
            0.08352042 = score(doc=3765,freq=4.0), product of:
              0.18897241 = queryWeight, product of:
                3.5259488 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013263136 = queryNorm
              0.44197148 = fieldWeight in 3765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
          0.79369926 = weight(abstract_txt:passage in 3765) [ClassicSimilarity], result of:
            0.79369926 = score(doc=3765,freq=14.0), product of:
              0.47097915 = queryWeight, product of:
                4.3117466 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.013263136 = queryNorm
              1.6852111 = fieldWeight in 3765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3765)
        0.2 = coord(5/25)
    
  3. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.23
    0.22784002 = sum of:
      0.22784002 = product of:
        0.9493334 = sum of:
          0.01319598 = weight(abstract_txt:based in 2107) [ClassicSimilarity], result of:
            0.01319598 = score(doc=2107,freq=2.0), product of:
              0.04690291 = queryWeight, product of:
                1.1109811 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013263136 = queryNorm
              0.28134674 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.012156618 = weight(abstract_txt:retrieval in 2107) [ClassicSimilarity], result of:
            0.012156618 = score(doc=2107,freq=1.0), product of:
              0.055948764 = queryWeight, product of:
                1.2133945 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.013263136 = queryNorm
              0.21728125 = fieldWeight in 2107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.23121746 = weight(abstract_txt:passages in 2107) [ClassicSimilarity], result of:
            0.23121746 = score(doc=2107,freq=2.0), product of:
              0.3164116 = queryWeight, product of:
                2.885579 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.013263136 = queryNorm
              0.73074895 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.13498938 = weight(abstract_txt:text in 2107) [ClassicSimilarity], result of:
            0.13498938 = score(doc=2107,freq=8.0), product of:
              0.18897241 = queryWeight, product of:
                3.5259488 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013263136 = queryNorm
              0.7143338 = fieldWeight in 2107, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.34284586 = weight(abstract_txt:passage in 2107) [ClassicSimilarity], result of:
            0.34284586 = score(doc=2107,freq=2.0), product of:
              0.47097915 = queryWeight, product of:
                4.3117466 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.013263136 = queryNorm
              0.72794276 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.21492812 = weight(abstract_txt:technique in 2107) [ClassicSimilarity], result of:
            0.21492812 = score(doc=2107,freq=2.0), product of:
              0.43465158 = queryWeight, product of:
                5.857847 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.013263136 = queryNorm
              0.4944837 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
        0.24 = coord(6/25)
    
  4. Lioma, C.; Ounis, I.: ¬A syntactically-based query reformulation technique for information retrieval (2008) 0.21
    0.21445754 = sum of:
      0.21445754 = product of:
        0.67017984 = sum of:
          0.011546482 = weight(abstract_txt:based in 3031) [ClassicSimilarity], result of:
            0.011546482 = score(doc=3031,freq=2.0), product of:
              0.04690291 = queryWeight, product of:
                1.1109811 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013263136 = queryNorm
              0.24617839 = fieldWeight in 3031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.030086095 = weight(abstract_txt:retrieval in 3031) [ClassicSimilarity], result of:
            0.030086095 = score(doc=3031,freq=8.0), product of:
              0.055948764 = queryWeight, product of:
                1.2133945 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.013263136 = queryNorm
              0.5377437 = fieldWeight in 3031, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.015070915 = weight(abstract_txt:results in 3031) [ClassicSimilarity], result of:
            0.015070915 = score(doc=3031,freq=2.0), product of:
              0.056017846 = queryWeight, product of:
                1.2141434 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.013263136 = queryNorm
              0.26903775 = fieldWeight in 3031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.02866165 = weight(abstract_txt:effective in 3031) [ClassicSimilarity], result of:
            0.02866165 = score(doc=3031,freq=1.0), product of:
              0.10833716 = queryWeight, product of:
                1.6884784 = boost
                4.837664 = idf(docFreq=956, maxDocs=44421)
                0.013263136 = queryNorm
              0.26455975 = fieldWeight in 3031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.837664 = idf(docFreq=956, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.073280826 = weight(abstract_txt:size in 3031) [ClassicSimilarity], result of:
            0.073280826 = score(doc=3031,freq=2.0), product of:
              0.1607781 = queryWeight, product of:
                2.0569334 = boost
                5.8933253 = idf(docFreq=332, maxDocs=44421)
                0.013263136 = queryNorm
              0.4557886 = fieldWeight in 3031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8933253 = idf(docFreq=332, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.04176021 = weight(abstract_txt:text in 3031) [ClassicSimilarity], result of:
            0.04176021 = score(doc=3031,freq=1.0), product of:
              0.18897241 = queryWeight, product of:
                3.5259488 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013263136 = queryNorm
              0.22098574 = fieldWeight in 3031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.20381364 = weight(abstract_txt:probabilistic in 3031) [ClassicSimilarity], result of:
            0.20381364 = score(doc=3031,freq=3.0), product of:
              0.31797194 = queryWeight, product of:
                3.5428014 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.013263136 = queryNorm
              0.64097995 = fieldWeight in 3031, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
          0.26596 = weight(abstract_txt:technique in 3031) [ClassicSimilarity], result of:
            0.26596 = score(doc=3031,freq=4.0), product of:
              0.43465158 = queryWeight, product of:
                5.857847 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.013263136 = queryNorm
              0.6118924 = fieldWeight in 3031, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3031)
        0.32 = coord(8/25)
    
  5. Yang, C.C.; Li, K.W.: ¬A heuristic method based on a statistical approach for chinese text segmentation (2005) 0.19
    0.1889265 = sum of:
      0.1889265 = product of:
        0.7871938 = sum of:
          0.01319598 = weight(abstract_txt:based in 5580) [ClassicSimilarity], result of:
            0.01319598 = score(doc=5580,freq=2.0), product of:
              0.04690291 = queryWeight, product of:
                1.1109811 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.013263136 = queryNorm
              0.28134674 = fieldWeight in 5580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.012156618 = weight(abstract_txt:retrieval in 5580) [ClassicSimilarity], result of:
            0.012156618 = score(doc=5580,freq=1.0), product of:
              0.055948764 = queryWeight, product of:
                1.2133945 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.013263136 = queryNorm
              0.21728125 = fieldWeight in 5580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.04907008 = weight(abstract_txt:affects in 5580) [ClassicSimilarity], result of:
            0.04907008 = score(doc=5580,freq=1.0), product of:
              0.11257704 = queryWeight, product of:
                1.2170732 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.013263136 = queryNorm
              0.43587998 = fieldWeight in 5580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.43452296 = weight(abstract_txt:segmentation in 5580) [ClassicSimilarity], result of:
            0.43452296 = score(doc=5580,freq=9.0), product of:
              0.29186118 = queryWeight, product of:
                2.7713728 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.013263136 = queryNorm
              1.4888002 = fieldWeight in 5580, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.126271 = weight(abstract_txt:text in 5580) [ClassicSimilarity], result of:
            0.126271 = score(doc=5580,freq=7.0), product of:
              0.18897241 = queryWeight, product of:
                3.5259488 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013263136 = queryNorm
              0.66819805 = fieldWeight in 5580, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
          0.15197714 = weight(abstract_txt:technique in 5580) [ClassicSimilarity], result of:
            0.15197714 = score(doc=5580,freq=1.0), product of:
              0.43465158 = queryWeight, product of:
                5.857847 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.013263136 = queryNorm
              0.3496528 = fieldWeight in 5580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.0625 = fieldNorm(doc=5580)
        0.24 = coord(6/25)
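
Note on the content scores: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord (matching terms / query terms). The sketch below recomputes the score of the first content match (doc=3450) from the boost, docFreq and freq values shown in its tree. It assumes the same classic formulas as the listing appears to use (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). All names are illustrative, not a library API.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.013263136  # queryNorm from the listing
FIELD_NORM = 0.09375      # fieldNorm(doc=3450)
QUERY_TERMS = 25          # denominator of coord(7/25)

# (term, boost, docFreq, freq) copied from the tree for doc=3450.
TERMS_DOC_3450 = [
    ("based", 1.1109811, 5005, 1.0),
    ("retrieval", 1.2133945, 3732, 2.0),
    ("results", 1.2141434, 3724, 1.0),
    ("retrieve", 2.0992239, 294, 1.0),
    ("passages", 2.885579, 30, 4.0),
    ("text", 3.5259488, 2122, 2.0),
    ("passage", 4.3117466, 31, 2.0),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed formula: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float,
               field_norm: float, query_norm: float) -> float:
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm          # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm  # fieldWeight
    return query_weight * field_weight


def document_score(terms, field_norm: float, query_norm: float,
                   query_term_count: int) -> float:
    total = sum(term_score(boost, df, freq, field_norm, query_norm)
                for _, boost, df, freq in terms)
    coord = len(terms) / query_term_count  # e.g. 7/25 = 0.28
    return coord * total


score = document_score(TERMS_DOC_3450, FIELD_NORM, QUERY_NORM, QUERY_TERMS)
print(f"{score:.4f}")  # compare with 0.35237235 in the listing
```

The rare terms "passage" and "passages" dominate the sum because idf enters both queryWeight and fieldWeight, so its effect is squared.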