Document (#34021)

Author
Efron, M.
Title
Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing
Source
Information processing and management. 44(2008) no.1, pp.163-180
Year
2008
Abstract
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
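For orientation, the textbook form of the Rocchio update that the abstract refers to can be sketched as below. This is not taken from the paper: the function names, the toy vectors, and the weights alpha, beta, and gamma are illustrative assumptions, and the sketch shows only the standard centroid-based query modification, not the least-squares analysis the paper develops.

```python
def centroid(vectors, dim):
    """Component-wise mean of a list of equal-length vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.5, gamma=0.25):
    """Standard Rocchio update: move the query toward the centroid of the
    relevant documents and away from the centroid of the non-relevant ones.
    The default weights are illustrative assumptions, not tuned values."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


# Toy term-space example with three terms (hypothetical data).
query = [1.0, 0.0, 0.0]
relevant_docs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
nonrelevant_docs = [[0.0, 0.0, 1.0]]
expanded = rocchio(query, relevant_docs, nonrelevant_docs)
```

The expanded query gains weight on terms that occur in the relevant documents, which is the query-expansion effect named in the title.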
Theme
Retrieval algorithms
Object
Rocchio algorithm
Latent semantic indexing

Similar documents (author)

  1. Efron, M.: Eigenvalue-based model selection during Latent Semantic Indexing (2005) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:efron in 4685) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 4685, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=4685)
    
  2. Efron, M.: Shannon meets Shortz : a probabilistic model of crossword puzzle difficulty (2008) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:efron in 2620) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 2620, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=2620)
    
  3. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:efron in 675) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 675, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=675)
    
  4. Efron, M.: Information search and retrieval in microblogs (2011) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:efron in 455) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 455, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=455)
    
  5. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 4.88
    4.8777785 = sum of:
      4.8777785 = weight(author_txt:efron in 456) [ClassicSimilarity], result of:
        4.8777785 = fieldWeight in 456, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.5 = fieldNorm(doc=456)
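
The author-match scores above can be reproduced from the values printed in the explain trees. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the numbers shown here. The function names are hypothetical, and the results may differ from the listing in the last digits because Lucene works in single precision.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency, in the form that matches the listed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    tf = math.sqrt(term_freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Hits 1-4: one author match, fieldNorm 0.625 (single-author record).
score_single_author = field_weight(1.0, 6, 44421, 0.625)
# Hit 5: same match, but fieldNorm 0.5 because the author field is longer.
score_two_authors = field_weight(1.0, 6, 44421, 0.5)
```

The only difference between the 6.10 and 4.88 scores is the field norm: the record with two authors has a longer author field, so the same single match counts for less.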
    

Similar documents (content)

  1. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.19
    0.18727422 = sum of:
      0.18727422 = product of:
        1.1704639 = sum of:
          0.09482933 = weight(abstract_txt:optimal in 2607) [ClassicSimilarity], result of:
            0.09482933 = score(doc=2607,freq=2.0), product of:
              0.16034947 = queryWeight, product of:
                1.7089666 = boost
                6.690832 = idf(docFreq=149, maxDocs=44421)
                0.014023416 = queryNorm
              0.59139156 = fieldWeight in 2607, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.690832 = idf(docFreq=149, maxDocs=44421)
                0.0625 = fieldNorm(doc=2607)
          0.05363328 = weight(abstract_txt:relevance in 2607) [ClassicSimilarity], result of:
            0.05363328 = score(doc=2607,freq=1.0), product of:
              0.17407991 = queryWeight, product of:
                2.5181937 = boost
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.014023416 = queryNorm
              0.30809575 = fieldWeight in 2607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.0625 = fieldNorm(doc=2607)
          0.16248348 = weight(abstract_txt:feedback in 2607) [ClassicSimilarity], result of:
            0.16248348 = score(doc=2607,freq=3.0), product of:
              0.2527114 = queryWeight, product of:
                3.0340815 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.014023416 = queryNorm
              0.64296067 = fieldWeight in 2607, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.0625 = fieldNorm(doc=2607)
          0.8595178 = weight(abstract_txt:rocchio in 2607) [ClassicSimilarity], result of:
            0.8595178 = score(doc=2607,freq=5.0), product of:
              0.64710134 = queryWeight, product of:
                4.855131 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.014023416 = queryNorm
              1.3282584 = fieldWeight in 2607, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0625 = fieldNorm(doc=2607)
        0.16 = coord(4/25)
    
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.18
    0.18255967 = sum of:
      0.18255967 = product of:
        0.6519988 = sum of:
          0.03258162 = weight(abstract_txt:indexing in 1690) [ClassicSimilarity], result of:
            0.03258162 = score(doc=1690,freq=2.0), product of:
              0.06778717 = queryWeight, product of:
                1.1111523 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.014023416 = queryNorm
              0.48064584 = fieldWeight in 1690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.043421082 = weight(abstract_txt:semantic in 1690) [ClassicSimilarity], result of:
            0.043421082 = score(doc=1690,freq=3.0), product of:
              0.071713746 = queryWeight, product of:
                1.142881 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.014023416 = queryNorm
              0.6054778 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.23583621 = weight(abstract_txt:subspace in 1690) [ClassicSimilarity], result of:
            0.23583621 = score(doc=1690,freq=3.0), product of:
              0.17587292 = queryWeight, product of:
                1.2655646 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.014023416 = queryNorm
              1.3409467 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.043901335 = weight(abstract_txt:space in 1690) [ClassicSimilarity], result of:
            0.043901335 = score(doc=1690,freq=1.0), product of:
              0.104190364 = queryWeight, product of:
                1.3775698 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.014023416 = queryNorm
              0.42135698 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.053036906 = weight(abstract_txt:model in 1690) [ClassicSimilarity], result of:
            0.053036906 = score(doc=1690,freq=4.0), product of:
              0.0852259 = queryWeight, product of:
                1.5259182 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.014023416 = queryNorm
              0.6223097 = fieldWeight in 1690, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.07754912 = weight(abstract_txt:vector in 1690) [ClassicSimilarity], result of:
            0.07754912 = score(doc=1690,freq=1.0), product of:
              0.15225106 = queryWeight, product of:
                1.6652521 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.014023416 = queryNorm
              0.5093503 = fieldWeight in 1690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
          0.16567254 = weight(abstract_txt:latent in 1690) [ClassicSimilarity], result of:
            0.16567254 = score(doc=1690,freq=3.0), product of:
              0.17510654 = queryWeight, product of:
                1.7858747 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.014023416 = queryNorm
              0.94612426 = fieldWeight in 1690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.078125 = fieldNorm(doc=1690)
        0.28 = coord(7/25)
    
  3. Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback (2020) 0.16
    0.15748425 = sum of:
      0.15748425 = product of:
        0.7874212 = sum of:
          0.044153534 = weight(abstract_txt:least in 678) [ClassicSimilarity], result of:
            0.044153534 = score(doc=678,freq=1.0), product of:
              0.121364795 = queryWeight, product of:
                1.4867783 = boost
                5.820935 = idf(docFreq=357, maxDocs=44421)
                0.014023416 = queryNorm
              0.36380842 = fieldWeight in 678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.820935 = idf(docFreq=357, maxDocs=44421)
                0.0625 = fieldNorm(doc=678)
          0.030002205 = weight(abstract_txt:model in 678) [ClassicSimilarity], result of:
            0.030002205 = score(doc=678,freq=2.0), product of:
              0.0852259 = queryWeight, product of:
                1.5259182 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.014023416 = queryNorm
              0.35203153 = fieldWeight in 678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0625 = fieldNorm(doc=678)
          0.07584891 = weight(abstract_txt:relevance in 678) [ClassicSimilarity], result of:
            0.07584891 = score(doc=678,freq=2.0), product of:
              0.17407991 = queryWeight, product of:
                2.5181937 = boost
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.014023416 = queryNorm
              0.43571317 = fieldWeight in 678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.0625 = fieldNorm(doc=678)
          0.09380989 = weight(abstract_txt:feedback in 678) [ClassicSimilarity], result of:
            0.09380989 = score(doc=678,freq=1.0), product of:
              0.2527114 = queryWeight, product of:
                3.0340815 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.014023416 = queryNorm
              0.37121353 = fieldWeight in 678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.0625 = fieldNorm(doc=678)
          0.5436067 = weight(abstract_txt:rocchio in 678) [ClassicSimilarity], result of:
            0.5436067 = score(doc=678,freq=2.0), product of:
              0.64710134 = queryWeight, product of:
                4.855131 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.014023416 = queryNorm
              0.8400643 = fieldWeight in 678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0625 = fieldNorm(doc=678)
        0.2 = coord(5/25)
    
  4. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012) 0.14
    0.13551387 = sum of:
      0.13551387 = product of:
        0.48397812 = sum of:
          0.027646422 = weight(abstract_txt:indexing in 3710) [ClassicSimilarity], result of:
            0.027646422 = score(doc=3710,freq=1.0), product of:
              0.06778717 = queryWeight, product of:
                1.1111523 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.014023416 = queryNorm
              0.4078415 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=3710)
          0.0425438 = weight(abstract_txt:semantic in 3710) [ClassicSimilarity], result of:
            0.0425438 = score(doc=3710,freq=2.0), product of:
              0.071713746 = queryWeight, product of:
                1.142881 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.014023416 = queryNorm
              0.5932447 = fieldWeight in 3710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.09375 = fieldNorm(doc=3710)
          0.052681603 = weight(abstract_txt:space in 3710) [ClassicSimilarity], result of:
            0.052681603 = score(doc=3710,freq=1.0), product of:
              0.104190364 = queryWeight, product of:
                1.3775698 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.014023416 = queryNorm
              0.50562835 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.09375 = fieldNorm(doc=3710)
          0.04500331 = weight(abstract_txt:model in 3710) [ClassicSimilarity], result of:
            0.04500331 = score(doc=3710,freq=2.0), product of:
              0.0852259 = queryWeight, product of:
                1.5259182 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.014023416 = queryNorm
              0.5280473 = fieldWeight in 3710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.09375 = fieldNorm(doc=3710)
          0.09305895 = weight(abstract_txt:vector in 3710) [ClassicSimilarity], result of:
            0.09305895 = score(doc=3710,freq=1.0), product of:
              0.15225106 = queryWeight, product of:
                1.6652521 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.014023416 = queryNorm
              0.61122036 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.09375 = fieldNorm(doc=3710)
          0.1623253 = weight(abstract_txt:latent in 3710) [ClassicSimilarity], result of:
            0.1623253 = score(doc=3710,freq=2.0), product of:
              0.17510654 = queryWeight, product of:
                1.7858747 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.014023416 = queryNorm
              0.92700875 = fieldWeight in 3710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.09375 = fieldNorm(doc=3710)
          0.060718752 = weight(abstract_txt:relationship in 3710) [ClassicSimilarity], result of:
            0.060718752 = score(doc=3710,freq=1.0), product of:
              0.13110942 = queryWeight, product of:
                1.8926156 = boost
                4.9398947 = idf(docFreq=863, maxDocs=44421)
                0.014023416 = queryNorm
              0.46311513 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9398947 = idf(docFreq=863, maxDocs=44421)
                0.09375 = fieldNorm(doc=3710)
        0.28 = coord(7/25)
    
  5. Layfield, C.; Azzopardi, J.; Staff, C.: Experiments with document retrieval from small text collections using Latent Semantic Analysis or term similarity with query coordination and automatic relevance feedback (2017) 0.13
    0.13030286 = sum of:
      0.13030286 = product of:
        0.40719646 = sum of:
          0.01612708 = weight(abstract_txt:indexing in 4478) [ClassicSimilarity], result of:
            0.01612708 = score(doc=4478,freq=1.0), product of:
              0.06778717 = queryWeight, product of:
                1.1111523 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.014023416 = queryNorm
              0.23790754 = fieldWeight in 4478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
          0.030394757 = weight(abstract_txt:semantic in 4478) [ClassicSimilarity], result of:
            0.030394757 = score(doc=4478,freq=3.0), product of:
              0.071713746 = queryWeight, product of:
                1.142881 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.014023416 = queryNorm
              0.42383447 = fieldWeight in 4478, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
          0.030730937 = weight(abstract_txt:space in 4478) [ClassicSimilarity], result of:
            0.030730937 = score(doc=4478,freq=1.0), product of:
              0.104190364 = queryWeight, product of:
                1.3775698 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.014023416 = queryNorm
              0.2949499 = fieldWeight in 4478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
          0.026251929 = weight(abstract_txt:model in 4478) [ClassicSimilarity], result of:
            0.026251929 = score(doc=4478,freq=2.0), product of:
              0.0852259 = queryWeight, product of:
                1.5259182 = boost
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.014023416 = queryNorm
              0.3080276 = fieldWeight in 4478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9827821 = idf(docFreq=2249, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
          0.054284386 = weight(abstract_txt:vector in 4478) [ClassicSimilarity], result of:
            0.054284386 = score(doc=4478,freq=1.0), product of:
              0.15225106 = queryWeight, product of:
                1.6652521 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.014023416 = queryNorm
              0.3565452 = fieldWeight in 4478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
          0.06695577 = weight(abstract_txt:latent in 4478) [ClassicSimilarity], result of:
            0.06695577 = score(doc=4478,freq=1.0), product of:
              0.17510654 = queryWeight, product of:
                1.7858747 = boost
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.014023416 = queryNorm
              0.3823716 = fieldWeight in 4478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9919376 = idf(docFreq=110, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
          0.06636779 = weight(abstract_txt:relevance in 4478) [ClassicSimilarity], result of:
            0.06636779 = score(doc=4478,freq=2.0), product of:
              0.17407991 = queryWeight, product of:
                2.5181937 = boost
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.014023416 = queryNorm
              0.381249 = fieldWeight in 4478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
          0.1160838 = weight(abstract_txt:feedback in 4478) [ClassicSimilarity], result of:
            0.1160838 = score(doc=4478,freq=2.0), product of:
              0.2527114 = queryWeight, product of:
                3.0340815 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.014023416 = queryNorm
              0.45935327 = fieldWeight in 4478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.0546875 = fieldNorm(doc=4478)
        0.32 = coord(8/25)
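
The content-similarity scores combine, for each matching abstract term, a queryWeight (boost * idf * queryNorm) and a fieldWeight (tf * idf * fieldNorm), then scale the sum by the coord factor (matched terms / query terms). As a rough check, the sketch below recomputes the score of the first hit under "Similar documents (content)" (doc 2607) from the numbers printed in its tree. The function and variable names are hypothetical, and the formulas are inferred from the explain output rather than from any stated configuration.

```python
import math

QUERY_NORM = 0.014023416  # queryNorm as printed in the explain trees


def term_score(boost, idf, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    fld_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * fld_weight


# (boost, idf, termFreq) for the four terms matched in doc 2607.
matched_terms = {
    "optimal": (1.7089666, 6.690832, 2.0),
    "relevance": (2.5181937, 4.929532, 1.0),
    "feedback": (3.0340815, 5.9394164, 3.0),
    "rocchio": (4.855131, 9.504243, 5.0),
}
field_norm = 0.0625   # fieldNorm(doc=2607)
coord = 4 / 25        # 4 of the 25 query terms matched

term_scores = {
    term: term_score(boost, idf, freq, field_norm)
    for term, (boost, idf, freq) in matched_terms.items()
}
total_score = coord * sum(term_scores.values())
```

In this hit the term "rocchio" contributes most of the sum, because it is both rare in the collection (docFreq=8) and heavily boosted in the query.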