Document (#32579)

Author
Chen, Z.
Fu, B.
Title
On the complexity of Rocchio's similarity-based relevance feedback algorithm
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.10, S.1392-1400
Year
2007
Abstract
Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive learning algorithm from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is O(d + d**2(log d + log n)) over the discretized vector space {0, ... , n - 1 }**d when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier (q, 0) over {0, ... , n - 1 }d can be improved to, at most, 1 + 2k (n - 1) (log d + log(n - 1)), where k is the number of nonzero components in q. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound Omega((d über 2)log n) on its learning complexity over the Boolean vector space {0,1}**d.
Theme
Retrievalalgorithmen
Object
Rocchio-Feedback-Algorithmus

Similar documents (author)

  1. Chen, Y.N.; Chen, S.J.: ¬A metadata practice of the OFLA FRBR model : a case study for the National Palace Museum in Taipai (2004) 4.34
    4.3394766 = sum of:
      4.3394766 = weight(author_txt:chen in 4384) [ClassicSimilarity], result of:
        4.3394766 = score(doc=4384,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.16294746 = queryNorm
          4.339477 = fieldWeight in 4384, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.5 = fieldNorm(doc=4384)
    
  2. Chen, C.C.; Chen, H.H.; Chen, K.H.: ¬The design of the XML/Metadata management system (2000) 3.99
    3.9860637 = sum of:
      3.9860637 = weight(author_txt:chen in 5633) [ClassicSimilarity], result of:
        3.9860637 = score(doc=5633,freq=3.0), product of:
          0.99999994 = queryWeight, product of:
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.16294746 = queryNorm
          3.986064 = fieldWeight in 5633, product of:
            1.7320508 = tf(freq=3.0), with freq of:
              3.0 = termFreq=3.0
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.375 = fieldNorm(doc=5633)
    
  3. Chen, W.Y.: Observations on cataloguing and classification (1991) 3.84
    3.8355918 = sum of:
      3.8355918 = weight(author_txt:chen in 4183) [ClassicSimilarity], result of:
        3.8355918 = score(doc=4183,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.16294746 = queryNorm
          3.835592 = fieldWeight in 4183, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.625 = fieldNorm(doc=4183)
    
  4. Chen, H.: Knowledge-based document retrieval : framework and design (1992) 3.84
    3.8355918 = sum of:
      3.8355918 = weight(author_txt:chen in 5282) [ClassicSimilarity], result of:
        3.8355918 = score(doc=5282,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.16294746 = queryNorm
          3.835592 = fieldWeight in 5282, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.625 = fieldNorm(doc=5282)
    
  5. Chen, P.S.: On inference rules of logic-based information retrieval systems (1994) 3.84
    3.8355918 = sum of:
      3.8355918 = weight(author_txt:chen in 6730) [ClassicSimilarity], result of:
        3.8355918 = score(doc=6730,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.16294746 = queryNorm
          3.835592 = fieldWeight in 6730, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            6.136947 = idf(docFreq=260, maxDocs=44421)
            0.625 = fieldNorm(doc=6730)
    

Similar documents (content)

  1. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 0.20
    0.20346536 = sum of:
      0.20346536 = product of:
        0.84777236 = sum of:
          0.01611864 = weight(abstract_txt:documents in 3020) [ClassicSimilarity], result of:
            0.01611864 = score(doc=3020,freq=1.0), product of:
              0.050036985 = queryWeight, product of:
                1.122793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.010807972 = queryNorm
              0.32213452 = fieldWeight in 3020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=3020)
          0.047705054 = weight(abstract_txt:relevance in 3020) [ClassicSimilarity], result of:
            0.047705054 = score(doc=3020,freq=3.0), product of:
              0.0715168 = queryWeight, product of:
                1.3423264 = boost
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.010807972 = queryNorm
              0.66704684 = fieldWeight in 3020, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.078125 = fieldNorm(doc=3020)
          0.051013153 = weight(abstract_txt:space in 3020) [ClassicSimilarity], result of:
            0.051013153 = score(doc=3020,freq=2.0), product of:
              0.08560852 = queryWeight, product of:
                1.4686307 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.010807972 = queryNorm
              0.59588873 = fieldWeight in 3020, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.078125 = fieldNorm(doc=3020)
          0.08344083 = weight(abstract_txt:feedback in 3020) [ClassicSimilarity], result of:
            0.08344083 = score(doc=3020,freq=3.0), product of:
              0.10382076 = queryWeight, product of:
                1.617321 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.010807972 = queryNorm
              0.8037008 = fieldWeight in 3020, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3020)
          0.09011173 = weight(abstract_txt:vector in 3020) [ClassicSimilarity], result of:
            0.09011173 = score(doc=3020,freq=2.0), product of:
              0.12509783 = queryWeight, product of:
                1.7753296 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.010807972 = queryNorm
              0.7203301 = fieldWeight in 3020, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=3020)
          0.559383 = weight(abstract_txt:rocchio's in 3020) [ClassicSimilarity], result of:
            0.559383 = score(doc=3020,freq=1.0), product of:
              0.7225341 = queryWeight, product of:
                6.746108 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010807972 = queryNorm
              0.7741959 = fieldWeight in 3020, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=3020)
        0.24 = coord(6/25)
    
  2. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.18
    0.17895393 = sum of:
      0.17895393 = product of:
        1.1184621 = sum of:
          0.105018154 = weight(abstract_txt:classifier in 2595) [ClassicSimilarity], result of:
            0.105018154 = score(doc=2595,freq=1.0), product of:
              0.15457086 = queryWeight, product of:
                1.9734128 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.010807972 = queryNorm
              0.67941755 = fieldWeight in 2595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.124826804 = weight(abstract_txt:learning in 2595) [ClassicSimilarity], result of:
            0.124826804 = score(doc=2595,freq=2.0), product of:
              0.19854261 = queryWeight, product of:
                3.8738391 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.010807972 = queryNorm
              0.62871546 = fieldWeight in 2595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.21735756 = weight(abstract_txt:algorithm in 2595) [ClassicSimilarity], result of:
            0.21735756 = score(doc=2595,freq=2.0), product of:
              0.28736353 = queryWeight, product of:
                4.660479 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.010807972 = queryNorm
              0.7563853 = fieldWeight in 2595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
          0.6712596 = weight(abstract_txt:rocchio's in 2595) [ClassicSimilarity], result of:
            0.6712596 = score(doc=2595,freq=1.0), product of:
              0.7225341 = queryWeight, product of:
                6.746108 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010807972 = queryNorm
              0.9290351 = fieldWeight in 2595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.09375 = fieldNorm(doc=2595)
        0.16 = coord(4/25)
    
  3. Mengle, S.S.R.; Goharian, N.: Ambiguity measure feature-selection algorithm (2009) 0.15
    0.14684309 = sum of:
      0.14684309 = product of:
        0.52443963 = sum of:
          0.011269876 = weight(abstract_txt:most in 3804) [ClassicSimilarity], result of:
            0.011269876 = score(doc=3804,freq=1.0), product of:
              0.045739524 = queryWeight, product of:
                1.0734948 = boost
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.010807972 = queryNorm
              0.2463925 = fieldWeight in 3804, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.94228 = idf(docFreq=2342, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.01823616 = weight(abstract_txt:documents in 3804) [ClassicSimilarity], result of:
            0.01823616 = score(doc=3804,freq=2.0), product of:
              0.050036985 = queryWeight, product of:
                1.122793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.010807972 = queryNorm
              0.3644536 = fieldWeight in 3804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.028857397 = weight(abstract_txt:space in 3804) [ClassicSimilarity], result of:
            0.028857397 = score(doc=3804,freq=1.0), product of:
              0.08560852 = queryWeight, product of:
                1.4686307 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.010807972 = queryNorm
              0.33708557 = fieldWeight in 3804, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.05097489 = weight(abstract_txt:vector in 3804) [ClassicSimilarity], result of:
            0.05097489 = score(doc=3804,freq=1.0), product of:
              0.12509783 = queryWeight, product of:
                1.7753296 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.010807972 = queryNorm
              0.40748024 = fieldWeight in 3804, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.15655182 = weight(abstract_txt:classifier in 3804) [ClassicSimilarity], result of:
            0.15655182 = score(doc=3804,freq=5.0), product of:
              0.15457086 = queryWeight, product of:
                1.9734128 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.010807972 = queryNorm
              1.0128158 = fieldWeight in 3804, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.14490505 = weight(abstract_txt:algorithm in 3804) [ClassicSimilarity], result of:
            0.14490505 = score(doc=3804,freq=2.0), product of:
              0.28736353 = queryWeight, product of:
                4.660479 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.010807972 = queryNorm
              0.5042569 = fieldWeight in 3804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.1136444 = weight(abstract_txt:complexity in 3804) [ClassicSimilarity], result of:
            0.1136444 = score(doc=3804,freq=1.0), product of:
              0.30790588 = queryWeight, product of:
                4.8241825 = boost
                5.90541 = idf(docFreq=328, maxDocs=44421)
                0.010807972 = queryNorm
              0.3690881 = fieldWeight in 3804, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.90541 = idf(docFreq=328, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
        0.28 = coord(7/25)
    
  4. Borodin, Y.; Polishchuk, V.; Mahmud, J.; Ramakrishnan, I.V.; Stent, A.: Live and learn from mistakes : a lightweight system for document classification (2013) 0.14
    0.14383858 = sum of:
      0.14383858 = product of:
        0.51370925 = sum of:
          0.022334645 = weight(abstract_txt:documents in 3722) [ClassicSimilarity], result of:
            0.022334645 = score(doc=3722,freq=3.0), product of:
              0.050036985 = queryWeight, product of:
                1.122793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.010807972 = queryNorm
              0.4463627 = fieldWeight in 3722, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=3722)
          0.028857397 = weight(abstract_txt:space in 3722) [ClassicSimilarity], result of:
            0.028857397 = score(doc=3722,freq=1.0), product of:
              0.08560852 = queryWeight, product of:
                1.4686307 = boost
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.010807972 = queryNorm
              0.33708557 = fieldWeight in 3722, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.393369 = idf(docFreq=548, maxDocs=44421)
                0.0625 = fieldNorm(doc=3722)
          0.03853967 = weight(abstract_txt:feedback in 3722) [ClassicSimilarity], result of:
            0.03853967 = score(doc=3722,freq=1.0), product of:
              0.10382076 = queryWeight, product of:
                1.617321 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.010807972 = queryNorm
              0.37121353 = fieldWeight in 3722, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.0625 = fieldNorm(doc=3722)
          0.03649699 = weight(abstract_txt:over in 3722) [ClassicSimilarity], result of:
            0.03649699 = score(doc=3722,freq=3.0), product of:
              0.07946458 = queryWeight, product of:
                1.732952 = boost
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.010807972 = queryNorm
              0.45928627 = fieldWeight in 3722, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.0625 = fieldNorm(doc=3722)
          0.05097489 = weight(abstract_txt:vector in 3722) [ClassicSimilarity], result of:
            0.05097489 = score(doc=3722,freq=1.0), product of:
              0.12509783 = queryWeight, product of:
                1.7753296 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.010807972 = queryNorm
              0.40748024 = fieldWeight in 3722, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=3722)
          0.13157901 = weight(abstract_txt:learning in 3722) [ClassicSimilarity], result of:
            0.13157901 = score(doc=3722,freq=5.0), product of:
              0.19854261 = queryWeight, product of:
                3.8738391 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.010807972 = queryNorm
              0.6627243 = fieldWeight in 3722, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=3722)
          0.20492668 = weight(abstract_txt:algorithm in 3722) [ClassicSimilarity], result of:
            0.20492668 = score(doc=3722,freq=4.0), product of:
              0.28736353 = queryWeight, product of:
                4.660479 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.010807972 = queryNorm
              0.71312696 = fieldWeight in 3722, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=3722)
        0.28 = coord(7/25)
    
  5. Widyantoro, D.H.; Ioerger, T.R.; Yen, J.: Learning user Interest dynamics with a three-descriptor representation (2001) 0.14
    0.14236145 = sum of:
      0.14236145 = product of:
        0.71180725 = sum of:
          0.01823616 = weight(abstract_txt:documents in 185) [ClassicSimilarity], result of:
            0.01823616 = score(doc=185,freq=2.0), product of:
              0.050036985 = queryWeight, product of:
                1.122793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.010807972 = queryNorm
              0.3644536 = fieldWeight in 185, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=185)
          0.066752665 = weight(abstract_txt:feedback in 185) [ClassicSimilarity], result of:
            0.066752665 = score(doc=185,freq=3.0), product of:
              0.10382076 = queryWeight, product of:
                1.617321 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.010807972 = queryNorm
              0.64296067 = fieldWeight in 185, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.0625 = fieldNorm(doc=185)
          0.076848716 = weight(abstract_txt:similarity in 185) [ClassicSimilarity], result of:
            0.076848716 = score(doc=185,freq=2.0), product of:
              0.14943662 = queryWeight, product of:
                2.3764477 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.010807972 = queryNorm
              0.51425624 = fieldWeight in 185, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0625 = fieldNorm(doc=185)
          0.10246334 = weight(abstract_txt:algorithm in 185) [ClassicSimilarity], result of:
            0.10246334 = score(doc=185,freq=1.0), product of:
              0.28736353 = queryWeight, product of:
                4.660479 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.010807972 = queryNorm
              0.35656348 = fieldWeight in 185, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=185)
          0.4475064 = weight(abstract_txt:rocchio's in 185) [ClassicSimilarity], result of:
            0.4475064 = score(doc=185,freq=1.0), product of:
              0.7225341 = queryWeight, product of:
                6.746108 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.010807972 = queryNorm
              0.61935675 = fieldWeight in 185, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=185)
        0.2 = coord(5/25)