Document (#27582)

Author
Drucker, H.
Shahrary, B.
Gibbon, D.C.
Title
Support vector machines : relevance feedback and information retrieval
Source
Information processing and management. 38(2002) no.3, pp.305-323
Year
2002
Abstract
We compare support vector machines (SVMs) to the Rocchio, Ide regular, and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. It is assumed that a preliminary search finds a set of documents that the user marks as relevant or not, after which feedback iterations commence. Particular attention is paid to IR searches where the number of relevant documents in the database is low and the preliminary set of documents used to start the search contains few relevant documents. Experiments show that if inverse document frequency (IDF) weighting is not used, because one is unwilling to pay the time penalty needed to obtain these features, then SVMs are better with either term-frequency (TF) or binary weighting. SVM performance is marginally better than Ide dec-hi if TF-IDF weighting is used and a reasonable number of relevant documents is found in the preliminary search. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred.
Theme
Retrieval algorithms
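
The abstract names three classical query-reformulation baselines without stating them. As a rough illustration only, the sketch below uses their common textbook forms: Rocchio adds the mean of the relevant vectors and subtracts the mean of the non-relevant ones, Ide regular uses unnormalized sums, and Ide dec-hi subtracts only the highest-ranked non-relevant document. The function names, the toy vectors, and the weights `alpha`, `beta`, `gamma` are assumptions made for this example, not values taken from the paper, and the SVM approach is not sketched here.

```python
def _vector_sum(vectors, dim):
    """Component-wise sum of a list of equal-length term vectors."""
    total = [0.0] * dim
    for vec in vectors:
        for i, x in enumerate(vec):
            total[i] += x
    return total


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Rocchio: move toward the mean relevant and away from the mean non-relevant vector.

    alpha, beta, gamma are illustrative defaults, not the paper's settings.
    """
    dim = len(query)
    rel = _vector_sum(relevant, dim)
    non = _vector_sum(nonrelevant, dim)
    n_rel = max(len(relevant), 1)   # avoid division by zero on empty sets
    n_non = max(len(nonrelevant), 1)
    return [alpha * q + beta * r / n_rel - gamma * n / n_non
            for q, r, n in zip(query, rel, non)]


def ide_regular(query, relevant, nonrelevant):
    """Ide regular: add all relevant vectors, subtract all non-relevant ones."""
    dim = len(query)
    rel = _vector_sum(relevant, dim)
    non = _vector_sum(nonrelevant, dim)
    return [q + r - n for q, r, n in zip(query, rel, non)]


def ide_dec_hi(query, relevant, nonrelevant_ranked):
    """Ide dec-hi: add all relevant vectors, subtract only the top-ranked non-relevant one.

    nonrelevant_ranked is assumed to be ordered by retrieval rank, best first.
    """
    dim = len(query)
    rel = _vector_sum(relevant, dim)
    top = nonrelevant_ranked[0] if nonrelevant_ranked else [0.0] * dim
    return [q + r - n for q, r, n in zip(query, rel, top)]


# Toy three-term vocabulary; the vectors could hold binary, TF, or TF-IDF weights.
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
nonrelevant = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]]  # ranked, best first

print(rocchio(query, relevant, nonrelevant))     # [1.0, 1.0, -1.0]
print(ide_regular(query, relevant, nonrelevant)) # [1.0, 2.0, -2.0]
print(ide_dec_hi(query, relevant, nonrelevant))  # [2.0, 2.0, -1.0]
```

The reformulated query is then used to re-rank the collection, and the cycle repeats for each feedback iteration.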

Similar documents (content)

  1. Sormunen, E.; Kekäläinen, J.; Koivisto, J.; Järvelin, K.: Document text characteristics affect the ranking of the most relevant documents by expanded structured queries (2001) 0.25
    0.24510619 = sum of:
      0.24510619 = product of:
        0.7659569 = sum of:
          0.020829326 = weight(abstract_txt:number in 5487) [ClassicSimilarity], result of:
            0.020829326 = score(doc=5487,freq=1.0), product of:
              0.080584005 = queryWeight, product of:
                1.1376352 = boost
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.017127717 = queryNorm
              0.25847965 = fieldWeight in 5487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
          0.02331674 = weight(abstract_txt:document in 5487) [ClassicSimilarity], result of:
            0.02331674 = score(doc=5487,freq=1.0), product of:
              0.08687816 = queryWeight, product of:
                1.1812285 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017127717 = queryNorm
              0.26838437 = fieldWeight in 5487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
          0.17786124 = weight(abstract_txt:marginally in 5487) [ClassicSimilarity], result of:
            0.17786124 = score(doc=5487,freq=3.0), product of:
              0.1852689 = queryWeight, product of:
                1.2197332 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.017127717 = queryNorm
              0.9600167 = fieldWeight in 5487, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
          0.03178322 = weight(abstract_txt:better in 5487) [ClassicSimilarity], result of:
            0.03178322 = score(doc=5487,freq=1.0), product of:
              0.106806405 = queryWeight, product of:
                1.3097163 = boost
                4.7612453 = idf(docFreq=1032, maxDocs=44421)
                0.017127717 = queryNorm
              0.29757783 = fieldWeight in 5487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7612453 = idf(docFreq=1032, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
          0.016713182 = weight(abstract_txt:used in 5487) [ClassicSimilarity], result of:
            0.016713182 = score(doc=5487,freq=1.0), product of:
              0.07965295 = queryWeight, product of:
                1.3852404 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.017127717 = queryNorm
              0.20982501 = fieldWeight in 5487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
          0.03593313 = weight(abstract_txt:search in 5487) [ClassicSimilarity], result of:
            0.03593313 = score(doc=5487,freq=1.0), product of:
              0.15731691 = queryWeight, product of:
                2.5132537 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.017127717 = queryNorm
              0.22841237 = fieldWeight in 5487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
          0.23104121 = weight(abstract_txt:relevant in 5487) [ClassicSimilarity], result of:
            0.23104121 = score(doc=5487,freq=10.0), product of:
              0.25248662 = queryWeight, product of:
                3.1839614 = boost
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.017127717 = queryNorm
              0.9150632 = fieldWeight in 5487, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
          0.22847886 = weight(abstract_txt:documents in 5487) [ClassicSimilarity], result of:
            0.22847886 = score(doc=5487,freq=10.0), product of:
              0.28036174 = queryWeight, product of:
                3.969831 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017127717 = queryNorm
              0.8149431 = fieldWeight in 5487, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=5487)
        0.32 = coord(8/25)
    
  2. Ruthven, I.; Lalmas, M.; Rijsbergen, K. van: Combining and selecting characteristics of information use (2002) 0.23
    0.23179634 = sum of:
      0.23179634 = product of:
        0.72436357 = sum of:
          0.05029754 = weight(abstract_txt:inverse in 208) [ClassicSimilarity], result of:
            0.05029754 = score(doc=208,freq=1.0), product of:
              0.13945873 = queryWeight, product of:
                1.0582455 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.017127717 = queryNorm
              0.36066255 = fieldWeight in 208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
          0.049462274 = weight(abstract_txt:document in 208) [ClassicSimilarity], result of:
            0.049462274 = score(doc=208,freq=8.0), product of:
              0.08687816 = queryWeight, product of:
                1.1812285 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017127717 = queryNorm
              0.5693292 = fieldWeight in 208, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
          0.080531724 = weight(abstract_txt:frequency in 208) [ClassicSimilarity], result of:
            0.080531724 = score(doc=208,freq=3.0), product of:
              0.16673577 = queryWeight, product of:
                1.6364133 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.017127717 = queryNorm
              0.4829901 = fieldWeight in 208, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
          0.056235388 = weight(abstract_txt:then in 208) [ClassicSimilarity], result of:
            0.056235388 = score(doc=208,freq=3.0), product of:
              0.1502292 = queryWeight, product of:
                1.9023981 = boost
                4.6105576 = idf(docFreq=1200, maxDocs=44421)
                0.017127717 = queryNorm
              0.3743306 = fieldWeight in 208, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6105576 = idf(docFreq=1200, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
          0.0981601 = weight(abstract_txt:feedback in 208) [ClassicSimilarity], result of:
            0.0981601 = score(doc=208,freq=2.0), product of:
              0.24930729 = queryWeight, product of:
                2.4507089 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.017127717 = queryNorm
              0.3937314 = fieldWeight in 208, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
          0.15891482 = weight(abstract_txt:weighting in 208) [ClassicSimilarity], result of:
            0.15891482 = score(doc=208,freq=2.0), product of:
              0.343733 = queryWeight, product of:
                2.8776295 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.017127717 = queryNorm
              0.4623205 = fieldWeight in 208, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
          0.10959247 = weight(abstract_txt:relevant in 208) [ClassicSimilarity], result of:
            0.10959247 = score(doc=208,freq=4.0), product of:
              0.25248662 = queryWeight, product of:
                3.1839614 = boost
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.017127717 = queryNorm
              0.4340526 = fieldWeight in 208, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
          0.121169224 = weight(abstract_txt:documents in 208) [ClassicSimilarity], result of:
            0.121169224 = score(doc=208,freq=5.0), product of:
              0.28036174 = queryWeight, product of:
                3.969831 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017127717 = queryNorm
              0.43218887 = fieldWeight in 208, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.046875 = fieldNorm(doc=208)
        0.32 = coord(8/25)
    
  3. Bodoff, D.; Wu, B.; Wong, K.Y.M.: Relevance data for language models using maximum likelihood (2003) 0.22
    0.21537703 = sum of:
      0.21537703 = product of:
        0.76920366 = sum of:
          0.123368435 = weight(abstract_txt:relevancy in 2822) [ClassicSimilarity], result of:
            0.123368435 = score(doc=2822,freq=1.0), product of:
              0.159783 = queryWeight, product of:
                1.1327366 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.017127717 = queryNorm
              0.77209985 = fieldWeight in 2822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.09375 = fieldNorm(doc=2822)
          0.049462274 = weight(abstract_txt:document in 2822) [ClassicSimilarity], result of:
            0.049462274 = score(doc=2822,freq=2.0), product of:
              0.08687816 = queryWeight, product of:
                1.1812285 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017127717 = queryNorm
              0.5693292 = fieldWeight in 2822, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.09375 = fieldNorm(doc=2822)
          0.18960512 = weight(abstract_txt:rocchio in 2822) [ClassicSimilarity], result of:
            0.18960512 = score(doc=2822,freq=1.0), product of:
              0.21279491 = queryWeight, product of:
                1.3072066 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.017127717 = queryNorm
              0.8910228 = fieldWeight in 2822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.09375 = fieldNorm(doc=2822)
          0.047674824 = weight(abstract_txt:better in 2822) [ClassicSimilarity], result of:
            0.047674824 = score(doc=2822,freq=1.0), product of:
              0.106806405 = queryWeight, product of:
                1.3097163 = boost
                4.7612453 = idf(docFreq=1032, maxDocs=44421)
                0.017127717 = queryNorm
              0.44636673 = fieldWeight in 2822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7612453 = idf(docFreq=1032, maxDocs=44421)
                0.09375 = fieldNorm(doc=2822)
          0.02506977 = weight(abstract_txt:used in 2822) [ClassicSimilarity], result of:
            0.02506977 = score(doc=2822,freq=1.0), product of:
              0.07965295 = queryWeight, product of:
                1.3852404 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.017127717 = queryNorm
              0.3147375 = fieldWeight in 2822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.09375 = fieldNorm(doc=2822)
          0.2256462 = weight(abstract_txt:preliminary in 2822) [ClassicSimilarity], result of:
            0.2256462 = score(doc=2822,freq=1.0), product of:
              0.379344 = queryWeight, product of:
                3.4906814 = boost
                6.3448815 = idf(docFreq=211, maxDocs=44421)
                0.017127717 = queryNorm
              0.59483266 = fieldWeight in 2822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3448815 = idf(docFreq=211, maxDocs=44421)
                0.09375 = fieldNorm(doc=2822)
          0.10837704 = weight(abstract_txt:documents in 2822) [ClassicSimilarity], result of:
            0.10837704 = score(doc=2822,freq=1.0), product of:
              0.28036174 = queryWeight, product of:
                3.969831 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017127717 = queryNorm
              0.38656145 = fieldWeight in 2822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.09375 = fieldNorm(doc=2822)
        0.28 = coord(7/25)
    
  4. Smith, M.P.; Pollitt, S.A.: A comparison of ranking formulae and their ranks (1995) 0.21
    0.2106828 = sum of:
      0.2106828 = product of:
        0.75243855 = sum of:
          0.058219735 = weight(abstract_txt:number in 5870) [ClassicSimilarity], result of:
            0.058219735 = score(doc=5870,freq=5.0), product of:
              0.080584005 = queryWeight, product of:
                1.1376352 = boost
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.017127717 = queryNorm
              0.7224726 = fieldWeight in 5870, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1356745 = idf(docFreq=1930, maxDocs=44421)
                0.078125 = fieldNorm(doc=5870)
          0.029145924 = weight(abstract_txt:document in 5870) [ClassicSimilarity], result of:
            0.029145924 = score(doc=5870,freq=1.0), product of:
              0.08687816 = queryWeight, product of:
                1.1812285 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017127717 = queryNorm
              0.33548045 = fieldWeight in 5870, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=5870)
          0.020891476 = weight(abstract_txt:used in 5870) [ClassicSimilarity], result of:
            0.020891476 = score(doc=5870,freq=1.0), product of:
              0.07965295 = queryWeight, product of:
                1.3852404 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.017127717 = queryNorm
              0.26228127 = fieldWeight in 5870, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.078125 = fieldNorm(doc=5870)
          0.077491686 = weight(abstract_txt:frequency in 5870) [ClassicSimilarity], result of:
            0.077491686 = score(doc=5870,freq=1.0), product of:
              0.16673577 = queryWeight, product of:
                1.6364133 = boost
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.017127717 = queryNorm
              0.4647574 = fieldWeight in 5870, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.948895 = idf(docFreq=314, maxDocs=44421)
                0.078125 = fieldNorm(doc=5870)
          0.18728293 = weight(abstract_txt:weighting in 5870) [ClassicSimilarity], result of:
            0.18728293 = score(doc=5870,freq=1.0), product of:
              0.343733 = queryWeight, product of:
                2.8776295 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.017127717 = queryNorm
              0.54485 = fieldWeight in 5870, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.078125 = fieldNorm(doc=5870)
          0.1581831 = weight(abstract_txt:relevant in 5870) [ClassicSimilarity], result of:
            0.1581831 = score(doc=5870,freq=3.0), product of:
              0.25248662 = queryWeight, product of:
                3.1839614 = boost
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.017127717 = queryNorm
              0.6265009 = fieldWeight in 5870, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.078125 = fieldNorm(doc=5870)
          0.22122373 = weight(abstract_txt:documents in 5870) [ClassicSimilarity], result of:
            0.22122373 = score(doc=5870,freq=6.0), product of:
              0.28036174 = queryWeight, product of:
                3.969831 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017127717 = queryNorm
              0.7890653 = fieldWeight in 5870, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=5870)
        0.28 = coord(7/25)
    
  5. Ye, Z.; Huang, J.X.: A learning to rank approach for quality-aware pseudo-relevance feedback (2016) 0.20
    0.20365593 = sum of:
      0.20365593 = product of:
        0.7273426 = sum of:
          0.056588225 = weight(abstract_txt:reasonable in 3855) [ClassicSimilarity], result of:
            0.056588225 = score(doc=3855,freq=1.0), product of:
              0.12452965 = queryWeight, product of:
                7.270651 = idf(docFreq=83, maxDocs=44421)
                0.017127717 = queryNorm
              0.45441568 = fieldWeight in 3855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.270651 = idf(docFreq=83, maxDocs=44421)
                0.0625 = fieldNorm(doc=3855)
          0.052137814 = weight(abstract_txt:document in 3855) [ClassicSimilarity], result of:
            0.052137814 = score(doc=3855,freq=5.0), product of:
              0.08687816 = queryWeight, product of:
                1.1812285 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017127717 = queryNorm
              0.6001257 = fieldWeight in 3855, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3855)
          0.043290023 = weight(abstract_txt:then in 3855) [ClassicSimilarity], result of:
            0.043290023 = score(doc=3855,freq=1.0), product of:
              0.1502292 = queryWeight, product of:
                1.9023981 = boost
                4.6105576 = idf(docFreq=1200, maxDocs=44421)
                0.017127717 = queryNorm
              0.28815985 = fieldWeight in 3855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6105576 = idf(docFreq=1200, maxDocs=44421)
                0.0625 = fieldNorm(doc=3855)
          0.22669107 = weight(abstract_txt:feedback in 3855) [ClassicSimilarity], result of:
            0.22669107 = score(doc=3855,freq=6.0), product of:
              0.24930729 = queryWeight, product of:
                2.4507089 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.017127717 = queryNorm
              0.90928376 = fieldWeight in 3855, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.0625 = fieldNorm(doc=3855)
          0.073061645 = weight(abstract_txt:relevant in 3855) [ClassicSimilarity], result of:
            0.073061645 = score(doc=3855,freq=1.0), product of:
              0.25248662 = queryWeight, product of:
                3.1839614 = boost
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.017127717 = queryNorm
              0.2893684 = fieldWeight in 3855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6298943 = idf(docFreq=1177, maxDocs=44421)
                0.0625 = fieldNorm(doc=3855)
          0.15043078 = weight(abstract_txt:preliminary in 3855) [ClassicSimilarity], result of:
            0.15043078 = score(doc=3855,freq=1.0), product of:
              0.379344 = queryWeight, product of:
                3.4906814 = boost
                6.3448815 = idf(docFreq=211, maxDocs=44421)
                0.017127717 = queryNorm
              0.3965551 = fieldWeight in 3855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3448815 = idf(docFreq=211, maxDocs=44421)
                0.0625 = fieldNorm(doc=3855)
          0.12514302 = weight(abstract_txt:documents in 3855) [ClassicSimilarity], result of:
            0.12514302 = score(doc=3855,freq=3.0), product of:
              0.28036174 = queryWeight, product of:
                3.969831 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017127717 = queryNorm
              0.4463627 = fieldWeight in 3855, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=3855)
        0.28 = coord(7/25)
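
The score breakdowns above follow one pattern per matching term: score = queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, after which the term scores are summed and multiplied by the coord factor. The sketch below reproduces the total for entry 1 from the listed inputs. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the figures shown; the function names are made up for this example and are not Lucene API calls.

```python
import math

MAX_DOCS = 44421          # maxDocs, as shown in every idf line
QUERY_NORM = 0.017127717  # queryNorm, identical for all terms of this query


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the breakdown (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """Sum of term scores scaled by coord = matched / total query terms."""
    raw = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return raw * matched / total


# (boost, docFreq, termFreq) for the eight matching terms of entry 1 (doc 5487).
ENTRY_1_TERMS = [
    (1.1376352, 1930, 1.0),   # number
    (1.1812285, 1647, 1.0),   # document
    (1.2197332, 16, 3.0),     # marginally
    (1.3097163, 1032, 1.0),   # better
    (1.3852404, 4205, 1.0),   # used
    (2.5132537, 3123, 1.0),   # search
    (3.1839614, 1177, 10.0),  # relevant
    (3.969831, 1954, 10.0),   # documents
]

score = document_score(ENTRY_1_TERMS, field_norm=0.0625, matched=8, total=25)
print(round(score, 4))  # 0.2451
```

Small differences in the last digits relative to the listing are expected, since the listing appears to use single-precision arithmetic while this sketch uses Python floats.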