Document (#32927)

Author
He, B.
Ounis, I.
Title
Combining fields for query expansion and adaptive query expansion
Source
Information processing and management. 43(2007) no.5, pp. 1294-1307
Year
2007
Abstract
In this paper, we aim to improve query expansion for ad-hoc retrieval, by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold: First, we propose a novel query expansion mechanism on fields by combining field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection, or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.

Similar documents (author)

  1. Lioma, C.; Ounis, I.: A syntactically-based query reformulation technique for information retrieval (2008) 4.81
    4.811013 = sum of:
      4.811013 = weight(author_txt:ounis in 3031) [ClassicSimilarity], result of:
        4.811013 = fieldWeight in 3031, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.5 = fieldNorm(doc=3031)
    
  2. Chang, Y.; Ounis, I.; Kim, M.: Query reformulation using automatically generated query concepts from a document space (2006) 3.61
    3.60826 = sum of:
      3.60826 = weight(author_txt:ounis in 1972) [ClassicSimilarity], result of:
        3.60826 = fieldWeight in 1972, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.375 = fieldNorm(doc=1972)
    
  3. Cacheda, F.; Plachouras, V.; Ounis, I.: A case study of distributed information retrieval architectures to index one terabyte of text (2005) 3.61
    3.60826 = sum of:
      3.60826 = weight(author_txt:ounis in 2042) [ClassicSimilarity], result of:
        3.60826 = fieldWeight in 2042, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.375 = fieldNorm(doc=2042)
    
  4. Cacheda, F.; Carneiro, V.; Plachouras, V.; Ounis, I.: Performance analysis of distributed information retrieval architectures using an improved network simulation model (2007) 3.01
    3.0068831 = sum of:
      3.0068831 = weight(author_txt:ounis in 1903) [ClassicSimilarity], result of:
        3.0068831 = fieldWeight in 1903, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.3125 = fieldNorm(doc=1903)
    
  5. Gray, A.J.G.; Gray, N.; Hall, C.W.; Ounis, I.: Finding the right term : retrieving and exploring semantic concepts in astronomical vocabularies (2010) 3.01
    3.0068831 = sum of:
      3.0068831 = weight(author_txt:ounis in 235) [ClassicSimilarity], result of:
        3.0068831 = fieldWeight in 235, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.3125 = fieldNorm(doc=235)
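
The author-based scores above are each a single fieldWeight, the product of tf, idf and fieldNorm. The following is a minimal sketch of that arithmetic, not the search engine's own code. It assumes the older ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the listed idf of 9.622026 for docFreq=7 and maxDocs=44421. The function names are hypothetical and chosen only for illustration.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight as shown in the listing: tf * idf * fieldNorm."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Inputs copied from the first author hit (doc 3031) in the listing above.
top_hit = field_weight(freq=1.0, doc_freq=7, max_docs=44421, field_norm=0.5)
print(f"fieldWeight for doc 3031: {top_hit:.4f}")
```

Because tf and idf are the same for all five hits, the ranking in this list is decided by fieldNorm alone (0.5, 0.375, 0.3125), which in this similarity model shrinks as the indexed field gets longer.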
    

Similar documents (content)

  1. Sah, M.; Wade, V.: Personalized concept-based search on the Linked Open Data (2015) 0.27
    0.27181798 = sum of:
      0.27181798 = product of:
        0.75504994 = sum of:
          0.006194893 = weight(abstract_txt:this in 3511) [ClassicSimilarity], result of:
            0.006194893 = score(doc=3511,freq=2.0), product of:
              0.033288386 = queryWeight, product of:
                1.0008143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.013822967 = queryNorm
              0.18609773 = fieldWeight in 3511, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.061439153 = weight(abstract_txt:selects in 3511) [ClassicSimilarity], result of:
            0.061439153 = score(doc=3511,freq=1.0), product of:
              0.13423629 = queryWeight, product of:
                1.1603298 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.013822967 = queryNorm
              0.45769405 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.009849722 = weight(abstract_txt:their in 3511) [ClassicSimilarity], result of:
            0.009849722 = score(doc=3511,freq=1.0), product of:
              0.057134204 = queryWeight, product of:
                1.3111585 = boost
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.013822967 = queryNorm
              0.17239624 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.029838724 = weight(abstract_txt:local in 3511) [ClassicSimilarity], result of:
            0.029838724 = score(doc=3511,freq=1.0), product of:
              0.104496874 = queryWeight, product of:
                1.4478152 = boost
                5.221423 = idf(docFreq=651, maxDocs=44421)
                0.013822967 = queryNorm
              0.28554657 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.221423 = idf(docFreq=651, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.05056296 = weight(abstract_txt:combining in 3511) [ClassicSimilarity], result of:
            0.05056296 = score(doc=3511,freq=1.0), product of:
              0.14852679 = queryWeight, product of:
                1.7260917 = boost
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.013822967 = queryNorm
              0.3404299 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.07908381 = weight(abstract_txt:mechanism in 3511) [ClassicSimilarity], result of:
            0.07908381 = score(doc=3511,freq=1.0), product of:
              0.22908954 = queryWeight, product of:
                2.6254861 = boost
                6.312396 = idf(docFreq=218, maxDocs=44421)
                0.013822967 = queryNorm
              0.34520915 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.312396 = idf(docFreq=218, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.10467711 = weight(abstract_txt:adaptive in 3511) [ClassicSimilarity], result of:
            0.10467711 = score(doc=3511,freq=1.0), product of:
              0.27617308 = queryWeight, product of:
                2.882689 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.013822967 = queryNorm
              0.3790272 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.1433684 = weight(abstract_txt:query in 3511) [ClassicSimilarity], result of:
            0.1433684 = score(doc=3511,freq=2.0), product of:
              0.38989374 = queryWeight, product of:
                5.932543 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.013822967 = queryNorm
              0.36771145 = fieldWeight in 3511, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.27003518 = weight(abstract_txt:expansion in 3511) [ClassicSimilarity], result of:
            0.27003518 = score(doc=3511,freq=2.0), product of:
              0.5717507 = queryWeight, product of:
                6.7732143 = boost
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.013822967 = queryNorm
              0.4722953 = fieldWeight in 3511, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
        0.36 = coord(9/25)
    
  2. Efthimiadis, E.N.: End-users' understanding of thesaural knowledge structures in interactive query expansion (1994) 0.20
    0.2018501 = sum of:
      0.2018501 = product of:
        1.0092505 = sum of:
          0.010012459 = weight(abstract_txt:this in 6693) [ClassicSimilarity], result of:
            0.010012459 = score(doc=6693,freq=1.0), product of:
              0.033288386 = queryWeight, product of:
                1.0008143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.013822967 = queryNorm
              0.30077934 = fieldWeight in 6693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.125 = fieldNorm(doc=6693)
          0.031801958 = weight(abstract_txt:process in 6693) [ClassicSimilarity], result of:
            0.031801958 = score(doc=6693,freq=1.0), product of:
              0.0628354 = queryWeight, product of:
                1.1226999 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.013822967 = queryNorm
              0.50611526 = fieldWeight in 6693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.125 = fieldNorm(doc=6693)
          0.02251365 = weight(abstract_txt:their in 6693) [ClassicSimilarity], result of:
            0.02251365 = score(doc=6693,freq=1.0), product of:
              0.057134204 = queryWeight, product of:
                1.3111585 = boost
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.013822967 = queryNorm
              0.39404854 = fieldWeight in 6693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.125 = fieldNorm(doc=6693)
          0.32769918 = weight(abstract_txt:query in 6693) [ClassicSimilarity], result of:
            0.32769918 = score(doc=6693,freq=2.0), product of:
              0.38989374 = queryWeight, product of:
                5.932543 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.013822967 = queryNorm
              0.8404833 = fieldWeight in 6693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.125 = fieldNorm(doc=6693)
          0.61722326 = weight(abstract_txt:expansion in 6693) [ClassicSimilarity], result of:
            0.61722326 = score(doc=6693,freq=2.0), product of:
              0.5717507 = queryWeight, product of:
                6.7732143 = boost
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.013822967 = queryNorm
              1.0795321 = fieldWeight in 6693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.125 = fieldNorm(doc=6693)
        0.2 = coord(5/25)
    
  3. Qiu, Y.; Frei, H.P.: Concept based query expansion (1993) 0.20
    0.20043707 = sum of:
      0.20043707 = product of:
        1.2527317 = sum of:
          0.03377047 = weight(abstract_txt:their in 2677) [ClassicSimilarity], result of:
            0.03377047 = score(doc=2677,freq=1.0), product of:
              0.057134204 = queryWeight, product of:
                1.3111585 = boost
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.013822967 = queryNorm
              0.5910728 = fieldWeight in 2677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.1875 = fieldNorm(doc=2677)
          0.21671958 = weight(abstract_txt:collection in 2677) [ClassicSimilarity], result of:
            0.21671958 = score(doc=2677,freq=1.0), product of:
              0.24858801 = queryWeight, product of:
                3.8677838 = boost
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.013822967 = queryNorm
              0.8718022 = fieldWeight in 2677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.649612 = idf(docFreq=1154, maxDocs=44421)
                0.1875 = fieldNorm(doc=2677)
          0.34757748 = weight(abstract_txt:query in 2677) [ClassicSimilarity], result of:
            0.34757748 = score(doc=2677,freq=1.0), product of:
              0.38989374 = queryWeight, product of:
                5.932543 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.013822967 = queryNorm
              0.8914672 = fieldWeight in 2677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.1875 = fieldNorm(doc=2677)
          0.65466416 = weight(abstract_txt:expansion in 2677) [ClassicSimilarity], result of:
            0.65466416 = score(doc=2677,freq=1.0), product of:
              0.5717507 = queryWeight, product of:
                6.7732143 = boost
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.013822967 = queryNorm
              1.1450168 = fieldWeight in 2677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.1875 = fieldNorm(doc=2677)
        0.16 = coord(4/25)
    
  4. Efthimiadis, E.N.: Query expansion (1996) 0.20
    0.20021468 = sum of:
      0.20021468 = product of:
        1.6684557 = sum of:
          0.031801958 = weight(abstract_txt:process in 4915) [ClassicSimilarity], result of:
            0.031801958 = score(doc=4915,freq=1.0), product of:
              0.0628354 = queryWeight, product of:
                1.1226999 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.013822967 = queryNorm
              0.50611526 = fieldWeight in 4915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.125 = fieldNorm(doc=4915)
          0.56759167 = weight(abstract_txt:query in 4915) [ClassicSimilarity], result of:
            0.56759167 = score(doc=4915,freq=6.0), product of:
              0.38989374 = queryWeight, product of:
                5.932543 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.013822967 = queryNorm
              1.4557599 = fieldWeight in 4915, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.125 = fieldNorm(doc=4915)
          1.0690621 = weight(abstract_txt:expansion in 4915) [ClassicSimilarity], result of:
            1.0690621 = score(doc=4915,freq=6.0), product of:
              0.5717507 = queryWeight, product of:
                6.7732143 = boost
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.013822967 = queryNorm
              1.8698046 = fieldWeight in 4915, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.125 = fieldNorm(doc=4915)
        0.12 = coord(3/25)
    
  5. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.20
    0.19841103 = sum of:
      0.19841103 = product of:
        0.82671267 = sum of:
          0.0070798774 = weight(abstract_txt:this in 2343) [ClassicSimilarity], result of:
            0.0070798774 = score(doc=2343,freq=2.0), product of:
              0.033288386 = queryWeight, product of:
                1.0008143 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.013822967 = queryNorm
              0.21268311 = fieldWeight in 2343, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.015806442 = weight(abstract_txt:text in 2343) [ClassicSimilarity], result of:
            0.015806442 = score(doc=2343,freq=1.0), product of:
              0.06258611 = queryWeight, product of:
                1.1204705 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.013822967 = queryNorm
              0.25255513 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.011256825 = weight(abstract_txt:their in 2343) [ClassicSimilarity], result of:
            0.011256825 = score(doc=2343,freq=1.0), product of:
              0.057134204 = queryWeight, product of:
                1.3111585 = boost
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.013822967 = queryNorm
              0.19702427 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1523883 = idf(docFreq=5161, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.045542713 = weight(abstract_txt:fields in 2343) [ClassicSimilarity], result of:
            0.045542713 = score(doc=2343,freq=1.0), product of:
              0.14506574 = queryWeight, product of:
                2.0892458 = boost
                5.0231256 = idf(docFreq=794, maxDocs=44421)
                0.013822967 = queryNorm
              0.31394535 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0231256 = idf(docFreq=794, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.25906897 = weight(abstract_txt:query in 2343) [ClassicSimilarity], result of:
            0.25906897 = score(doc=2343,freq=5.0), product of:
              0.38989374 = queryWeight, product of:
                5.932543 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.013822967 = queryNorm
              0.6644604 = fieldWeight in 2343, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.48795784 = weight(abstract_txt:expansion in 2343) [ClassicSimilarity], result of:
            0.48795784 = score(doc=2343,freq=5.0), product of:
              0.5717507 = queryWeight, product of:
                6.7732143 = boost
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.013822967 = queryNorm
              0.8534451 = fieldWeight in 2343, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.106756 = idf(docFreq=268, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
        0.24 = coord(6/25)
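
The content-based scores above combine three levels: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms divided by the 25 query terms). The sketch below replays that arithmetic under the same assumed tf and idf definitions as the listing's numbers suggest (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). It is an illustration only: the helper names are hypothetical, and boost, docFreq, freq, fieldNorm and queryNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.013822967  # queryNorm from the listing
QUERY_TERMS = 25          # denominator of coord(n/25) in the listing


def idf(doc_freq: int) -> float:
    """Assumed idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """One term's contribution: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm: float) -> float:
    """Sum the term contributions, then scale by coord = matched / query terms."""
    total = sum(term_score(boost, df, freq, field_norm) for boost, df, freq in matches)
    return (len(matches) / QUERY_TERMS) * total


# (boost, docFreq, freq) for the three matching terms of hit 4 (doc 4915):
# "process", "query", "expansion".
hit4 = [(1.1226999, 2105, 1.0), (5.932543, 1039, 6.0), (6.7732143, 268, 6.0)]
score = document_score(hit4, field_norm=0.125)
print(f"score for doc 4915: {score:.4f}")
```

The coord factor explains why hit 1 ranks first despite small per-term weights: it matches 9 of 25 query terms, whereas hit 4 matches only 3 and relies on high term frequencies for "query" and "expansion".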