Document (#39696)

Author
Buccio, E. Di
Melucci, M.
Moro, F.
Title
Detecting verbose queries and improving information retrieval
Source
Information processing and management. 50(2014) no.2, pp.342-360
Year
2014
Abstract
Although most of the queries submitted to search engines are composed of a few keywords and have a length that ranges from three to six words, more than 15% of the total volume of queries is verbose, and verbose queries introduce ambiguity and cause topic drift. We consider verbosity a property of queries distinct from length: a verbose query is not necessarily long (it might be succinct), and a short query might be verbose. This paper proposes a methodology to automatically detect verbose queries and to modify queries conditionally on that detection. The methodology exploits state-of-the-art classification algorithms, combines concepts from a large linguistic database, and uses a topic gisting algorithm we designed for verbose query modification. Our experimental results have been obtained using the TREC Robust track collection, thirty topics classified by degree of difficulty, four queries per topic classified by verbosity and length, and human assessments of query verbosity. Our results suggest that query modification conditioned on query verbosity detection and topic gisting is significantly effective, and that query modification should be refined when topic difficulty and query verbosity are considered, since these two properties interact and query verbosity is not straightforwardly related to query length.
Content
See: doi: 10.1016/j.ipm.2013.09.003.
Theme
Semantic context in indexing and retrieval

Similar documents (author)

  1. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 2150) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 2150, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=2150)
    
  2. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 3226) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 3226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=3226)
    
  3. Melucci, M.: Contextual search : a computational framework (2012) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:melucci in 913) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 913, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=913)
    
  4. Agosti, M.; Melucci, M.: Information retrieval techniques for the automatic construction of hypertext (2000) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:melucci in 5671) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 5671, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=5671)
    
  5. Melucci, M.; Orio, N.: Combining melody processing and information retrieval techniques : methodology, evaluation, and system implementation (2004) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:melucci in 4087) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 4087, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=4087)
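
The author-based scores above come from Lucene's ClassicSimilarity (TF-IDF), as the [ClassicSimilarity] tag in each tree indicates: fieldWeight = tf × idf × fieldNorm, with tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). The minimal sketch below recomputes the 5.81 and 4.65 scores from the printed docFreq, maxDocs and fieldNorm values. The helper names are hypothetical, and fieldNorm is taken as given rather than derived, because its index-time encoding is not shown in the record.

```python
import math

MAX_DOCS = 44421  # maxDocs as printed in the explain trees


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm (fieldNorm copied from the tree)
    return classic_tf(freq) * classic_idf(doc_freq) * field_norm


# "melucci" occurs in 10 of the 44421 indexed records
idf_melucci = classic_idf(10)
single_author = field_weight(1.0, 10, 0.625)  # items 1-3
two_authors = field_weight(1.0, 10, 0.5)      # items 4-5

print(f"idf={idf_melucci:.6f}")
print(f"single-author score={single_author:.6f}")
print(f"two-author score={two_authors:.6f}")
```

Since tf and idf are identical for all five records, the only thing separating items 4 and 5 from items 1-3 is the smaller fieldNorm, which most likely reflects the longer author field of the co-authored records.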
    

Similar documents (content)

  1. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: An effective approach to verbose queries using a limited dependencies language model (2009) 0.20
    0.1982674 = sum of:
      0.1982674 = product of:
        0.991337 = sum of:
          0.004731874 = weight(abstract_txt:that in 3122) [ClassicSimilarity], result of:
            0.004731874 = score(doc=3122,freq=2.0), product of:
              0.022637002 = queryWeight, product of:
                1.0160156 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.00942105 = queryNorm
              0.20903271 = fieldWeight in 3122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3122)
          0.005315008 = weight(abstract_txt:from in 3122) [ClassicSimilarity], result of:
            0.005315008 = score(doc=3122,freq=1.0), product of:
              0.030818352 = queryWeight, product of:
                1.1854838 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.00942105 = queryNorm
              0.17246243 = fieldWeight in 3122, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=3122)
          0.13574266 = weight(abstract_txt:queries in 3122) [ClassicSimilarity], result of:
            0.13574266 = score(doc=3122,freq=3.0), product of:
              0.24579303 = queryWeight, product of:
                5.11404 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.00942105 = queryNorm
              0.5522641 = fieldWeight in 3122, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0625 = fieldNorm(doc=3122)
          0.18125144 = weight(abstract_txt:query in 3122) [ClassicSimilarity], result of:
            0.18125144 = score(doc=3122,freq=4.0), product of:
              0.30497718 = queryWeight, product of:
                6.8086967 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.00942105 = queryNorm
              0.5943115 = fieldWeight in 3122, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=3122)
          0.66429603 = weight(abstract_txt:verbose in 3122) [ClassicSimilarity], result of:
            0.66429603 = score(doc=3122,freq=2.0), product of:
              0.770397 = queryWeight, product of:
                8.3823 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.00942105 = queryNorm
              0.86227757 = fieldWeight in 3122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0625 = fieldNorm(doc=3122)
        0.2 = coord(5/25)
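
The content-based scores add a query side. Each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm, and the summed contributions are multiplied by coord, the fraction of the 25 query clauses the document matched. The sketch below recomputes the 0.198 score of doc 3122 from the values printed in its tree. The boosts and queryNorm are copied as given: the query terms (verbose, queries, modification, thirty, ...) appear to be drawn from this record's abstract, but how the page selects and boosts them is not shown.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.00942105  # copied from the explain tree
TOTAL_CLAUSES = 25       # denominator of coord(5/25)
FIELD_NORM = 0.0625      # fieldNorm(doc=3122)


def idf(doc_freq: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def clause_score(boost: float, doc_freq: int, freq: float) -> float:
    # Query side: boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # Document side: sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


# (term, boost, docFreq, termFreq) for the five terms doc 3122 matched
MATCHES = [
    ("that", 1.0160156, 11344, 2.0),
    ("from", 1.1854838, 7646, 1.0),
    ("queries", 5.11404, 734, 3.0),
    ("query", 6.8086967, 1039, 4.0),
    ("verbose", 8.3823, 6, 2.0),
]

raw = sum(clause_score(boost, df, tf) for _, boost, df, tf in MATCHES)
coord = len(MATCHES) / TOTAL_CLAUSES
final = raw * coord

for term, boost, df, tf in MATCHES:
    print(f"{term:>8}: {clause_score(boost, df, tf):.6f}")
print(f"sum={raw:.6f} coord={coord} score={final:.6f}")
```

The rare term "verbose" (docFreq=6) contributes about two thirds of the sum on its own, because its idf enters both queryWeight and fieldWeight and is therefore effectively squared.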
    
  2. Sa, N.; Yuan, X.J.: Examining users' partial query modification patterns in voice search (2020) 0.18
    0.18223885 = sum of:
      0.18223885 = product of:
        0.65085304 = sum of:
          0.005795338 = weight(abstract_txt:that in 675) [ClassicSimilarity], result of:
            0.005795338 = score(doc=675,freq=3.0), product of:
              0.022637002 = queryWeight, product of:
                1.0160156 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.00942105 = queryNorm
              0.25601172 = fieldWeight in 675, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.035741553 = weight(abstract_txt:thirty in 675) [ClassicSimilarity], result of:
            0.035741553 = score(doc=675,freq=1.0), product of:
              0.07612874 = queryWeight, product of:
                1.0757334 = boost
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.00942105 = queryNorm
              0.4694883 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.04149191 = weight(abstract_txt:modify in 675) [ClassicSimilarity], result of:
            0.04149191 = score(doc=675,freq=1.0), product of:
              0.08408955 = queryWeight, product of:
                1.1305801 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.00942105 = queryNorm
              0.4934253 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.005315008 = weight(abstract_txt:from in 675) [ClassicSimilarity], result of:
            0.005315008 = score(doc=675,freq=1.0), product of:
              0.030818352 = queryWeight, product of:
                1.1854838 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.00942105 = queryNorm
              0.17246243 = fieldWeight in 675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.21190274 = weight(abstract_txt:modification in 675) [ClassicSimilarity], result of:
            0.21190274 = score(doc=675,freq=4.0), product of:
              0.22657458 = queryWeight, product of:
                3.2143748 = boost
                7.48196 = idf(docFreq=67, maxDocs=44421)
                0.00942105 = queryNorm
              0.935245 = fieldWeight in 675, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.48196 = idf(docFreq=67, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.11083342 = weight(abstract_txt:queries in 675) [ClassicSimilarity], result of:
            0.11083342 = score(doc=675,freq=2.0), product of:
              0.24579303 = queryWeight, product of:
                5.11404 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.00942105 = queryNorm
              0.45092174 = fieldWeight in 675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
          0.23977311 = weight(abstract_txt:query in 675) [ClassicSimilarity], result of:
            0.23977311 = score(doc=675,freq=7.0), product of:
              0.30497718 = queryWeight, product of:
                6.8086967 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.00942105 = queryNorm
              0.78620017 = fieldWeight in 675, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=675)
        0.28 = coord(7/25)
    
  3. Spink, A.; Ozmutlu, H.C.: Characteristics of question format web queries : an exploratory study (2002) 0.18
    0.17865676 = sum of:
      0.17865676 = product of:
        0.63805985 = sum of:
          0.028934568 = weight(abstract_txt:submitted in 4910) [ClassicSimilarity], result of:
            0.028934568 = score(doc=4910,freq=1.0), product of:
              0.06612683 = queryWeight, product of:
                1.0025803 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.00942105 = queryNorm
              0.4375617 = fieldWeight in 4910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.004731874 = weight(abstract_txt:that in 4910) [ClassicSimilarity], result of:
            0.004731874 = score(doc=4910,freq=2.0), product of:
              0.022637002 = queryWeight, product of:
                1.0160156 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.00942105 = queryNorm
              0.20903271 = fieldWeight in 4910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.005315008 = weight(abstract_txt:from in 4910) [ClassicSimilarity], result of:
            0.005315008 = score(doc=4910,freq=1.0), product of:
              0.030818352 = queryWeight, product of:
                1.1854838 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.00942105 = queryNorm
              0.17246243 = fieldWeight in 4910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.05441993 = weight(abstract_txt:topic in 4910) [ClassicSimilarity], result of:
            0.05441993 = score(doc=4910,freq=1.0), product of:
              0.17229065 = queryWeight, product of:
                3.6186466 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.00942105 = queryNorm
              0.3158612 = fieldWeight in 4910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.09445239 = weight(abstract_txt:length in 4910) [ClassicSimilarity], result of:
            0.09445239 = score(doc=4910,freq=1.0), product of:
              0.23099098 = queryWeight, product of:
                3.7476394 = boost
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.00942105 = queryNorm
              0.40890077 = fieldWeight in 4910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.2932377 = weight(abstract_txt:queries in 4910) [ClassicSimilarity], result of:
            0.2932377 = score(doc=4910,freq=14.0), product of:
              0.24579303 = queryWeight, product of:
                5.11404 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.00942105 = queryNorm
              1.1930269 = fieldWeight in 4910, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.15696836 = weight(abstract_txt:query in 4910) [ClassicSimilarity], result of:
            0.15696836 = score(doc=4910,freq=3.0), product of:
              0.30497718 = queryWeight, product of:
                6.8086967 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.00942105 = queryNorm
              0.51468885 = fieldWeight in 4910, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
        0.28 = coord(7/25)
    
  4. Koopman, B.; Zuccon, G.; Bruza, P.; Nguyen, A.: What makes an effective clinical query and querier? (2017) 0.16
    0.16355374 = sum of:
      0.16355374 = product of:
        0.8177687 = sum of:
          0.004731874 = weight(abstract_txt:that in 4922) [ClassicSimilarity], result of:
            0.004731874 = score(doc=4922,freq=2.0), product of:
              0.022637002 = queryWeight, product of:
                1.0160156 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.00942105 = queryNorm
              0.20903271 = fieldWeight in 4922, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=4922)
          0.005315008 = weight(abstract_txt:from in 4922) [ClassicSimilarity], result of:
            0.005315008 = score(doc=4922,freq=1.0), product of:
              0.030818352 = queryWeight, product of:
                1.1854838 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.00942105 = queryNorm
              0.17246243 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=4922)
          0.15674213 = weight(abstract_txt:queries in 4922) [ClassicSimilarity], result of:
            0.15674213 = score(doc=4922,freq=4.0), product of:
              0.24579303 = queryWeight, product of:
                5.11404 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.00942105 = queryNorm
              0.63769966 = fieldWeight in 4922, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0625 = fieldNorm(doc=4922)
          0.18125144 = weight(abstract_txt:query in 4922) [ClassicSimilarity], result of:
            0.18125144 = score(doc=4922,freq=4.0), product of:
              0.30497718 = queryWeight, product of:
                6.8086967 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.00942105 = queryNorm
              0.5943115 = fieldWeight in 4922, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=4922)
          0.46972826 = weight(abstract_txt:verbose in 4922) [ClassicSimilarity], result of:
            0.46972826 = score(doc=4922,freq=1.0), product of:
              0.770397 = queryWeight, product of:
                8.3823 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.00942105 = queryNorm
              0.6097223 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0625 = fieldNorm(doc=4922)
        0.2 = coord(5/25)
    
  5. Li, X.; Schijvenaars, B.J.A.; Rijke, M. de: Investigating queries and search failures in academic search (2017) 0.16
    0.1604477 = sum of:
      0.1604477 = product of:
        0.66853213 = sum of:
          0.0065465313 = weight(abstract_txt:that in 33) [ClassicSimilarity], result of:
            0.0065465313 = score(doc=33,freq=5.0), product of:
              0.022637002 = queryWeight, product of:
                1.0160156 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.00942105 = queryNorm
              0.28919604 = fieldWeight in 33, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0546875 = fieldNorm(doc=33)
          0.0065769865 = weight(abstract_txt:from in 33) [ClassicSimilarity], result of:
            0.0065769865 = score(doc=33,freq=2.0), product of:
              0.030818352 = queryWeight, product of:
                1.1854838 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.00942105 = queryNorm
              0.21341136 = fieldWeight in 33, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0546875 = fieldNorm(doc=33)
          0.059415057 = weight(abstract_txt:conditioned in 33) [ClassicSimilarity], result of:
            0.059415057 = score(doc=33,freq=1.0), product of:
              0.116777375 = queryWeight, product of:
                1.3323234 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.00942105 = queryNorm
              0.5087891 = fieldWeight in 33, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0546875 = fieldNorm(doc=33)
          0.08264584 = weight(abstract_txt:length in 33) [ClassicSimilarity], result of:
            0.08264584 = score(doc=33,freq=1.0), product of:
              0.23099098 = queryWeight, product of:
                3.7476394 = boost
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.00942105 = queryNorm
              0.35778818 = fieldWeight in 33, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.0546875 = fieldNorm(doc=33)
          0.2274365 = weight(abstract_txt:queries in 33) [ClassicSimilarity], result of:
            0.2274365 = score(doc=33,freq=11.0), product of:
              0.24579303 = queryWeight, product of:
                5.11404 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.00942105 = queryNorm
              0.9253171 = fieldWeight in 33, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0546875 = fieldNorm(doc=33)
          0.28591123 = weight(abstract_txt:query in 33) [ClassicSimilarity], result of:
            0.28591123 = score(doc=33,freq=13.0), product of:
              0.30497718 = queryWeight, product of:
                6.8086967 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.00942105 = queryNorm
              0.937484 = fieldWeight in 33, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0546875 = fieldNorm(doc=33)
        0.24 = coord(6/25)
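
Because of the coord factor, the order of the content list is not the order of the raw sums: doc 4922 has the second-highest raw sum but matched only 5 of the 25 clauses, so it drops to fourth place. The sketch below re-ranks the five documents both ways using the sums and coord fractions printed in the trees; the doc ids serve only as labels.

```python
# (doc id, raw sum of clause scores, matched clauses), copied from the trees above
RESULTS = [
    ("3122", 0.991337, 5),
    ("675", 0.65085304, 7),
    ("4910", 0.63805985, 7),
    ("4922", 0.8177687, 5),
    ("33", 0.66853213, 6),
]
TOTAL_CLAUSES = 25  # denominator of every coord(k/25) above


def final_score(raw: float, matched: int) -> float:
    # Final score = raw sum scaled by the coord factor matched / TOTAL_CLAUSES
    return raw * matched / TOTAL_CLAUSES


by_raw = [doc for doc, _, _ in sorted(RESULTS, key=lambda r: r[1], reverse=True)]
by_final = [
    doc
    for doc, _, _ in sorted(RESULTS, key=lambda r: final_score(r[1], r[2]), reverse=True)
]

print("raw order:  ", by_raw)
print("final order:", by_final)
for doc, raw, matched in RESULTS:
    print(f"doc {doc}: {raw:.4f} x {matched}/{TOTAL_CLAUSES} = {final_score(raw, matched):.4f}")
```

The final order reproduces the listing (3122, 675, 4910, 4922, 33): documents that match more distinct query terms are favoured even when each individual match is weaker.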