Document (#34442)

Author
Song, R.
Luo, Z.
Nie, J.-Y.
Yu, Y.
Hon, H.-W.
Title
Identification of ambiguous queries in web search
Source
Information processing and management. 45(2009) no.2, p.216-229
Year
2009
Abstract
It is widely believed that many queries submitted to search engines are inherently ambiguous (e.g., java and apple). However, few studies have tried to classify queries based on ambiguity and to answer "what the proportion of ambiguous queries is". This paper deals with these issues. First, we clarify the definition of ambiguous queries by constructing the taxonomy of queries from being ambiguous to specific. Second, we ask human annotators to manually classify queries. From manually labeled results, we observe that query ambiguity is to some extent predictable. Third, we propose a supervised learning approach to automatically identify ambiguous queries. Experimental results show that we can correctly identify 87% of labeled queries with the approach. Finally, by using our approach, we estimate that about 16% of queries in a real search log are ambiguous.
Theme
Search engines
Search tactics

Similar documents (author)

  1. Song, F.W.: Virtual communities : bowling alone, online together (2009) 4.95
    4.9482985 = sum of:
      4.9482985 = weight(author_txt:song in 274) [ClassicSimilarity], result of:
        4.9482985 = fieldWeight in 274, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.917278 = idf(docFreq=43, maxDocs=44421)
          0.625 = fieldNorm(doc=274)
    
  2. Song, S.-f.: Rethinking of the development of reference service (1997) 3.96
    3.958639 = sum of:
      3.958639 = weight(author_txt:song in 859) [ClassicSimilarity], result of:
        3.958639 = fieldWeight in 859, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.917278 = idf(docFreq=43, maxDocs=44421)
          0.5 = fieldNorm(doc=859)
    
  3. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 3.96
    3.958639 = sum of:
      3.958639 = weight(author_txt:song in 2428) [ClassicSimilarity], result of:
        3.958639 = fieldWeight in 2428, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.917278 = idf(docFreq=43, maxDocs=44421)
          0.5 = fieldNorm(doc=2428)
    
  4. Song, Y.-S.: International business students : a study on their use of electronic library services (2004) 3.96
    3.958639 = sum of:
      3.958639 = weight(author_txt:song in 1546) [ClassicSimilarity], result of:
        3.958639 = fieldWeight in 1546, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.917278 = idf(docFreq=43, maxDocs=44421)
          0.5 = fieldNorm(doc=1546)
    
  5. Lau, R.Y.K.; Bruza, P.D.; Song, D.: Belief revision for adaptive information retrieval (2004) 2.97
    2.9689791 = sum of:
      2.9689791 = weight(author_txt:song in 5077) [ClassicSimilarity], result of:
        2.9689791 = fieldWeight in 5077, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.917278 = idf(docFreq=43, maxDocs=44421)
          0.375 = fieldNorm(doc=5077)
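
The author-match scores above are each a product of three factors: tf, idf and fieldNorm. A minimal Python sketch of that arithmetic follows. It assumes the classic Lucene TF-IDF definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures listed in the traces; the function names are illustrative and are not part of any library.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term-frequency factor, assumed form: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as laid out in the traces above."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the trace for author_txt:song (docFreq=43, maxDocs=44421).
for norm in (0.625, 0.5, 0.375):
    print(norm, field_weight(1.0, 43, 44421, norm))
```

Since tf and idf are identical for all five hits, the ranking in this list is determined by fieldNorm alone.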
    

Similar documents (content)

  1. Liu, W.; Doğan, R.I.; Kim, S.; Comeau, D.C.; Kim, W.; Yeganova, L.; Lu, Z.; Wilbur, W.J.: Author name disambiguation for PubMed (2014) 0.17
    0.17110135 = sum of:
      0.17110135 = product of:
        0.61107624 = sum of:
          0.016237969 = weight(abstract_txt:results in 2240) [ClassicSimilarity], result of:
            0.016237969 = score(doc=2240,freq=4.0), product of:
              0.04267794 = queryWeight, product of:
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.012268551 = queryNorm
              0.38047686 = fieldWeight in 2240, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2240)
          0.03585403 = weight(abstract_txt:estimate in 2240) [ClassicSimilarity], result of:
            0.03585403 = score(doc=2240,freq=1.0), product of:
              0.09117679 = queryWeight, product of:
                1.0335356 = boost
                7.190608 = idf(docFreq=90, maxDocs=44421)
                0.012268551 = queryNorm
              0.39323637 = fieldWeight in 2240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.190608 = idf(docFreq=90, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2240)
          0.007215623 = weight(abstract_txt:that in 2240) [ClassicSimilarity], result of:
            0.007215623 = score(doc=2240,freq=2.0), product of:
              0.03945041 = queryWeight, product of:
                1.3596873 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.012268551 = queryNorm
              0.18290362 = fieldWeight in 2240, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2240)
          0.014121513 = weight(abstract_txt:search in 2240) [ClassicSimilarity], result of:
            0.014121513 = score(doc=2240,freq=1.0), product of:
              0.07065673 = queryWeight, product of:
                1.5758711 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.012268551 = queryNorm
              0.19986083 = fieldWeight in 2240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2240)
          0.07042878 = weight(abstract_txt:ambiguity in 2240) [ClassicSimilarity], result of:
            0.07042878 = score(doc=2240,freq=1.0), product of:
              0.18017828 = queryWeight, product of:
                2.0547051 = boost
                7.1475906 = idf(docFreq=94, maxDocs=44421)
                0.012268551 = queryNorm
              0.39088386 = fieldWeight in 2240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1475906 = idf(docFreq=94, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2240)
          0.18108173 = weight(abstract_txt:queries in 2240) [ClassicSimilarity], result of:
            0.18108173 = score(doc=2240,freq=2.0), product of:
              0.45895004 = queryWeight, product of:
                7.332735 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.012268551 = queryNorm
              0.39455652 = fieldWeight in 2240, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2240)
          0.2861366 = weight(abstract_txt:ambiguous in 2240) [ClassicSimilarity], result of:
            0.2861366 = score(doc=2240,freq=1.0), product of:
              0.69653124 = queryWeight, product of:
                7.557925 = boost
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.012268551 = queryNorm
              0.41080225 = fieldWeight in 2240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2240)
        0.28 = coord(7/25)
    
  2. Bouidghaghen, O.; Tamine, L.: Spatio-temporal based personalization for mobile search (2012) 0.17
    0.16543388 = sum of:
      0.16543388 = product of:
        0.68930787 = sum of:
          0.011598549 = weight(abstract_txt:results in 1108) [ClassicSimilarity], result of:
            0.011598549 = score(doc=1108,freq=1.0), product of:
              0.04267794 = queryWeight, product of:
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.012268551 = queryNorm
              0.2717692 = fieldWeight in 1108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=1108)
          0.010308034 = weight(abstract_txt:that in 1108) [ClassicSimilarity], result of:
            0.010308034 = score(doc=1108,freq=2.0), product of:
              0.03945041 = queryWeight, product of:
                1.3596873 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.012268551 = queryNorm
              0.2612909 = fieldWeight in 1108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=1108)
          0.045109518 = weight(abstract_txt:search in 1108) [ClassicSimilarity], result of:
            0.045109518 = score(doc=1108,freq=5.0), product of:
              0.07065673 = queryWeight, product of:
                1.5758711 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.012268551 = queryNorm
              0.63843197 = fieldWeight in 1108, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=1108)
          0.030605014 = weight(abstract_txt:approach in 1108) [ClassicSimilarity], result of:
            0.030605014 = score(doc=1108,freq=2.0), product of:
              0.07404286 = queryWeight, product of:
                1.6131898 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.012268551 = queryNorm
              0.41334188 = fieldWeight in 1108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=1108)
          0.18292017 = weight(abstract_txt:queries in 1108) [ClassicSimilarity], result of:
            0.18292017 = score(doc=1108,freq=1.0), product of:
              0.45895004 = queryWeight, product of:
                7.332735 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.012268551 = queryNorm
              0.39856228 = fieldWeight in 1108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.078125 = fieldNorm(doc=1108)
          0.40876657 = weight(abstract_txt:ambiguous in 1108) [ClassicSimilarity], result of:
            0.40876657 = score(doc=1108,freq=1.0), product of:
              0.69653124 = queryWeight, product of:
                7.557925 = boost
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.012268551 = queryNorm
              0.58686036 = fieldWeight in 1108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5118127 = idf(docFreq=65, maxDocs=44421)
                0.078125 = fieldNorm(doc=1108)
        0.24 = coord(6/25)
    
  3. Spink, A.; Ozmutlu, H.C.: Characteristics of question format web queries : an exploratory study (2002) 0.16
    0.16050376 = sum of:
      0.16050376 = product of:
        0.66876566 = sum of:
          0.0092788385 = weight(abstract_txt:results in 4910) [ClassicSimilarity], result of:
            0.0092788385 = score(doc=4910,freq=1.0), product of:
              0.04267794 = queryWeight, product of:
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.012268551 = queryNorm
              0.21741535 = fieldWeight in 4910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.037819076 = weight(abstract_txt:submitted in 4910) [ClassicSimilarity], result of:
            0.037819076 = score(doc=4910,freq=1.0), product of:
              0.086431414 = queryWeight, product of:
                1.0062805 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.012268551 = queryNorm
              0.4375617 = fieldWeight in 4910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.0082464265 = weight(abstract_txt:that in 4910) [ClassicSimilarity], result of:
            0.0082464265 = score(doc=4910,freq=2.0), product of:
              0.03945041 = queryWeight, product of:
                1.3596873 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.012268551 = queryNorm
              0.20903271 = fieldWeight in 4910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.02634955 = weight(abstract_txt:identify in 4910) [ClassicSimilarity], result of:
            0.02634955 = score(doc=4910,freq=1.0), product of:
              0.085583456 = queryWeight, product of:
                1.4160976 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.012268551 = queryNorm
              0.30788136 = fieldWeight in 4910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.039532002 = weight(abstract_txt:search in 4910) [ClassicSimilarity], result of:
            0.039532002 = score(doc=4910,freq=6.0), product of:
              0.07065673 = queryWeight, product of:
                1.5758711 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.012268551 = queryNorm
              0.5594938 = fieldWeight in 4910, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
          0.5475398 = weight(abstract_txt:queries in 4910) [ClassicSimilarity], result of:
            0.5475398 = score(doc=4910,freq=14.0), product of:
              0.45895004 = queryWeight, product of:
                7.332735 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.012268551 = queryNorm
              1.1930269 = fieldWeight in 4910, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0625 = fieldNorm(doc=4910)
        0.24 = coord(6/25)
    
  4. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.15
    0.1529252 = sum of:
      0.1529252 = product of:
        0.5461614 = sum of:
          0.0092788385 = weight(abstract_txt:results in 218) [ClassicSimilarity], result of:
            0.0092788385 = score(doc=218,freq=1.0), product of:
              0.04267794 = queryWeight, product of:
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.012268551 = queryNorm
              0.21741535 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.038117222 = weight(abstract_txt:constructing in 218) [ClassicSimilarity], result of:
            0.038117222 = score(doc=218,freq=1.0), product of:
              0.08688508 = queryWeight, product of:
                1.008918 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.012268551 = queryNorm
              0.4387085 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.079486825 = weight(abstract_txt:supervised in 218) [ClassicSimilarity], result of:
            0.079486825 = score(doc=218,freq=3.0), product of:
              0.0983303 = queryWeight, product of:
                1.0733144 = boost
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.012268551 = queryNorm
              0.8083655 = fieldWeight in 218, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.005831104 = weight(abstract_txt:that in 218) [ClassicSimilarity], result of:
            0.005831104 = score(doc=218,freq=1.0), product of:
              0.03945041 = queryWeight, product of:
                1.3596873 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.012268551 = queryNorm
              0.14780845 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.016138872 = weight(abstract_txt:search in 218) [ClassicSimilarity], result of:
            0.016138872 = score(doc=218,freq=1.0), product of:
              0.07065673 = queryWeight, product of:
                1.5758711 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.012268551 = queryNorm
              0.22841237 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.19035801 = weight(abstract_txt:labeled in 218) [ClassicSimilarity], result of:
            0.19035801 = score(doc=218,freq=4.0), product of:
              0.20148148 = queryWeight, product of:
                2.1727805 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.012268551 = queryNorm
              0.9447916 = fieldWeight in 218, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.20695056 = weight(abstract_txt:queries in 218) [ClassicSimilarity], result of:
            0.20695056 = score(doc=218,freq=2.0), product of:
              0.45895004 = queryWeight, product of:
                7.332735 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.012268551 = queryNorm
              0.45092174 = fieldWeight in 218, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
        0.28 = coord(7/25)
    
  5. Ortiz-Cordova, A.; Yang, Y.; Jansen, B.J.: External to internal search : associating searching on search engines with searching on sites (2015) 0.15
    0.14523 = sum of:
      0.14523 = product of:
        0.605125 = sum of:
          0.053484246 = weight(abstract_txt:submitted in 3675) [ClassicSimilarity], result of:
            0.053484246 = score(doc=3675,freq=2.0), product of:
              0.086431414 = queryWeight, product of:
                1.0062805 = boost
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.012268551 = queryNorm
              0.61880565 = fieldWeight in 3675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.000987 = idf(docFreq=109, maxDocs=44421)
                0.0625 = fieldNorm(doc=3675)
          0.0082464265 = weight(abstract_txt:that in 3675) [ClassicSimilarity], result of:
            0.0082464265 = score(doc=3675,freq=2.0), product of:
              0.03945041 = queryWeight, product of:
                1.3596873 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.012268551 = queryNorm
              0.20903271 = fieldWeight in 3675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3675)
          0.02634955 = weight(abstract_txt:identify in 3675) [ClassicSimilarity], result of:
            0.02634955 = score(doc=3675,freq=1.0), product of:
              0.085583456 = queryWeight, product of:
                1.4160976 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.012268551 = queryNorm
              0.30788136 = fieldWeight in 3675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.0625 = fieldNorm(doc=3675)
          0.06847143 = weight(abstract_txt:search in 3675) [ClassicSimilarity], result of:
            0.06847143 = score(doc=3675,freq=18.0), product of:
              0.07065673 = queryWeight, product of:
                1.5758711 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.012268551 = queryNorm
              0.96907157 = fieldWeight in 3675, product of:
                4.2426405 = tf(freq=18.0), with freq of:
                  18.0 = termFreq=18.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=3675)
          0.06140431 = weight(abstract_txt:classify in 3675) [ClassicSimilarity], result of:
            0.06140431 = score(doc=3675,freq=1.0), product of:
              0.150432 = queryWeight, product of:
                1.8774501 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.012268551 = queryNorm
              0.40818647 = fieldWeight in 3675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.0625 = fieldNorm(doc=3675)
          0.38716903 = weight(abstract_txt:queries in 3675) [ClassicSimilarity], result of:
            0.38716903 = score(doc=3675,freq=7.0), product of:
              0.45895004 = queryWeight, product of:
                7.332735 = boost
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.012268551 = queryNorm
              0.84359735 = fieldWeight in 3675, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.1015973 = idf(docFreq=734, maxDocs=44421)
                0.0625 = fieldNorm(doc=3675)
        0.24 = coord(6/25)
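
The content-match scores combine several term clauses. Each clause contributes queryWeight * fieldWeight, and the sum is scaled by coord, the fraction of query clauses that matched. The Python sketch below recomputes the score of the last entry (doc 3675) from the boost, idf, freq and fieldNorm values printed in its trace. It assumes tf = sqrt(freq), which is consistent with the tf lines shown; the function and constant names are illustrative only.

```python
import math

QUERY_NORM = 0.012268551   # queryNorm, as printed in every clause of the trace
TOTAL_QUERY_CLAUSES = 25   # denominator of coord(6/25)

# (term, boost, idf, freq, fieldNorm) copied from the trace for doc 3675.
CLAUSES = [
    ("submitted", 1.0062805, 7.000987, 2.0, 0.0625),
    ("that", 1.3596873, 2.3649352, 2.0, 0.0625),
    ("identify", 1.4160976, 4.9261017, 1.0, 0.0625),
    ("search", 1.5758711, 3.654598, 18.0, 0.0625),
    ("classify", 1.8774501, 6.5309834, 1.0, 0.0625),
    ("queries", 7.332735, 5.1015973, 7.0, 0.0625),
]


def clause_score(boost, idf, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term clause: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm             # query side
    field_weight = math.sqrt(freq) * idf * field_norm   # document side
    return query_weight * field_weight


def document_score(clauses, total_clauses):
    """Sum of the clause scores, scaled by coord = matched / total clauses."""
    raw = sum(clause_score(b, i, f, n) for _, b, i, f, n in clauses)
    coord = len(clauses) / total_clauses
    return raw * coord


score = document_score(CLAUSES, TOTAL_QUERY_CLAUSES)
print(score)
```

The "queries" clause dominates the sum because it carries both the largest boost and a high term frequency, and the coord factor of 6/25 then scales the sum down to the final similarity value.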