Document (#42084)

Author
Goslin, K.
Hofmann, M.
Title
¬A Wikipedia powered state-based approach to automatic search query enhancement
Source
Information processing and management. 54(2018) no.4, pp.726-739
Year
2018
Abstract
This paper describes the development and testing of a novel Automatic Search Query Enhancement (ASQE) algorithm, the Wikipedia N Sub-state Algorithm (WNSSA), which utilises Wikipedia as the sole data source for prior knowledge. The algorithm is built upon the concept of iterative states and sub-states, harnessing Wikipedia's data set and link information to identify recurring terms and use them to aid term selection and weighting during enhancement. It is designed to prevent query drift by calling back to the user's original search intent: the original query is persisted between internal states, along with additional selected enhancement terms. The developed algorithm has been shown to improve both short and long queries by providing a better understanding of the query and the available data. The proposed algorithm was compared against five existing ASQE algorithms that utilise Wikipedia as the sole data source, showing an average Mean Average Precision (MAP) improvement of 0.273 over the tested existing ASQE algorithms.
Content
Cf.: https://doi.org/10.1016/j.ipm.2017.10.001.
Theme
Semantic context in indexing and retrieval
Object
Wikipedia

Similar documents (author)

  1. Hofmann, U.: Kritische Erfolgsfaktoren führender US-Bibliotheken (1992) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:hofmann in 3973) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 3973, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=3973)
    
  2. Hofmann, U.: Bibliothek und Buchhandel im Verbund : Kosten und Nutzen integrierter Informationsverarbeitung (1993) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:hofmann in 4499) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 4499, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=4499)
    
  3. Hofmann, W.: Zur Frage internationaler Klassifikationssysteme (1947) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:hofmann in 5202) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 5202, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=5202)
    
  4. Hofmann, M.: DFÜ mit dem PC : eine Übersicht über Methoden, Hardware und Software zur Datenkommunikation zwischen PCs (1992) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:hofmann in 6933) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 6933, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=6933)
    
  5. Hofmann, M.: TREC Konferenzbericht (7.10.93) (1995) 5.41
    5.4105906 = sum of:
      5.4105906 = weight(author_txt:hofmann in 5073) [ClassicSimilarity], result of:
        5.4105906 = fieldWeight in 5073, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.656945 = idf(docFreq=20, maxDocs=44421)
          0.625 = fieldNorm(doc=5073)
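
The five author matches above all carry the same score because each is a single-term match with identical tf, idf and fieldNorm. As a rough check, the arithmetic can be reproduced with the formulas commonly documented for Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). This is a minimal sketch, not the catalog's own code; the function names are hypothetical and the input values are copied from the listing.

```python
import math


def classic_tf(freq: float) -> float:
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as shown in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the "author_txt:hofmann" entries above.
author_idf = classic_idf(doc_freq=20, max_docs=44421)
author_score = field_weight(freq=1.0, doc_freq=20, max_docs=44421, field_norm=0.625)
print(f"idf={author_idf:.6f} score={author_score:.6f}")
```

The result should agree with the listed 8.656945 and 5.4105906 up to single-precision rounding, since Lucene stores these values as 32-bit floats.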
    

Similar documents (content)

  1. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.17
    0.16543575 = sum of:
      0.16543575 = product of:
        0.5169867 = sum of:
          0.03979901 = weight(abstract_txt:terms in 2343) [ClassicSimilarity], result of:
            0.03979901 = score(doc=2343,freq=6.0), product of:
              0.06428895 = queryWeight, product of:
                1.0535829 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.015089936 = queryNorm
              0.6190645 = fieldWeight in 2343, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.024590667 = weight(abstract_txt:existing in 2343) [ClassicSimilarity], result of:
            0.024590667 = score(doc=2343,freq=1.0), product of:
              0.08474592 = queryWeight, product of:
                1.2096506 = boost
                4.6427093 = idf(docFreq=1162, maxDocs=44421)
                0.015089936 = queryNorm
              0.29016933 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6427093 = idf(docFreq=1162, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.02762367 = weight(abstract_txt:state in 2343) [ClassicSimilarity], result of:
            0.02762367 = score(doc=2343,freq=1.0), product of:
              0.091578364 = queryWeight, product of:
                1.2574681 = boost
                4.8262353 = idf(docFreq=967, maxDocs=44421)
                0.015089936 = queryNorm
              0.3016397 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8262353 = idf(docFreq=967, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.038747583 = weight(abstract_txt:original in 2343) [ClassicSimilarity], result of:
            0.038747583 = score(doc=2343,freq=1.0), product of:
              0.11475413 = queryWeight, product of:
                1.4076177 = boost
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.015089936 = queryNorm
              0.3376574 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.01799144 = weight(abstract_txt:search in 2343) [ClassicSimilarity], result of:
            0.01799144 = score(doc=2343,freq=1.0), product of:
              0.07876737 = queryWeight, product of:
                1.4282997 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.015089936 = queryNorm
              0.22841237 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.050528478 = weight(abstract_txt:average in 2343) [ClassicSimilarity], result of:
            0.050528478 = score(doc=2343,freq=1.0), product of:
              0.13697125 = queryWeight, product of:
                1.5378546 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.015089936 = queryNorm
              0.36889842 = fieldWeight in 2343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.14763631 = weight(abstract_txt:query in 2343) [ClassicSimilarity], result of:
            0.14763631 = score(doc=2343,freq=5.0), product of:
              0.22218977 = queryWeight, product of:
                3.0969384 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.015089936 = queryNorm
              0.6644604 = fieldWeight in 2343, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
          0.17006956 = weight(abstract_txt:wikipedia in 2343) [ClassicSimilarity], result of:
            0.17006956 = score(doc=2343,freq=2.0), product of:
              0.30762598 = queryWeight, product of:
                3.2593203 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.015089936 = queryNorm
              0.55284524 = fieldWeight in 2343, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.0625 = fieldNorm(doc=2343)
        0.32 = coord(8/25)
    
  2. Selvaretnam, B.; Belkhatir, M.: ¬A linguistically driven framework for query expansion via grammatical constituent highlighting and role-based concept weighting (2016) 0.14
    0.14482461 = sum of:
      0.14482461 = product of:
        0.6034359 = sum of:
          0.06946411 = weight(abstract_txt:intent in 3876) [ClassicSimilarity], result of:
            0.06946411 = score(doc=3876,freq=1.0), product of:
              0.1158321 = queryWeight, product of:
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.015089936 = queryNorm
              0.5996966 = fieldWeight in 3876, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.078125 = fieldNorm(doc=3876)
          0.020309845 = weight(abstract_txt:terms in 3876) [ClassicSimilarity], result of:
            0.020309845 = score(doc=3876,freq=1.0), product of:
              0.06428895 = queryWeight, product of:
                1.0535829 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.015089936 = queryNorm
              0.31591502 = fieldWeight in 3876, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.078125 = fieldNorm(doc=3876)
          0.031804677 = weight(abstract_txt:search in 3876) [ClassicSimilarity], result of:
            0.031804677 = score(doc=3876,freq=2.0), product of:
              0.07876737 = queryWeight, product of:
                1.4282997 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.015089936 = queryNorm
              0.40377986 = fieldWeight in 3876, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=3876)
          0.0631606 = weight(abstract_txt:average in 3876) [ClassicSimilarity], result of:
            0.0631606 = score(doc=3876,freq=1.0), product of:
              0.13697125 = queryWeight, product of:
                1.5378546 = boost
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.015089936 = queryNorm
              0.46112302 = fieldWeight in 3876, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9023747 = idf(docFreq=329, maxDocs=44421)
                0.078125 = fieldNorm(doc=3876)
          0.24759361 = weight(abstract_txt:query in 3876) [ClassicSimilarity], result of:
            0.24759361 = score(doc=3876,freq=9.0), product of:
              0.22218977 = queryWeight, product of:
                3.0969384 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.015089936 = queryNorm
              1.114334 = fieldWeight in 3876, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.078125 = fieldNorm(doc=3876)
          0.17110302 = weight(abstract_txt:algorithm in 3876) [ClassicSimilarity], result of:
            0.17110302 = score(doc=3876,freq=1.0), product of:
              0.38389352 = queryWeight, product of:
                4.4592986 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.015089936 = queryNorm
              0.44570434 = fieldWeight in 3876, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.078125 = fieldNorm(doc=3876)
        0.24 = coord(6/25)
    
  3. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.13
    0.1340386 = sum of:
      0.1340386 = product of:
        0.55849415 = sum of:
          0.016247876 = weight(abstract_txt:terms in 3693) [ClassicSimilarity], result of:
            0.016247876 = score(doc=3693,freq=1.0), product of:
              0.06428895 = queryWeight, product of:
                1.0535829 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.015089936 = queryNorm
              0.252732 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.03446545 = weight(abstract_txt:automatic in 3693) [ClassicSimilarity], result of:
            0.03446545 = score(doc=3693,freq=1.0), product of:
              0.106135644 = queryWeight, product of:
                1.3537272 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.015089936 = queryNorm
              0.32473022 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.06435873 = weight(abstract_txt:algorithms in 3693) [ClassicSimilarity], result of:
            0.06435873 = score(doc=3693,freq=2.0), product of:
              0.12774198 = queryWeight, product of:
                1.4851398 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.015089936 = queryNorm
              0.5038182 = fieldWeight in 3693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.06602497 = weight(abstract_txt:query in 3693) [ClassicSimilarity], result of:
            0.06602497 = score(doc=3693,freq=1.0), product of:
              0.22218977 = queryWeight, product of:
                3.0969384 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.015089936 = queryNorm
              0.29715574 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.2405147 = weight(abstract_txt:wikipedia in 3693) [ClassicSimilarity], result of:
            0.2405147 = score(doc=3693,freq=4.0), product of:
              0.30762598 = queryWeight, product of:
                3.2593203 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.015089936 = queryNorm
              0.7818413 = fieldWeight in 3693, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
          0.13688241 = weight(abstract_txt:algorithm in 3693) [ClassicSimilarity], result of:
            0.13688241 = score(doc=3693,freq=1.0), product of:
              0.38389352 = queryWeight, product of:
                4.4592986 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.015089936 = queryNorm
              0.35656348 = fieldWeight in 3693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=3693)
        0.24 = coord(6/25)
    
  4. Abdelali, A.; Cowie, J.; Soliman, H.S.: Improving query precision using semantic expansion (2007) 0.12
    0.124902986 = sum of:
      0.124902986 = product of:
        0.52042913 = sum of:
          0.020309845 = weight(abstract_txt:terms in 1917) [ClassicSimilarity], result of:
            0.020309845 = score(doc=1917,freq=1.0), product of:
              0.06428895 = queryWeight, product of:
                1.0535829 = boost
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.015089936 = queryNorm
              0.31591502 = fieldWeight in 1917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.043712 = idf(docFreq=2116, maxDocs=44421)
                0.078125 = fieldNorm(doc=1917)
          0.043470565 = weight(abstract_txt:existing in 1917) [ClassicSimilarity], result of:
            0.043470565 = score(doc=1917,freq=2.0), product of:
              0.08474592 = queryWeight, product of:
                1.2096506 = boost
                4.6427093 = idf(docFreq=1162, maxDocs=44421)
                0.015089936 = queryNorm
              0.51295173 = fieldWeight in 1917, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6427093 = idf(docFreq=1162, maxDocs=44421)
                0.078125 = fieldNorm(doc=1917)
          0.048434477 = weight(abstract_txt:original in 1917) [ClassicSimilarity], result of:
            0.048434477 = score(doc=1917,freq=1.0), product of:
              0.11475413 = queryWeight, product of:
                1.4076177 = boost
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.015089936 = queryNorm
              0.42207175 = fieldWeight in 1917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.078125 = fieldNorm(doc=1917)
          0.022689074 = weight(abstract_txt:data in 1917) [ClassicSimilarity], result of:
            0.022689074 = score(doc=1917,freq=1.0), product of:
              0.087207355 = queryWeight, product of:
                1.73537 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.015089936 = queryNorm
              0.26017386 = fieldWeight in 1917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=1917)
          0.16506241 = weight(abstract_txt:query in 1917) [ClassicSimilarity], result of:
            0.16506241 = score(doc=1917,freq=4.0), product of:
              0.22218977 = queryWeight, product of:
                3.0969384 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.015089936 = queryNorm
              0.74288934 = fieldWeight in 1917, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.078125 = fieldNorm(doc=1917)
          0.22046275 = weight(abstract_txt:enhancement in 1917) [ClassicSimilarity], result of:
            0.22046275 = score(doc=1917,freq=1.0), product of:
              0.39709896 = queryWeight, product of:
                3.7030954 = boost
                7.1063476 = idf(docFreq=98, maxDocs=44421)
                0.015089936 = queryNorm
              0.5551834 = fieldWeight in 1917, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1063476 = idf(docFreq=98, maxDocs=44421)
                0.078125 = fieldNorm(doc=1917)
        0.24 = coord(6/25)
    
  5. Lim, S.: How and why do college students use Wikipedia? (2009) 0.12
    0.12206552 = sum of:
      0.12206552 = product of:
        0.6103276 = sum of:
          0.08657315 = weight(abstract_txt:wikipedia's in 150) [ClassicSimilarity], result of:
            0.08657315 = score(doc=150,freq=1.0), product of:
              0.17015526 = queryWeight, product of:
                1.2120156 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.015089936 = queryNorm
              0.5087891 = fieldWeight in 150, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.0546875 = fieldNorm(doc=150)
          0.024170712 = weight(abstract_txt:state in 150) [ClassicSimilarity], result of:
            0.024170712 = score(doc=150,freq=1.0), product of:
              0.091578364 = queryWeight, product of:
                1.2574681 = boost
                4.8262353 = idf(docFreq=967, maxDocs=44421)
                0.015089936 = queryNorm
              0.26393473 = fieldWeight in 150, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8262353 = idf(docFreq=967, maxDocs=44421)
                0.0546875 = fieldNorm(doc=150)
          0.015882352 = weight(abstract_txt:data in 150) [ClassicSimilarity], result of:
            0.015882352 = score(doc=150,freq=1.0), product of:
              0.087207355 = queryWeight, product of:
                1.73537 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.015089936 = queryNorm
              0.18212171 = fieldWeight in 150, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0546875 = fieldNorm(doc=150)
          0.06280065 = weight(abstract_txt:states in 150) [ClassicSimilarity], result of:
            0.06280065 = score(doc=150,freq=1.0), product of:
              0.19812521 = queryWeight, product of:
                2.2652495 = boost
                5.796106 = idf(docFreq=366, maxDocs=44421)
                0.015089936 = queryNorm
              0.31697455 = fieldWeight in 150, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.796106 = idf(docFreq=366, maxDocs=44421)
                0.0546875 = fieldNorm(doc=150)
          0.4209007 = weight(abstract_txt:wikipedia in 150) [ClassicSimilarity], result of:
            0.4209007 = score(doc=150,freq=16.0), product of:
              0.30762598 = queryWeight, product of:
                3.2593203 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.015089936 = queryNorm
              1.3682222 = fieldWeight in 150, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.0546875 = fieldNorm(doc=150)
        0.2 = coord(5/25)
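
The content-similarity scores above combine several matching terms. Each term contributes queryWeight * fieldWeight, and the sum is scaled by the coordination factor (matched clauses / total clauses). The sketch below recomputes the score of the first content match (doc 2343) under the assumption that the standard ClassicSimilarity formulas apply; the helper names are hypothetical, and the boosts, frequencies, queryNorm and fieldNorm are copied from the listing.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.015089936
FIELD_NORM = 0.0625      # fieldNorm(doc=2343)
TOTAL_CLAUSES = 25       # denominator of coord(8/25)


def idf(doc_freq: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) for the eight terms matched in doc 2343.
matches = [
    ("terms", 6.0, 2116, 1.0535829),
    ("existing", 1.0, 1162, 1.2096506),
    ("state", 1.0, 967, 1.2574681),
    ("original", 1.0, 543, 1.4076177),
    ("search", 1.0, 3123, 1.4282997),
    ("average", 1.0, 329, 1.5378546),
    ("query", 5.0, 1039, 3.0969384),
    ("wikipedia", 2.0, 231, 3.2593203),
]

raw_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in matches)
coord = len(matches) / TOTAL_CLAUSES
final_score = coord * raw_sum
print(f"sum={raw_sum:.6f} coord={coord:.2f} score={final_score:.6f}")
```

The sum and final score should land close to the listed 0.5169867 and 0.16543575. The coord factor explains why a document matching fewer query terms (for example 5 of 25 for entry 5) is ranked lower despite a larger raw sum.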