Document (#32915)

Author
Lin, J.
Title
User simulations for evaluating answers to question series
Source
Information Processing and Management 43(2007) no.3, pp. 717-729
Year
2007
Abstract
Recently, question series have become one focus of research in question answering. These series are comprised of individual factoid, list, and "other" questions organized around a central topic, and represent abstractions of user-system dialogs. Existing evaluation methodologies have yet to catch up with this richer task model, as they fail to take into account contextual dependencies and different user behaviors. This paper presents a novel simulation-based methodology for evaluating answers to question series that addresses some of these shortcomings. Using this methodology, we examine two different behavior models: a "QA-styled" user and an "IR-styled" user. Results suggest that an off-the-shelf document retrieval system is competitive with state-of-the-art QA systems in this task. Advantages and limitations of evaluations based on user simulations are also discussed.
Footnote
Contribution to: Special issue on Heterogeneous and Distributed IR

Similar documents (content)

  1. Lin, J.; Katz, B.: Building a reusable test collection for question answering (2006) 0.26
    0.25593892 = sum of:
      0.25593892 = product of:
        0.71094143 = sum of:
          0.13185982 = weight(abstract_txt:answering in 45) [ClassicSimilarity], result of:
            0.13185982 = score(doc=45,freq=4.0), product of:
              0.12711538 = queryWeight, product of:
                1.0243416 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.018692138 = queryNorm
              1.0373238 = fieldWeight in 45, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.024451563 = weight(abstract_txt:system in 45) [ClassicSimilarity], result of:
            0.024451563 = score(doc=45,freq=2.0), product of:
              0.065616675 = queryWeight, product of:
                1.0408014 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.018692138 = queryNorm
              0.37264252 = fieldWeight in 45, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.022089077 = weight(abstract_txt:different in 45) [ClassicSimilarity], result of:
            0.022089077 = score(doc=45,freq=1.0), product of:
              0.07725706 = queryWeight, product of:
                1.1293534 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.018692138 = queryNorm
              0.28591663 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.09327989 = weight(abstract_txt:shortcomings in 45) [ClassicSimilarity], result of:
            0.09327989 = score(doc=45,freq=1.0), product of:
              0.1602022 = queryWeight, product of:
                1.1499528 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.018692138 = queryNorm
              0.58226347 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.04318572 = weight(abstract_txt:methodology in 45) [ClassicSimilarity], result of:
            0.04318572 = score(doc=45,freq=1.0), product of:
              0.12079433 = queryWeight, product of:
                1.4121604 = boost
                4.5761847 = idf(docFreq=1242, maxDocs=44421)
                0.018692138 = queryNorm
              0.35751444 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5761847 = idf(docFreq=1242, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.021748737 = weight(abstract_txt:this in 45) [ClassicSimilarity], result of:
            0.021748737 = score(doc=45,freq=3.0), product of:
              0.06679522 = queryWeight, product of:
                1.4850752 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.018692138 = queryNorm
              0.3256032 = fieldWeight in 45, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.053204555 = weight(abstract_txt:task in 45) [ClassicSimilarity], result of:
            0.053204555 = score(doc=45,freq=1.0), product of:
              0.1388201 = queryWeight, product of:
                1.5138642 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.018692138 = queryNorm
              0.38326263 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.25370005 = weight(abstract_txt:question in 45) [ClassicSimilarity], result of:
            0.25370005 = score(doc=45,freq=4.0), product of:
              0.31214532 = queryWeight, product of:
                3.2103631 = boost
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.018692138 = queryNorm
              0.8127626 = fieldWeight in 45, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
          0.067422 = weight(abstract_txt:user in 45) [ClassicSimilarity], result of:
            0.067422 = score(doc=45,freq=1.0), product of:
              0.23445597 = queryWeight, product of:
                3.4076269 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.018692138 = queryNorm
              0.28756785 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.078125 = fieldNorm(doc=45)
        0.36 = coord(9/25)
    
  2. Abacha, A.B.; Zweigenbaum, P.: MEANS: A medical question-answering system combining NLP techniques and semantic Web technologies (2015) 0.17
    0.16731575 = sum of:
      0.16731575 = product of:
        0.59755623 = sum of:
          0.093238965 = weight(abstract_txt:answering in 3677) [ClassicSimilarity], result of:
            0.093238965 = score(doc=3677,freq=2.0), product of:
              0.12711538 = queryWeight, product of:
                1.0243416 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.018692138 = queryNorm
              0.7334987 = fieldWeight in 3677, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.078125 = fieldNorm(doc=3677)
          0.029946927 = weight(abstract_txt:system in 3677) [ClassicSimilarity], result of:
            0.029946927 = score(doc=3677,freq=3.0), product of:
              0.065616675 = queryWeight, product of:
                1.0408014 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.018692138 = queryNorm
              0.45639202 = fieldWeight in 3677, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.078125 = fieldNorm(doc=3677)
          0.01775777 = weight(abstract_txt:this in 3677) [ClassicSimilarity], result of:
            0.01775777 = score(doc=3677,freq=2.0), product of:
              0.06679522 = queryWeight, product of:
                1.4850752 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.018692138 = queryNorm
              0.26585388 = fieldWeight in 3677, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=3677)
          0.053204555 = weight(abstract_txt:task in 3677) [ClassicSimilarity], result of:
            0.053204555 = score(doc=3677,freq=1.0), product of:
              0.1388201 = queryWeight, product of:
                1.5138642 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.018692138 = queryNorm
              0.38326263 = fieldWeight in 3677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.078125 = fieldNorm(doc=3677)
          0.15659304 = weight(abstract_txt:answers in 3677) [ClassicSimilarity], result of:
            0.15659304 = score(doc=3677,freq=2.0), product of:
              0.22628622 = queryWeight, product of:
                1.9328128 = boost
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.018692138 = queryNorm
              0.69201314 = fieldWeight in 3677, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.078125 = fieldNorm(doc=3677)
          0.17939301 = weight(abstract_txt:question in 3677) [ClassicSimilarity], result of:
            0.17939301 = score(doc=3677,freq=2.0), product of:
              0.31214532 = queryWeight, product of:
                3.2103631 = boost
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.018692138 = queryNorm
              0.5747099 = fieldWeight in 3677, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.078125 = fieldNorm(doc=3677)
          0.067422 = weight(abstract_txt:user in 3677) [ClassicSimilarity], result of:
            0.067422 = score(doc=3677,freq=1.0), product of:
              0.23445597 = queryWeight, product of:
                3.4076269 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.018692138 = queryNorm
              0.28756785 = fieldWeight in 3677, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.078125 = fieldNorm(doc=3677)
        0.28 = coord(7/25)
    
  3. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.16
    0.15986945 = sum of:
      0.15986945 = product of:
        0.5709623 = sum of:
          0.093238965 = weight(abstract_txt:answering in 4455) [ClassicSimilarity], result of:
            0.093238965 = score(doc=4455,freq=2.0), product of:
              0.12711538 = queryWeight, product of:
                1.0243416 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.018692138 = queryNorm
              0.7334987 = fieldWeight in 4455, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.078125 = fieldNorm(doc=4455)
          0.017289868 = weight(abstract_txt:system in 4455) [ClassicSimilarity], result of:
            0.017289868 = score(doc=4455,freq=1.0), product of:
              0.065616675 = queryWeight, product of:
                1.0408014 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.018692138 = queryNorm
              0.26349807 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.078125 = fieldNorm(doc=4455)
          0.022089077 = weight(abstract_txt:different in 4455) [ClassicSimilarity], result of:
            0.022089077 = score(doc=4455,freq=1.0), product of:
              0.07725706 = queryWeight, product of:
                1.1293534 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.018692138 = queryNorm
              0.28591663 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.078125 = fieldNorm(doc=4455)
          0.0125566395 = weight(abstract_txt:this in 4455) [ClassicSimilarity], result of:
            0.0125566395 = score(doc=4455,freq=1.0), product of:
              0.06679522 = queryWeight, product of:
                1.4850752 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.018692138 = queryNorm
              0.18798709 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.078125 = fieldNorm(doc=4455)
          0.110728 = weight(abstract_txt:answers in 4455) [ClassicSimilarity], result of:
            0.110728 = score(doc=4455,freq=1.0), product of:
              0.22628622 = queryWeight, product of:
                1.9328128 = boost
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.018692138 = queryNorm
              0.4893272 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.078125 = fieldNorm(doc=4455)
          0.21971068 = weight(abstract_txt:question in 4455) [ClassicSimilarity], result of:
            0.21971068 = score(doc=4455,freq=3.0), product of:
              0.31214532 = queryWeight, product of:
                3.2103631 = boost
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.018692138 = queryNorm
              0.70387304 = fieldWeight in 4455, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.078125 = fieldNorm(doc=4455)
          0.0953491 = weight(abstract_txt:user in 4455) [ClassicSimilarity], result of:
            0.0953491 = score(doc=4455,freq=2.0), product of:
              0.23445597 = queryWeight, product of:
                3.4076269 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.018692138 = queryNorm
              0.40668234 = fieldWeight in 4455, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.078125 = fieldNorm(doc=4455)
        0.28 = coord(7/25)
    
  4. Budzik, J.; Hammond, K.: Q&A: a system for the capture, organization and reuse of expertise (1999) 0.15
    0.15267149 = sum of:
      0.15267149 = product of:
        0.5452553 = sum of:
          0.06526727 = weight(abstract_txt:answering in 668) [ClassicSimilarity], result of:
            0.06526727 = score(doc=668,freq=2.0), product of:
              0.12711538 = queryWeight, product of:
                1.0243416 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.018692138 = queryNorm
              0.5134491 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.0546875 = fieldNorm(doc=668)
          0.017116092 = weight(abstract_txt:system in 668) [ClassicSimilarity], result of:
            0.017116092 = score(doc=668,freq=2.0), product of:
              0.065616675 = queryWeight, product of:
                1.0408014 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.018692138 = queryNorm
              0.26084974 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.0546875 = fieldNorm(doc=668)
          0.015224116 = weight(abstract_txt:this in 668) [ClassicSimilarity], result of:
            0.015224116 = score(doc=668,freq=3.0), product of:
              0.06679522 = queryWeight, product of:
                1.4850752 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.018692138 = queryNorm
              0.22792223 = fieldWeight in 668, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0546875 = fieldNorm(doc=668)
          0.037243187 = weight(abstract_txt:task in 668) [ClassicSimilarity], result of:
            0.037243187 = score(doc=668,freq=1.0), product of:
              0.1388201 = queryWeight, product of:
                1.5138642 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.018692138 = queryNorm
              0.26828384 = fieldWeight in 668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.0546875 = fieldNorm(doc=668)
          0.0775096 = weight(abstract_txt:answers in 668) [ClassicSimilarity], result of:
            0.0775096 = score(doc=668,freq=1.0), product of:
              0.22628622 = queryWeight, product of:
                1.9328128 = boost
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.018692138 = queryNorm
              0.34252903 = fieldWeight in 668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.0546875 = fieldNorm(doc=668)
          0.25115022 = weight(abstract_txt:question in 668) [ClassicSimilarity], result of:
            0.25115022 = score(doc=668,freq=8.0), product of:
              0.31214532 = queryWeight, product of:
                3.2103631 = boost
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.018692138 = queryNorm
              0.8045939 = fieldWeight in 668, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.0546875 = fieldNorm(doc=668)
          0.08174483 = weight(abstract_txt:user in 668) [ClassicSimilarity], result of:
            0.08174483 = score(doc=668,freq=3.0), product of:
              0.23445597 = queryWeight, product of:
                3.4076269 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.018692138 = queryNorm
              0.3486575 = fieldWeight in 668, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.0546875 = fieldNorm(doc=668)
        0.28 = coord(7/25)
    
  5. Le, L.T.; Shah, C.: Retrieving people: identifying potential answerers in Community Question-Answering (2018) 0.14
    0.13785249 = sum of:
      0.13785249 = product of:
        0.5743854 = sum of:
          0.052743927 = weight(abstract_txt:answering in 467) [ClassicSimilarity], result of:
            0.052743927 = score(doc=467,freq=1.0), product of:
              0.12711538 = queryWeight, product of:
                1.0243416 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.018692138 = queryNorm
              0.41492954 = fieldWeight in 467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.0625 = fieldNorm(doc=467)
          0.053831365 = weight(abstract_txt:evaluations in 467) [ClassicSimilarity], result of:
            0.053831365 = score(doc=467,freq=1.0), product of:
              0.12885661 = queryWeight, product of:
                1.0313334 = boost
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.018692138 = queryNorm
              0.41776174 = fieldWeight in 467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.0625 = fieldNorm(doc=467)
          0.010045311 = weight(abstract_txt:this in 467) [ClassicSimilarity], result of:
            0.010045311 = score(doc=467,freq=1.0), product of:
              0.06679522 = queryWeight, product of:
                1.4850752 = boost
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.018692138 = queryNorm
              0.15038967 = fieldWeight in 467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4062347 = idf(docFreq=10885, maxDocs=44421)
                0.0625 = fieldNorm(doc=467)
          0.088582404 = weight(abstract_txt:answers in 467) [ClassicSimilarity], result of:
            0.088582404 = score(doc=467,freq=1.0), product of:
              0.22628622 = queryWeight, product of:
                1.9328128 = boost
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.018692138 = queryNorm
              0.39146176 = fieldWeight in 467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.0625 = fieldNorm(doc=467)
          0.24857427 = weight(abstract_txt:question in 467) [ClassicSimilarity], result of:
            0.24857427 = score(doc=467,freq=6.0), product of:
              0.31214532 = queryWeight, product of:
                3.2103631 = boost
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.018692138 = queryNorm
              0.7963415 = fieldWeight in 467, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.2016807 = idf(docFreq=664, maxDocs=44421)
                0.0625 = fieldNorm(doc=467)
          0.120608136 = weight(abstract_txt:user in 467) [ClassicSimilarity], result of:
            0.120608136 = score(doc=467,freq=5.0), product of:
              0.23445597 = queryWeight, product of:
                3.4076269 = boost
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.018692138 = queryNorm
              0.514417 = fieldWeight in 467, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.6808684 = idf(docFreq=3042, maxDocs=44421)
                0.0625 = fieldNorm(doc=467)
        0.24 = coord(6/25)
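The similarity scores above are Lucene ClassicSimilarity (TF-IDF) explain trees. Each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The summed term scores are then multiplied by coord, the fraction of the 25 query terms that matched. The Python below is a minimal sketch under those standard ClassicSimilarity definitions. It recomputes entry 1 (doc 45) from the constants printed in its explain tree. The helper names (`idf`, `tf`, `term_score`, `doc_score`) and the `DOC_45` table are hypothetical illustrations, not part of Lucene's API. Results may differ from the listed values in the last digits because Lucene computes in 32-bit floats and the printed constants are rounded.

```python
import math

# Constants copied from the explain output above (assumed shared by all entries).
MAX_DOCS = 44421          # maxDocs in every idf(...) line
QUERY_NORM = 0.018692138  # queryNorm shared by all query terms
MAX_OVERLAP = 25          # denominator of coord(n/25): number of query terms


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float,
               query_norm: float = QUERY_NORM) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def doc_score(terms, field_norm: float, max_overlap: int = MAX_OVERLAP) -> float:
    """Sum the matching terms' scores, then scale by coord = matched / total."""
    total = sum(term_score(freq, df, boost, field_norm) for freq, df, boost in terms)
    coord = len(terms) / max_overlap
    return total * coord


# Hypothetical table: (freq, docFreq, boost) for the nine terms matched in doc 45.
DOC_45 = [
    (4.0, 157, 1.0243416),    # answering
    (2.0, 4140, 1.0408014),   # system
    (1.0, 3107, 1.1293534),   # different
    (1.0, 69, 1.1499528),     # shortcomings
    (1.0, 1242, 1.4121604),   # methodology
    (3.0, 10885, 1.4850752),  # this
    (1.0, 893, 1.5138642),    # task
    (4.0, 664, 3.2103631),    # question
    (1.0, 3042, 3.4076269),   # user
]

if __name__ == "__main__":
    # Compare with 0.25593892, the total listed for entry 1.
    print(f"{doc_score(DOC_45, field_norm=0.078125):.6f}")
```

Because every term in a given document shares the same fieldNorm, the sketch passes it once per document; the other entries can be checked the same way by swapping in their (freq, docFreq, boost) rows and fieldNorm.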