Document (#40326)

Author
Aletras, N.
Baldwin, T.
Lau, J.H.
Stevenson, M.
Title
Evaluating topic representations for exploring document collections
Source
Journal of the Association for Information Science and Technology. 68(2017) no.1, pp. 154-167
Year
2017
Abstract
Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a term list; that is, the top-n words with the highest conditional probability within the topic. Other topic representations, such as textual and image labels, have also been proposed. However, there has been no comparison of these alternative representations. In this article, we compare 3 different topic representations in a document retrieval task. Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, with topics presented in one of the following modalities: (a) lists of terms, (b) textual phrase labels, and (c) image labels. Results show that textual labels are easier for users to interpret than are term lists and image labels. Moreover, the precision of retrieved documents for textual and image labels is comparable to the precision achieved by representing topics using term lists, demonstrating that labeling methods are an effective alternative topic representation.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23574/full.
Theme
Visualization

Similar documents (author)

  1. Stevenson, G.: ¬The Mainzer Sachkatalog and his background (1970) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:stevenson in 753) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 753, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=753)
    
  2. Stevenson, G.: ¬The historical context: traditional classification since 1950 (1974) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:stevenson in 1258) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 1258, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=1258)
    
  3. Stevenson, G.: Andreas Schleiermacher's bibliographic classification and its relationship to the Dewey Decimal Classification (1978) 5.81
    5.814733 = sum of:
      5.814733 = weight(author_txt:stevenson in 3549) [ClassicSimilarity], result of:
        5.814733 = fieldWeight in 3549, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.625 = fieldNorm(doc=3549)
    
  4. McDonald, S.; Stevenson, R.J.: Navigation in hyperspace : an evaluation of the effects of navigational tools and subject matter expertise on browsing and information retrieval in hypertext (1998) 4.65
    4.6517863 = sum of:
      4.6517863 = weight(author_txt:stevenson in 4760) [ClassicSimilarity], result of:
        4.6517863 = fieldWeight in 4760, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.5 = fieldNorm(doc=4760)
    
  5. Cole, C.; Mandelblatt, B.; Stevenson, J.: Visualizing a high recall search strategy output for undergraduates in an exploration stage of researching a term paper (2002) 3.49
    3.4888396 = sum of:
      3.4888396 = weight(author_txt:stevenson in 3575) [ClassicSimilarity], result of:
        3.4888396 = fieldWeight in 3575, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.303573 = idf(docFreq=10, maxDocs=44421)
          0.375 = fieldNorm(doc=3575)
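
The author-match scores above all follow the same pattern: fieldWeight = tf * idf * fieldNorm. The following is a minimal sketch of that arithmetic, assuming the ClassicSimilarity conventions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures in this listing (other Lucene releases may compute idf slightly differently). The function names are illustrative and not part of any Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor, assumed form: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as laid out in the explanations above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Inputs copied from entry 1 (doc 753): freq=1.0, docFreq=10, maxDocs=44421, fieldNorm=0.625.
weight_753 = field_weight(freq=1.0, doc_freq=10, max_docs=44421, field_norm=0.625)
print(f"fieldWeight for doc 753: {weight_753:.6f}")
```

The result should land close to the 5.814733 reported for entry 1; small differences in the last digits are expected because Lucene computes in single precision. Only fieldNorm varies across the five entries (0.625, 0.5, 0.375), which is why entries with more co-authors score lower for the same author term.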
    

Similar documents (content)

  1. Alkhodair, S.A.; Fung, B.C.M.; Patrick, O.R.; Hung, C.K.: Improving interpretations of topic modeling in microblogs (2018) 0.25
    0.24784513 = sum of:
      0.24784513 = product of:
        0.88516116 = sum of:
          0.057694767 = weight(abstract_txt:labeling in 181) [ClassicSimilarity], result of:
            0.057694767 = score(doc=181,freq=1.0), product of:
              0.11789078 = queryWeight, product of:
                1.0872867 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.013847114 = queryNorm
              0.48939165 = fieldWeight in 181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.0625 = fieldNorm(doc=181)
          0.029183619 = weight(abstract_txt:documents in 181) [ClassicSimilarity], result of:
            0.029183619 = score(doc=181,freq=3.0), product of:
              0.06538095 = queryWeight, product of:
                1.1451036 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.013847114 = queryNorm
              0.4463627 = fieldWeight in 181, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=181)
          0.028546946 = weight(abstract_txt:document in 181) [ClassicSimilarity], result of:
            0.028546946 = score(doc=181,freq=1.0), product of:
              0.106365904 = queryWeight, product of:
                1.7888173 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.013847114 = queryNorm
              0.26838437 = fieldWeight in 181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=181)
          0.037370306 = weight(abstract_txt:collections in 181) [ClassicSimilarity], result of:
            0.037370306 = score(doc=181,freq=1.0), product of:
              0.12728594 = queryWeight, product of:
                1.9568384 = boost
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.013847114 = queryNorm
              0.29359335 = fieldWeight in 181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.0625 = fieldNorm(doc=181)
          0.10560359 = weight(abstract_txt:topics in 181) [ClassicSimilarity], result of:
            0.10560359 = score(doc=181,freq=5.0), product of:
              0.14878476 = queryWeight, product of:
                2.1156507 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.013847114 = queryNorm
              0.70977426 = fieldWeight in 181, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.0625 = fieldNorm(doc=181)
          0.3509866 = weight(abstract_txt:topic in 181) [ClassicSimilarity], result of:
            0.3509866 = score(doc=181,freq=8.0), product of:
              0.39287037 = queryWeight, product of:
                5.6140175 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.013847114 = queryNorm
              0.89339036 = fieldWeight in 181, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.0625 = fieldNorm(doc=181)
          0.27577534 = weight(abstract_txt:labels in 181) [ClassicSimilarity], result of:
            0.27577534 = score(doc=181,freq=1.0), product of:
              0.6078685 = queryWeight, product of:
                6.0476213 = boost
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.013847114 = queryNorm
              0.45367602 = fieldWeight in 181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.0625 = fieldNorm(doc=181)
        0.28 = coord(7/25)
    
  2. Ouyang, Y.; Li, W.; Li, S.; Lu, Q.: Intertopic information mining for query-based summarization (2010) 0.17
    0.16849512 = sum of:
      0.16849512 = product of:
        0.702063 = sum of:
          0.01684917 = weight(abstract_txt:documents in 446) [ClassicSimilarity], result of:
            0.01684917 = score(doc=446,freq=1.0), product of:
              0.06538095 = queryWeight, product of:
                1.1451036 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.013847114 = queryNorm
              0.25770763 = fieldWeight in 446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=446)
          0.01702362 = weight(abstract_txt:been in 446) [ClassicSimilarity], result of:
            0.01702362 = score(doc=446,freq=1.0), product of:
              0.07535821 = queryWeight, product of:
                1.5056709 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.013847114 = queryNorm
              0.22590263 = fieldWeight in 446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.0625 = fieldNorm(doc=446)
          0.028546946 = weight(abstract_txt:document in 446) [ClassicSimilarity], result of:
            0.028546946 = score(doc=446,freq=1.0), product of:
              0.106365904 = queryWeight, product of:
                1.7888173 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.013847114 = queryNorm
              0.26838437 = fieldWeight in 446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=446)
          0.11568294 = weight(abstract_txt:topics in 446) [ClassicSimilarity], result of:
            0.11568294 = score(doc=446,freq=6.0), product of:
              0.14878476 = queryWeight, product of:
                2.1156507 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.013847114 = queryNorm
              0.77751875 = fieldWeight in 446, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.0625 = fieldNorm(doc=446)
          0.24818501 = weight(abstract_txt:topic in 446) [ClassicSimilarity], result of:
            0.24818501 = score(doc=446,freq=4.0), product of:
              0.39287037 = queryWeight, product of:
                5.6140175 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.013847114 = queryNorm
              0.6317224 = fieldWeight in 446, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.0625 = fieldNorm(doc=446)
          0.27577534 = weight(abstract_txt:labels in 446) [ClassicSimilarity], result of:
            0.27577534 = score(doc=446,freq=1.0), product of:
              0.6078685 = queryWeight, product of:
                6.0476213 = boost
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.013847114 = queryNorm
              0.45367602 = fieldWeight in 446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2588162 = idf(docFreq=84, maxDocs=44421)
                0.0625 = fieldNorm(doc=446)
        0.24 = coord(6/25)
    
  3. Soricut, R.; Marcu, D.: Abstractive headline generation using WIDL-expressions (2007) 0.16
    0.16353965 = sum of:
      0.16353965 = product of:
        0.6814152 = sum of:
          0.08591797 = weight(abstract_txt:representing in 1943) [ClassicSimilarity], result of:
            0.08591797 = score(doc=1943,freq=2.0), product of:
              0.13248649 = queryWeight, product of:
                1.630065 = boost
                5.869585 = idf(docFreq=340, maxDocs=44421)
                0.013847114 = queryNorm
              0.64850366 = fieldWeight in 1943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.869585 = idf(docFreq=340, maxDocs=44421)
                0.078125 = fieldNorm(doc=1943)
          0.07136736 = weight(abstract_txt:document in 1943) [ClassicSimilarity], result of:
            0.07136736 = score(doc=1943,freq=4.0), product of:
              0.106365904 = queryWeight, product of:
                1.7888173 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.013847114 = queryNorm
              0.6709609 = fieldWeight in 1943, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1943)
          0.059034202 = weight(abstract_txt:topics in 1943) [ClassicSimilarity], result of:
            0.059034202 = score(doc=1943,freq=1.0), product of:
              0.14878476 = queryWeight, product of:
                2.1156507 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.013847114 = queryNorm
              0.39677587 = fieldWeight in 1943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.078125 = fieldNorm(doc=1943)
          0.18034339 = weight(abstract_txt:textual in 1943) [ClassicSimilarity], result of:
            0.18034339 = score(doc=1943,freq=2.0), product of:
              0.27364808 = queryWeight, product of:
                3.313068 = boost
                5.9648952 = idf(docFreq=309, maxDocs=44421)
                0.013847114 = queryNorm
              0.659034 = fieldWeight in 1943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9648952 = idf(docFreq=309, maxDocs=44421)
                0.078125 = fieldNorm(doc=1943)
          0.12963663 = weight(abstract_txt:representations in 1943) [ClassicSimilarity], result of:
            0.12963663 = score(doc=1943,freq=1.0), product of:
              0.2766649 = queryWeight, product of:
                3.3312802 = boost
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.013847114 = queryNorm
              0.46856913 = fieldWeight in 1943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.078125 = fieldNorm(doc=1943)
          0.15511563 = weight(abstract_txt:topic in 1943) [ClassicSimilarity], result of:
            0.15511563 = score(doc=1943,freq=1.0), product of:
              0.39287037 = queryWeight, product of:
                5.6140175 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.013847114 = queryNorm
              0.3948265 = fieldWeight in 1943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.078125 = fieldNorm(doc=1943)
        0.24 = coord(6/25)
    
  4. Xu, J.; Croft, W.B.: Topic-based language models for distributed retrieval (2000) 0.15
    0.1534139 = sum of:
      0.1534139 = product of:
        0.63922465 = sum of:
          0.07211845 = weight(abstract_txt:labeling in 1038) [ClassicSimilarity], result of:
            0.07211845 = score(doc=1038,freq=1.0), product of:
              0.11789078 = queryWeight, product of:
                1.0872867 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.013847114 = queryNorm
              0.6117396 = fieldWeight in 1038, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.078125 = fieldNorm(doc=1038)
          0.03647952 = weight(abstract_txt:documents in 1038) [ClassicSimilarity], result of:
            0.03647952 = score(doc=1038,freq=3.0), product of:
              0.06538095 = queryWeight, product of:
                1.1451036 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.013847114 = queryNorm
              0.55795336 = fieldWeight in 1038, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=1038)
          0.050464347 = weight(abstract_txt:document in 1038) [ClassicSimilarity], result of:
            0.050464347 = score(doc=1038,freq=2.0), product of:
              0.106365904 = queryWeight, product of:
                1.7888173 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.013847114 = queryNorm
              0.47444102 = fieldWeight in 1038, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1038)
          0.093425766 = weight(abstract_txt:collections in 1038) [ClassicSimilarity], result of:
            0.093425766 = score(doc=1038,freq=4.0), product of:
              0.12728594 = queryWeight, product of:
                1.9568384 = boost
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.013847114 = queryNorm
              0.7339834 = fieldWeight in 1038, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6974936 = idf(docFreq=1100, maxDocs=44421)
                0.078125 = fieldNorm(doc=1038)
          0.118068404 = weight(abstract_txt:topics in 1038) [ClassicSimilarity], result of:
            0.118068404 = score(doc=1038,freq=4.0), product of:
              0.14878476 = queryWeight, product of:
                2.1156507 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.013847114 = queryNorm
              0.79355174 = fieldWeight in 1038, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.078125 = fieldNorm(doc=1038)
          0.26866814 = weight(abstract_txt:topic in 1038) [ClassicSimilarity], result of:
            0.26866814 = score(doc=1038,freq=3.0), product of:
              0.39287037 = queryWeight, product of:
                5.6140175 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.013847114 = queryNorm
              0.6838595 = fieldWeight in 1038, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.078125 = fieldNorm(doc=1038)
        0.24 = coord(6/25)
    
  5. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.14
    0.14287975 = sum of:
      0.14287975 = product of:
        0.5953323 = sum of:
          0.066379845 = weight(abstract_txt:conditional in 45) [ClassicSimilarity], result of:
            0.066379845 = score(doc=45,freq=1.0), product of:
              0.12944335 = queryWeight, product of:
                1.1393155 = boost
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.013847114 = queryNorm
              0.51281 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.20496 = idf(docFreq=32, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.028546946 = weight(abstract_txt:document in 45) [ClassicSimilarity], result of:
            0.028546946 = score(doc=45,freq=1.0), product of:
              0.106365904 = queryWeight, product of:
                1.7888173 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.013847114 = queryNorm
              0.26838437 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.068829805 = weight(abstract_txt:term in 45) [ClassicSimilarity], result of:
            0.068829805 = score(doc=45,freq=3.0), product of:
              0.13260908 = queryWeight, product of:
                1.9973372 = boost
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.013847114 = queryNorm
              0.5190429 = fieldWeight in 45, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.794713 = idf(docFreq=998, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.066789575 = weight(abstract_txt:topics in 45) [ClassicSimilarity], result of:
            0.066789575 = score(doc=45,freq=2.0), product of:
              0.14878476 = queryWeight, product of:
                2.1156507 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.013847114 = queryNorm
              0.44890064 = fieldWeight in 45, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.060822815 = weight(abstract_txt:lists in 45) [ClassicSimilarity], result of:
            0.060822815 = score(doc=45,freq=1.0), product of:
              0.17611933 = queryWeight, product of:
                2.3018036 = boost
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.013847114 = queryNorm
              0.34535003 = fieldWeight in 45, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5256004 = idf(docFreq=480, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
          0.30396333 = weight(abstract_txt:topic in 45) [ClassicSimilarity], result of:
            0.30396333 = score(doc=45,freq=6.0), product of:
              0.39287037 = queryWeight, product of:
                5.6140175 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.013847114 = queryNorm
              0.7736988 = fieldWeight in 45, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.0625 = fieldNorm(doc=45)
        0.24 = coord(6/25)
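
The content-similarity scores above combine several factors per term: queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the sum of their products is scaled by coord (matched clauses / total clauses). The sketch below recomputes the score of entry 1 (doc 181) from the inputs shown in its explanation. It assumes the same tf and idf forms that reproduce the numbers in this listing (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); all names are illustrative and not a Lucene API.

```python
import math

MAX_DOCS = 44421          # maxDocs, as shown in every idf line above
QUERY_NORM = 0.013847114  # queryNorm, identical for all terms of the query


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm: float, matched: int, total_clauses: int) -> float:
    """Sum of the term scores, scaled by coord = matched / total_clauses."""
    raw = sum(term_score(freq, doc_freq, boost, field_norm)
              for _term, freq, doc_freq, boost in terms)
    return raw * matched / total_clauses


# (term, freq, docFreq, boost) copied from the explanation for doc 181.
DOC_181_TERMS = [
    ("labeling", 1.0, 47, 1.0872867),
    ("documents", 3.0, 1954, 1.1451036),
    ("document", 1.0, 1647, 1.7888173),
    ("collections", 1.0, 1100, 1.9568384),
    ("topics", 5.0, 751, 2.1156507),
    ("topic", 8.0, 770, 5.6140175),
    ("labels", 1.0, 84, 6.0476213),
]

score_181 = document_score(DOC_181_TERMS, field_norm=0.0625, matched=7, total_clauses=25)
print(f"score for doc 181: {score_181:.6f}")
```

The printed value should be close to the 0.24784513 reported for entry 1. The coord factor explains why a document matching 7 of the 25 query terms can outrank one with higher individual term weights but only 6 matches.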