Document (#43329)

Dumitrescu, A.
Santini, S.
Full coverage of a reader's interests in context-based information filtering
Journal of the Association for Information Science and Technology. 72(2021) no.8, pp.1011-1027
We present a collection of algorithms to filter a stream of documents in such a way that the filtered documents cover the interests of a person as well as possible, keeping in mind that, at any given time, the offered documents should not only be relevant but also diversified, in the sense of covering all of the person's interests. We use a modification of the WEBSOM algorithm to create a user model based on a self-organizing network trained on a collection of documents representative of the person's interests. We introduce the concepts of freshness and coverage. A document is fresh if it belongs to a semantic area of interest to the person for which no documents were seen in the recent past; a group of documents has coverage to the extent that it is a good representation of all of the person's interests. Our tests show that these algorithms can effectively increase the coverage of the documents shown to the user without overly affecting precision.
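The abstract defines freshness and coverage only in prose. The following is a minimal sketch of one way to read those two definitions, not the authors' WEBSOM-based method. It assumes: the trained self-organizing map is replaced by a hand-picked list of prototype vectors; each document is assigned to its best-matching unit by Euclidean distance; `interest_units` is a hypothetical set of map units standing for the person's interest areas; and "the recent past" is modeled as a sliding window over the last `window` documents shown (a hypothetical parameter).

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Set

Vector = Sequence[float]


def best_matching_unit(doc: Vector, prototypes: List[Vector]) -> int:
    """Index of the map unit whose prototype is closest (Euclidean) to doc."""
    return min(range(len(prototypes)), key=lambda i: math.dist(doc, prototypes[i]))


def coverage(docs: List[Vector], prototypes: List[Vector], interest_units: Set[int]) -> float:
    """Assumed coverage measure: fraction of interest units hit by at least one document."""
    if not interest_units:
        return 0.0
    hit = {best_matching_unit(d, prototypes) for d in docs}
    return len(hit & interest_units) / len(interest_units)


class FreshnessFilter:
    """Hypothetical filter: passes a document only if it is relevant and fresh.

    Relevant: its best-matching unit is one of the interest units.
    Fresh: no document mapped to that unit was shown in the last `window` shown documents.
    """

    def __init__(self, prototypes: List[Vector], interest_units: Set[int], window: int):
        self.prototypes = prototypes
        self.interest_units = interest_units
        # Units of the most recently shown documents; old entries fall off automatically.
        self.recent: Deque[int] = deque(maxlen=window)

    def offer(self, doc: Vector) -> bool:
        unit = best_matching_unit(doc, self.prototypes)
        relevant = unit in self.interest_units
        fresh = unit not in self.recent
        if relevant and fresh:
            self.recent.append(unit)  # record only documents actually shown
            return True
        return False


# Toy example: three interest units and one off-topic unit (3).
prototypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
interest_units = {0, 1, 2}
stream = [(0.1, 0.0), (0.0, 0.1), (0.9, 0.1), (5.0, 5.0), (0.1, 0.9), (0.0, 0.0)]

flt = FreshnessFilter(prototypes, interest_units, window=2)
shown = [d for d in stream if flt.offer(d)]
print(shown)
print(coverage(shown, prototypes, interest_units))
```

The filter above is deliberately strict: it drops every relevant document whose area was seen recently (the second document in the stream, for instance), which favors coverage over precision. The balance the abstract reports would instead call for weighing relevance against freshness rather than applying a hard rule.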

Similar documents (content)

  1. Li, Q.; Wu, Y.-f.B.: People search : searching people sharing similar interests from the Web (2008) 0.16
  2. Losee, R.M.: Browsing document collections : automatically organizing digital libraries and hypermedia using the Gray code (1997) 0.12
  3. Crystal, A.; Greenberg, J.: Relevance criteria identified by health information users during Web searches (2006) 0.11
  4. Hänger, C.: Knowledge management in the digital age : the possibilities of user generated content (2009) 0.09
  5. Krulwich, B.; Burkey, C.: The InfoFinder agent : learning user interests through heuristic phrase extraction (1997) 0.09