Document (#34081)

Author
Li, X.
Croft, W.B.
Title
An information-pattern-based approach to novelty detection
Source
Information processing and management. 44(2008) no.3, p.1159-1188
Year
2008
Abstract
In this paper, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, "novelty" is redefined based on the proposed information patterns, and several different types of information patterns are given corresponding to different types of users' information needs. Second, a thorough analysis of sentence level information patterns is elaborated using data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, a unified information-pattern-based approach to novelty detection (ip-BAND) is presented for both specific NE topics and more general topics. Experiments on novelty detection on data from the TREC 2002, 2003 and 2004 novelty tracks show that the proposed approach significantly improves the performance of novelty detection in terms of precision at top ranks. Future research directions are suggested.

Similar documents (author)

  1. Croft, W.B.: Approaches to intelligent information retrieval (1987) 5.02
    5.023691 = sum of:
      5.023691 = weight(author_txt:croft in 1093) [ClassicSimilarity], result of:
        5.023691 = fieldWeight in 1093, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.037906 = idf(docFreq=38, maxDocs=44421)
          0.625 = fieldNorm(doc=1093)
    
  2. Croft, W.B.: Clustering large files of documents using the single link method (1977) 5.02
    5.023691 = sum of:
      5.023691 = weight(author_txt:croft in 5488) [ClassicSimilarity], result of:
        5.023691 = fieldWeight in 5488, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.037906 = idf(docFreq=38, maxDocs=44421)
          0.625 = fieldNorm(doc=5488)
    
  3. Croft, W.B.: Knowledge-based and statistical approaches to text retrieval (1993) 5.02
    5.023691 = sum of:
      5.023691 = weight(author_txt:croft in 7862) [ClassicSimilarity], result of:
        5.023691 = fieldWeight in 7862, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.037906 = idf(docFreq=38, maxDocs=44421)
          0.625 = fieldNorm(doc=7862)
    
  4. Croft, W.B.: Hypertext and information retrieval : what are the fundamental concepts? (1990) 5.02
    5.023691 = sum of:
      5.023691 = weight(author_txt:croft in 8002) [ClassicSimilarity], result of:
        5.023691 = fieldWeight in 8002, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.037906 = idf(docFreq=38, maxDocs=44421)
          0.625 = fieldNorm(doc=8002)
    
  5. Croft, W.B.: What do people want from information retrieval? : the top 10 research issues for companies that use and sell IR systems (1995) 5.02
    5.023691 = sum of:
      5.023691 = weight(author_txt:croft in 3470) [ClassicSimilarity], result of:
        5.023691 = fieldWeight in 3470, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.037906 = idf(docFreq=38, maxDocs=44421)
          0.625 = fieldNorm(doc=3470)
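
The five author-match trees above all reduce to the same arithmetic. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and not part of the catalog output, and fieldNorm is copied from the trees rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm taken from the explain output)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values printed for author_txt:croft (freq=1, docFreq=38, maxDocs=44421, fieldNorm=0.625)
author_idf = classic_idf(38, 44421)
author_score = field_weight(1.0, 38, 44421, 0.625)
print(author_idf, author_score)
```

Under these assumptions the computed values agree with the printed idf of 8.037906 and score of 5.023691 up to single-precision rounding. Because every author match has the same freq, docFreq, and fieldNorm, all five documents receive the same score.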
    

Similar documents (content)

  1. Otterbacher, J.; Radev, D.: Exploring fact-focused relevance and novelty detection (2008) 0.32
    0.31941327 = sum of:
      0.31941327 = product of:
        1.3308886 = sum of:
          0.045779362 = weight(abstract_txt:level in 3210) [ClassicSimilarity], result of:
            0.045779362 = score(doc=3210,freq=3.0), product of:
              0.094078556 = queryWeight, product of:
                2.046485 = boost
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.010226891 = queryNorm
              0.48660782 = fieldWeight in 3210, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.0625 = fieldNorm(doc=3210)
          0.040632706 = weight(abstract_txt:approach in 3210) [ClassicSimilarity], result of:
            0.040632706 = score(doc=3210,freq=4.0), product of:
              0.086888306 = queryWeight, product of:
                2.27098 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.010226891 = queryNorm
              0.467643 = fieldWeight in 3210, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=3210)
          0.016645033 = weight(abstract_txt:information in 3210) [ClassicSimilarity], result of:
            0.016645033 = score(doc=3210,freq=3.0), product of:
              0.06356619 = queryWeight, product of:
                2.5695953 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.010226891 = queryNorm
              0.26185355 = fieldWeight in 3210, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=3210)
          0.17632432 = weight(abstract_txt:sentence in 3210) [ClassicSimilarity], result of:
            0.17632432 = score(doc=3210,freq=2.0), product of:
              0.29124758 = queryWeight, product of:
                4.1578016 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.010226891 = queryNorm
              0.60541046 = fieldWeight in 3210, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.0625 = fieldNorm(doc=3210)
          0.26114386 = weight(abstract_txt:detection in 3210) [ClassicSimilarity], result of:
            0.26114386 = score(doc=3210,freq=3.0), product of:
              0.35610682 = queryWeight, product of:
                5.1401734 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.010226891 = queryNorm
              0.73333013 = fieldWeight in 3210, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=3210)
          0.7903633 = weight(abstract_txt:novelty in 3210) [ClassicSimilarity], result of:
            0.7903633 = score(doc=3210,freq=5.0), product of:
              0.7350248 = queryWeight, product of:
                9.341112 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.010226891 = queryNorm
              1.0752879 = fieldWeight in 3210, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.0625 = fieldNorm(doc=3210)
        0.24 = coord(6/25)
    
  2. An, X.; Huang, J.X.: geNov : a new metric for measuring novelty and relevancy in biomedical information retrieval (2017) 0.27
    0.27277252 = sum of:
      0.27277252 = product of:
        0.97418755 = sum of:
          0.009509346 = weight(abstract_txt:different in 4921) [ClassicSimilarity], result of:
            0.009509346 = score(doc=4921,freq=1.0), product of:
              0.04157395 = queryWeight, product of:
                1.1107805 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.010226891 = queryNorm
              0.2287333 = fieldWeight in 4921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.0625 = fieldNorm(doc=4921)
          0.012513339 = weight(abstract_txt:based in 4921) [ClassicSimilarity], result of:
            0.012513339 = score(doc=4921,freq=1.0), product of:
              0.062899366 = queryWeight, product of:
                1.9322163 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.010226891 = queryNorm
              0.1989422 = fieldWeight in 4921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=4921)
          0.058459364 = weight(abstract_txt:trec in 4921) [ClassicSimilarity], result of:
            0.058459364 = score(doc=4921,freq=1.0), product of:
              0.13951585 = queryWeight, product of:
                2.0348358 = boost
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.010226891 = queryNorm
              0.41901594 = fieldWeight in 4921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.0625 = fieldNorm(doc=4921)
          0.03737869 = weight(abstract_txt:level in 4921) [ClassicSimilarity], result of:
            0.03737869 = score(doc=4921,freq=2.0), product of:
              0.094078556 = queryWeight, product of:
                2.046485 = boost
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.010226891 = queryNorm
              0.39731362 = fieldWeight in 4921, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.0625 = fieldNorm(doc=4921)
          0.04931848 = weight(abstract_txt:proposed in 4921) [ClassicSimilarity], result of:
            0.04931848 = score(doc=4921,freq=3.0), product of:
              0.098866835 = queryWeight, product of:
                2.0979183 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.010226891 = queryNorm
              0.49883747 = fieldWeight in 4921, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0625 = fieldNorm(doc=4921)
          0.016645033 = weight(abstract_txt:information in 4921) [ClassicSimilarity], result of:
            0.016645033 = score(doc=4921,freq=3.0), product of:
              0.06356619 = queryWeight, product of:
                2.5695953 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.010226891 = queryNorm
              0.26185355 = fieldWeight in 4921, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=4921)
          0.7903633 = weight(abstract_txt:novelty in 4921) [ClassicSimilarity], result of:
            0.7903633 = score(doc=4921,freq=5.0), product of:
              0.7350248 = queryWeight, product of:
                9.341112 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.010226891 = queryNorm
              1.0752879 = fieldWeight in 4921, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.0625 = fieldNorm(doc=4921)
        0.28 = coord(7/25)
    
  3. MacCain, K.W.: Descriptor and citation retrieval in the medical behavioral sciences literature : retrieval overlaps and novelty distribution (1989) 0.23
    0.23149014 = sum of:
      0.23149014 = product of:
        0.96454227 = sum of:
          0.011886683 = weight(abstract_txt:different in 2289) [ClassicSimilarity], result of:
            0.011886683 = score(doc=2289,freq=1.0), product of:
              0.04157395 = queryWeight, product of:
                1.1107805 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.010226891 = queryNorm
              0.28591663 = fieldWeight in 2289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.078125 = fieldNorm(doc=2289)
          0.021588098 = weight(abstract_txt:types in 2289) [ClassicSimilarity], result of:
            0.021588098 = score(doc=2289,freq=1.0), product of:
              0.061885715 = queryWeight, product of:
                1.3552294 = boost
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.010226891 = queryNorm
              0.34883815 = fieldWeight in 2289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.078125 = fieldNorm(doc=2289)
          0.05502231 = weight(abstract_txt:topics in 2289) [ClassicSimilarity], result of:
            0.05502231 = score(doc=2289,freq=3.0), product of:
              0.0800632 = queryWeight, product of:
                1.5414665 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.010226891 = queryNorm
              0.68723595 = fieldWeight in 2289, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.078125 = fieldNorm(doc=2289)
          0.022120666 = weight(abstract_txt:based in 2289) [ClassicSimilarity], result of:
            0.022120666 = score(doc=2289,freq=2.0), product of:
              0.062899366 = queryWeight, product of:
                1.9322163 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.010226891 = queryNorm
              0.35168344 = fieldWeight in 2289, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=2289)
          0.08865848 = weight(abstract_txt:patterns in 2289) [ClassicSimilarity], result of:
            0.08865848 = score(doc=2289,freq=1.0), product of:
              0.21539767 = queryWeight, product of:
                3.997681 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.010226891 = queryNorm
              0.41160372 = fieldWeight in 2289, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.078125 = fieldNorm(doc=2289)
          0.765266 = weight(abstract_txt:novelty in 2289) [ClassicSimilarity], result of:
            0.765266 = score(doc=2289,freq=3.0), product of:
              0.7350248 = queryWeight, product of:
                9.341112 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.010226891 = queryNorm
              1.0411431 = fieldWeight in 2289, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.078125 = fieldNorm(doc=2289)
        0.24 = coord(6/25)
    
  4. Bando, L.L.; Scholer, F.; Turpin, A.: Query-biased summary generation assisted by query expansion : temporality (2015) 0.23
    0.22871755 = sum of:
      0.22871755 = product of:
        0.8168484 = sum of:
          0.029408008 = weight(abstract_txt:improves in 2820) [ClassicSimilarity], result of:
            0.029408008 = score(doc=2820,freq=1.0), product of:
              0.07004136 = queryWeight, product of:
                1.0194827 = boost
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.010226891 = queryNorm
              0.41986632 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.0625 = fieldNorm(doc=2820)
          0.060063858 = weight(abstract_txt:lengths in 2820) [ClassicSimilarity], result of:
            0.060063858 = score(doc=2820,freq=1.0), product of:
              0.1127508 = queryWeight, product of:
                1.293488 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.010226891 = queryNorm
              0.53271335 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0625 = fieldNorm(doc=2820)
          0.058459364 = weight(abstract_txt:trec in 2820) [ClassicSimilarity], result of:
            0.058459364 = score(doc=2820,freq=1.0), product of:
              0.13951585 = queryWeight, product of:
                2.0348358 = boost
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.010226891 = queryNorm
              0.41901594 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.0625 = fieldNorm(doc=2820)
          0.045779362 = weight(abstract_txt:level in 2820) [ClassicSimilarity], result of:
            0.045779362 = score(doc=2820,freq=3.0), product of:
              0.094078556 = queryWeight, product of:
                2.046485 = boost
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.010226891 = queryNorm
              0.48660782 = fieldWeight in 2820, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.0625 = fieldNorm(doc=2820)
          0.020316353 = weight(abstract_txt:approach in 2820) [ClassicSimilarity], result of:
            0.020316353 = score(doc=2820,freq=1.0), product of:
              0.086888306 = queryWeight, product of:
                2.27098 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.010226891 = queryNorm
              0.2338215 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=2820)
          0.24936025 = weight(abstract_txt:sentence in 2820) [ClassicSimilarity], result of:
            0.24936025 = score(doc=2820,freq=4.0), product of:
              0.29124758 = queryWeight, product of:
                4.1578016 = boost
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.010226891 = queryNorm
              0.85617965 = fieldWeight in 2820, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.0625 = fieldNorm(doc=2820)
          0.35346124 = weight(abstract_txt:novelty in 2820) [ClassicSimilarity], result of:
            0.35346124 = score(doc=2820,freq=1.0), product of:
              0.7350248 = queryWeight, product of:
                9.341112 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.010226891 = queryNorm
              0.4808834 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.0625 = fieldNorm(doc=2820)
        0.28 = coord(7/25)
    
  5. Aksoy, C.; Can, F.; Kocberber, S.: Novelty detection for topic tracking (2012) 0.22
    0.21808319 = sum of:
      0.21808319 = product of:
        0.90867996 = sum of:
          0.009509346 = weight(abstract_txt:different in 1051) [ClassicSimilarity], result of:
            0.009509346 = score(doc=1051,freq=1.0), product of:
              0.04157395 = queryWeight, product of:
                1.1107805 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.010226891 = queryNorm
              0.2287333 = fieldWeight in 1051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.035393063 = weight(abstract_txt:based in 1051) [ClassicSimilarity], result of:
            0.035393063 = score(doc=1051,freq=8.0), product of:
              0.062899366 = queryWeight, product of:
                1.9322163 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.010226891 = queryNorm
              0.5626935 = fieldWeight in 1051, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.02873166 = weight(abstract_txt:approach in 1051) [ClassicSimilarity], result of:
            0.02873166 = score(doc=1051,freq=2.0), product of:
              0.086888306 = queryWeight, product of:
                2.27098 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.010226891 = queryNorm
              0.33067352 = fieldWeight in 1051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.009610015 = weight(abstract_txt:information in 1051) [ClassicSimilarity], result of:
            0.009610015 = score(doc=1051,freq=1.0), product of:
              0.06356619 = queryWeight, product of:
                2.5695953 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.010226891 = queryNorm
              0.15118122 = fieldWeight in 1051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.21322307 = weight(abstract_txt:detection in 1051) [ClassicSimilarity], result of:
            0.21322307 = score(doc=1051,freq=2.0), product of:
              0.35610682 = queryWeight, product of:
                5.1401734 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.010226891 = queryNorm
              0.59876156 = fieldWeight in 1051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
          0.6122128 = weight(abstract_txt:novelty in 1051) [ClassicSimilarity], result of:
            0.6122128 = score(doc=1051,freq=3.0), product of:
              0.7350248 = queryWeight, product of:
                9.341112 = boost
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.010226891 = queryNorm
              0.8329145 = fieldWeight in 1051, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.694134 = idf(docFreq=54, maxDocs=44421)
                0.0625 = fieldNorm(doc=1051)
        0.24 = coord(6/25)
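
Each content-match tree above combines its per-term scores in the same way: a term's score is queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and the sum over matching terms is scaled by coord (matched terms / query terms). The following is a sketch under the same ClassicSimilarity assumptions, using the values printed for doc 3210. The helper names are illustrative, and boost, queryNorm, and fieldNorm are copied from the output rather than derived.

```python
import math

QUERY_NORM = 0.010226891  # queryNorm as printed in the explain output
MAX_DOCS = 44421          # maxDocs as printed in the explain output
QUERY_TERMS = 25          # denominator of the coord factor


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (boost, freq, docFreq) for the six abstract_txt terms matched in doc 3210
terms_3210 = [
    (2.046485, 3.0, 1347),    # level
    (2.27098, 4.0, 2864),     # approach
    (2.5695953, 3.0, 10748),  # information
    (4.1578016, 2.0, 127),    # sentence
    (5.1401734, 3.0, 137),    # detection
    (9.341112, 5.0, 54),      # novelty
]

raw_sum = sum(term_score(b, f, df, 0.0625) for b, f, df in terms_3210)
coord = len(terms_3210) / QUERY_TERMS  # 6 of 25 query terms matched
total = raw_sum * coord
print(raw_sum, coord, total)
```

Under these assumptions the result agrees with the printed sum of 1.3308886 and final score of 0.31941327 up to single-precision rounding. The rare, heavily boosted term "novelty" contributes more than half of the raw sum, which is why it dominates the ranking of all five content matches.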