Document (#21768)

Author
Rorvig, M.
Title
Images of similarity : a visual exploration of optimal similarity metrics and scaling properties of TREC topic-document sets
Source
Journal of the American Society for Information Science. 50(1999) no.8, pp.639-651
Year
1999
Abstract
Multiple similarity measures for 5 TREC topic-document sets from the LDC TREC Collection Disk 1 are derived from the full text of documents. Each measure on each set is scaled using SAS MDS under ordinal, interval, and MLE assumptions. The resulting 75 permutations are plotted. It is suggested that cosine-vector and overlap measures for similarity appear to recover optimal data relationships among the documents of the 5 sets. MLE assumptions appear to be required to model the data adequately.
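
The abstract names cosine-vector and overlap measures without defining them. As a rough illustration only, the sketch below uses the common textbook definitions on term-count vectors; the exact formulas, weighting, and preprocessing used in the paper are not given in this record, so the function names and the toy documents here are assumptions.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def overlap(a: Counter, b: Counter) -> float:
    """Overlap coefficient: shared term mass divided by the smaller document's mass."""
    shared = sum(min(a[t], b[t]) for t in a.keys() & b.keys())
    smaller = min(sum(a.values()), sum(b.values()))
    return shared / smaller if smaller else 0.0

# Toy documents (hypothetical, not taken from the TREC collection).
doc1 = Counter("trec topic document sets".split())
doc2 = Counter("trec document similarity".split())

print(cosine(doc1, doc2), overlap(doc1, doc2))
```

A full pairwise matrix of such values over a topic-document set would be the kind of input a multidimensional scaling procedure takes, although the scaling step itself is not sketched here.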

Similar documents (author)

  1. Rorvig, M.E.: ¬A method for automatically abstracting visual documents (1993) 5.66
    5.664006 = sum of:
      5.664006 = weight(author_txt:rorvig in 2722) [ClassicSimilarity], result of:
        5.664006 = fieldWeight in 2722, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.06241 = idf(docFreq=13, maxDocs=44421)
          0.625 = fieldNorm(doc=2722)
    
  2. Rorvig, M.E.: Image information retrieval (1987) 5.66
    5.664006 = sum of:
      5.664006 = weight(author_txt:rorvig in 5639) [ClassicSimilarity], result of:
        5.664006 = fieldWeight in 5639, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.06241 = idf(docFreq=13, maxDocs=44421)
          0.625 = fieldNorm(doc=5639)
    
  3. Rorvig, M.E.: ¬The bibliographic control of microcomputer software (1988) 5.66
    5.664006 = sum of:
      5.664006 = weight(author_txt:rorvig in 1343) [ClassicSimilarity], result of:
        5.664006 = fieldWeight in 1343, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.06241 = idf(docFreq=13, maxDocs=44421)
          0.625 = fieldNorm(doc=1343)
    
  4. Rorvig, M.E.: Psychometric measurement and information retrieval (1989) 5.66
    5.664006 = sum of:
      5.664006 = weight(author_txt:rorvig in 333) [ClassicSimilarity], result of:
        5.664006 = fieldWeight in 333, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.06241 = idf(docFreq=13, maxDocs=44421)
          0.625 = fieldNorm(doc=333)
    
  5. Rorvig, M.: Scaled structure in visualized TREC data and query feedback (1998) 5.66
    5.664006 = sum of:
      5.664006 = weight(author_txt:rorvig in 4269) [ClassicSimilarity], result of:
        5.664006 = fieldWeight in 4269, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.06241 = idf(docFreq=13, maxDocs=44421)
          0.625 = fieldNorm(doc=4269)
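
The author scores above can be reproduced by hand. The sketch below assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these were chosen because they match the numbers in the trace, and the helper names are illustrative rather than part of any Lucene API.

```python
import math

def tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that reproduces the trace values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:rorvig entries above.
score = field_weight(freq=1.0, doc_freq=13, max_docs=44421, field_norm=0.625)
print(round(score, 3))
```

The result should agree with the 5.664006 shown in the trace up to single-precision rounding. All five author matches share the same freq, docFreq, and fieldNorm, which is why they tie at 5.66.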
    

Similar documents (content)

  1. Rorvig, M.: ¬A visual exploration of the orderliness of TREC relevance judgements (1999) 0.34
    0.33656183 = sum of:
      0.33656183 = product of:
        1.2020066 = sum of:
          0.17471305 = weight(abstract_txt:scaling in 4768) [ClassicSimilarity], result of:
            0.17471305 = score(doc=4768,freq=4.0), product of:
              0.15170313 = queryWeight, product of:
                1.1599491 = boost
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.017743729 = queryNorm
              1.1516773 = fieldWeight in 4768, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.078125 = fieldNorm(doc=4768)
          0.074922204 = weight(abstract_txt:documents in 4768) [ClassicSimilarity], result of:
            0.074922204 = score(doc=4768,freq=6.0), product of:
              0.09495057 = queryWeight, product of:
                1.297793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017743729 = queryNorm
              0.7890653 = fieldWeight in 4768, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=4768)
          0.17567597 = weight(abstract_txt:scaled in 4768) [ClassicSimilarity], result of:
            0.17567597 = score(doc=4768,freq=1.0), product of:
              0.24169773 = queryWeight, product of:
                1.4641242 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.017743729 = queryNorm
              0.7268416 = fieldWeight in 4768, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.078125 = fieldNorm(doc=4768)
          0.12592931 = weight(abstract_txt:topic in 4768) [ClassicSimilarity], result of:
            0.12592931 = score(doc=4768,freq=5.0), product of:
              0.1426381 = queryWeight, product of:
                1.5906492 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.017743729 = queryNorm
              0.8828589 = fieldWeight in 4768, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.078125 = fieldNorm(doc=4768)
          0.12881668 = weight(abstract_txt:sets in 4768) [ClassicSimilarity], result of:
            0.12881668 = score(doc=4768,freq=2.0), product of:
              0.22497943 = queryWeight, product of:
                2.4466603 = boost
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.017743729 = queryNorm
              0.5725709 = fieldWeight in 4768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.078125 = fieldNorm(doc=4768)
          0.27890155 = weight(abstract_txt:trec in 4768) [ClassicSimilarity], result of:
            0.27890155 = score(doc=4768,freq=2.0), product of:
              0.3765264 = queryWeight, product of:
                3.1651914 = boost
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.017743729 = queryNorm
              0.7407225 = fieldWeight in 4768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.078125 = fieldNorm(doc=4768)
          0.24304783 = weight(abstract_txt:similarity in 4768) [ClassicSimilarity], result of:
            0.24304783 = score(doc=4768,freq=2.0), product of:
              0.37809607 = queryWeight, product of:
                3.6624587 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.017743729 = queryNorm
              0.6428203 = fieldWeight in 4768, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=4768)
        0.28 = coord(7/25)
    
  2. Rorvig, M.: Scaled structure in visualized TREC data and query feedback (1998) 0.26
    0.25869456 = sum of:
      0.25869456 = product of:
        0.9239092 = sum of:
          0.055973 = weight(abstract_txt:exploration in 4269) [ClassicSimilarity], result of:
            0.055973 = score(doc=4269,freq=1.0), product of:
              0.112750046 = queryWeight, product of:
                6.35436 = idf(docFreq=209, maxDocs=44421)
                0.017743729 = queryNorm
              0.4964344 = fieldWeight in 4269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.35436 = idf(docFreq=209, maxDocs=44421)
                0.078125 = fieldNorm(doc=4269)
          0.061173715 = weight(abstract_txt:documents in 4269) [ClassicSimilarity], result of:
            0.061173715 = score(doc=4269,freq=4.0), product of:
              0.09495057 = queryWeight, product of:
                1.297793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017743729 = queryNorm
              0.64426905 = fieldWeight in 4269, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=4269)
          0.084625326 = weight(abstract_txt:document in 4269) [ClassicSimilarity], result of:
            0.084625326 = score(doc=4269,freq=6.0), product of:
              0.10298108 = queryWeight, product of:
                1.3515601 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017743729 = queryNorm
              0.821756 = fieldWeight in 4269, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=4269)
          0.24844334 = weight(abstract_txt:scaled in 4269) [ClassicSimilarity], result of:
            0.24844334 = score(doc=4269,freq=2.0), product of:
              0.24169773 = queryWeight, product of:
                1.4641242 = boost
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.017743729 = queryNorm
              1.0279093 = fieldWeight in 4269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.303573 = idf(docFreq=10, maxDocs=44421)
                0.078125 = fieldNorm(doc=4269)
          0.097544424 = weight(abstract_txt:topic in 4269) [ClassicSimilarity], result of:
            0.097544424 = score(doc=4269,freq=3.0), product of:
              0.1426381 = queryWeight, product of:
                1.5906492 = boost
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.017743729 = queryNorm
              0.6838595 = fieldWeight in 4269, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.053779 = idf(docFreq=770, maxDocs=44421)
                0.078125 = fieldNorm(doc=4269)
          0.097247854 = weight(abstract_txt:appear in 4269) [ClassicSimilarity], result of:
            0.097247854 = score(doc=4269,freq=1.0), product of:
              0.20530255 = queryWeight, product of:
                1.9083315 = boost
                6.0631127 = idf(docFreq=280, maxDocs=44421)
                0.017743729 = queryNorm
              0.47368068 = fieldWeight in 4269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0631127 = idf(docFreq=280, maxDocs=44421)
                0.078125 = fieldNorm(doc=4269)
          0.27890155 = weight(abstract_txt:trec in 4269) [ClassicSimilarity], result of:
            0.27890155 = score(doc=4269,freq=2.0), product of:
              0.3765264 = queryWeight, product of:
                3.1651914 = boost
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.017743729 = queryNorm
              0.7407225 = fieldWeight in 4269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.704255 = idf(docFreq=147, maxDocs=44421)
                0.078125 = fieldNorm(doc=4269)
        0.28 = coord(7/25)
    
  3. Egghe, L.: Good properties of similarity measures and their complementarity (2010) 0.19
    0.19323562 = sum of:
      0.19323562 = product of:
        0.96617806 = sum of:
          0.085498355 = weight(abstract_txt:vector in 980) [ClassicSimilarity], result of:
            0.085498355 = score(doc=980,freq=2.0), product of:
              0.11869329 = queryWeight, product of:
                1.0260174 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017743729 = queryNorm
              0.7203301 = fieldWeight in 980, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.12673022 = weight(abstract_txt:overlap in 980) [ClassicSimilarity], result of:
            0.12673022 = score(doc=980,freq=3.0), product of:
              0.13479611 = queryWeight, product of:
                1.0934031 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.017743729 = queryNorm
              0.94016224 = fieldWeight in 980, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.17722572 = weight(abstract_txt:cosine in 980) [ClassicSimilarity], result of:
            0.17722572 = score(doc=980,freq=3.0), product of:
              0.16856799 = queryWeight, product of:
                1.2227261 = boost
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.017743729 = queryNorm
              1.0513605 = fieldWeight in 980, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.15575252 = weight(abstract_txt:measures in 980) [ClassicSimilarity], result of:
            0.15575252 = score(doc=980,freq=5.0), product of:
              0.16435178 = queryWeight, product of:
                1.7074337 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.017743729 = queryNorm
              0.9476778 = fieldWeight in 980, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
          0.4209712 = weight(abstract_txt:similarity in 980) [ClassicSimilarity], result of:
            0.4209712 = score(doc=980,freq=6.0), product of:
              0.37809607 = queryWeight, product of:
                3.6624587 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.017743729 = queryNorm
              1.1133975 = fieldWeight in 980, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=980)
        0.2 = coord(5/25)
    
  4. Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012) 0.18
    0.18161474 = sum of:
      0.18161474 = product of:
        0.64862406 = sum of:
          0.07316773 = weight(abstract_txt:overlap in 1372) [ClassicSimilarity], result of:
            0.07316773 = score(doc=1372,freq=1.0), product of:
              0.13479611 = queryWeight, product of:
                1.0934031 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.017743729 = queryNorm
              0.54280293 = fieldWeight in 1372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.030462174 = weight(abstract_txt:each in 1372) [ClassicSimilarity], result of:
            0.030462174 = score(doc=1372,freq=1.0), product of:
              0.09469236 = queryWeight, product of:
                1.2960272 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.017743729 = queryNorm
              0.32169622 = fieldWeight in 1372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.043256346 = weight(abstract_txt:documents in 1372) [ClassicSimilarity], result of:
            0.043256346 = score(doc=1372,freq=2.0), product of:
              0.09495057 = queryWeight, product of:
                1.297793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017743729 = queryNorm
              0.455567 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.06909628 = weight(abstract_txt:document in 1372) [ClassicSimilarity], result of:
            0.06909628 = score(doc=1372,freq=4.0), product of:
              0.10298108 = queryWeight, product of:
                1.3515601 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017743729 = queryNorm
              0.6709609 = fieldWeight in 1372, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.09850655 = weight(abstract_txt:measures in 1372) [ClassicSimilarity], result of:
            0.09850655 = score(doc=1372,freq=2.0), product of:
              0.16435178 = queryWeight, product of:
                1.7074337 = boost
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.017743729 = queryNorm
              0.59936404 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.424824 = idf(docFreq=531, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.09108714 = weight(abstract_txt:sets in 1372) [ClassicSimilarity], result of:
            0.09108714 = score(doc=1372,freq=1.0), product of:
              0.22497943 = queryWeight, product of:
                2.4466603 = boost
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.017743729 = queryNorm
              0.40486875 = fieldWeight in 1372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
          0.24304783 = weight(abstract_txt:similarity in 1372) [ClassicSimilarity], result of:
            0.24304783 = score(doc=1372,freq=2.0), product of:
              0.37809607 = queryWeight, product of:
                3.6624587 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.017743729 = queryNorm
              0.6428203 = fieldWeight in 1372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=1372)
        0.28 = coord(7/25)
    
  5. Chen, T.T.: ¬The congruity between linkage-based factors and content-based clusters : an experimental study using multiple document corpora (2016) 0.18
    0.17808986 = sum of:
      0.17808986 = product of:
        0.55653083 = sum of:
          0.048365172 = weight(abstract_txt:vector in 3775) [ClassicSimilarity], result of:
            0.048365172 = score(doc=3775,freq=1.0), product of:
              0.11869329 = queryWeight, product of:
                1.0260174 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017743729 = queryNorm
              0.40748024 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.012891516 = weight(abstract_txt:data in 3775) [ClassicSimilarity], result of:
            0.012891516 = score(doc=3775,freq=1.0), product of:
              0.061937023 = queryWeight, product of:
                1.0481702 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.017743729 = queryNorm
              0.20813909 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.081857055 = weight(abstract_txt:cosine in 3775) [ClassicSimilarity], result of:
            0.081857055 = score(doc=3775,freq=1.0), product of:
              0.16856799 = queryWeight, product of:
                1.2227261 = boost
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.017743729 = queryNorm
              0.48560262 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.769642 = idf(docFreq=50, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.024369739 = weight(abstract_txt:each in 3775) [ClassicSimilarity], result of:
            0.024369739 = score(doc=3775,freq=1.0), product of:
              0.09469236 = queryWeight, product of:
                1.2960272 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.017743729 = queryNorm
              0.25735697 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.05993776 = weight(abstract_txt:documents in 3775) [ClassicSimilarity], result of:
            0.05993776 = score(doc=3775,freq=6.0), product of:
              0.09495057 = queryWeight, product of:
                1.297793 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.017743729 = queryNorm
              0.6312522 = fieldWeight in 3775, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.06180159 = weight(abstract_txt:document in 3775) [ClassicSimilarity], result of:
            0.06180159 = score(doc=3775,freq=5.0), product of:
              0.10298108 = queryWeight, product of:
                1.3515601 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.017743729 = queryNorm
              0.6001257 = fieldWeight in 3775, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.07286971 = weight(abstract_txt:sets in 3775) [ClassicSimilarity], result of:
            0.07286971 = score(doc=3775,freq=1.0), product of:
              0.22497943 = queryWeight, product of:
                2.4466603 = boost
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.017743729 = queryNorm
              0.323895 = fieldWeight in 3775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.18232 = idf(docFreq=677, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
          0.19443826 = weight(abstract_txt:similarity in 3775) [ClassicSimilarity], result of:
            0.19443826 = score(doc=3775,freq=2.0), product of:
              0.37809607 = queryWeight, product of:
                3.6624587 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.017743729 = queryNorm
              0.51425624 = fieldWeight in 3775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0625 = fieldNorm(doc=3775)
        0.32 = coord(8/25)
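
The content scores combine a per-term product with a coordination factor. As a worked check, the sketch below recomputes the score for entry 3 (doc=980) from the boost, idf, and freq values in its trace, assuming queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and a final multiplication by coord (matched terms over query terms). The variable and function names are illustrative.

```python
import math

QUERY_NORM = 0.017743729   # queryNorm from the trace
FIELD_NORM = 0.078125      # fieldNorm(doc=980)
COORD = 5 / 25             # 5 of the 25 query terms matched this document

# (term, boost, idf, freq) copied from the doc=980 trace.
terms = [
    ("vector",     1.0260174, 6.519684,  2.0),
    ("overlap",    1.0934031, 6.9478774, 3.0),
    ("cosine",     1.2227261, 7.769642,  3.0),
    ("measures",   1.7074337, 5.424824,  5.0),
    ("similarity", 3.6624587, 5.8181453, 6.0),
]

def term_score(boost: float, idf: float, freq: float) -> float:
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

total = sum(term_score(boost, idf, freq) for _, boost, idf, freq in terms)
score = COORD * total
print(round(total, 4), round(score, 4))
```

The sum should come out close to the 0.96617806 in the trace and the final score close to 0.19323562, with small differences due to single-precision arithmetic in the original. The coord factor explains why entry 5 ranks below entry 3 despite matching more terms: its individual term weights are lower, and its fieldNorm of 0.0625 is smaller.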