Document (#20672)

Author
Broder, A.Z.
Title
Syntactic clustering of the Web
Source
Computer networks and ISDN systems. 29(1997) no.8, pp.1157-1166
Year
1997
Abstract
Develops an efficient way to determine the syntactic similarity of files and applies it to every document on the WWW. Using this mechanism, builds a clustering of all the documents that are syntactically similar. Possible applications include a lost-and-found service, filtering the results of web searches, updating widely distributed web pages, and identifying violations of intellectual property rights.
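The paper's notion of syntactic similarity is usually described as the resemblance of two documents' sets of contiguous word sequences ("shingles"), estimated cheaply from small min-hash sketches. The Python sketch below illustrates that idea and a union-find clustering on top of it. It is not the paper's implementation: the tokenizer, the shingle width `W`, the sketch size `K`, the 0.5 threshold, the seeded BLAKE2b hashes (standing in for random permutations), and all function names are hypothetical choices made for this example.

```python
import hashlib
import re
from itertools import combinations

# Hypothetical parameters for illustration; not taken from the paper.
W = 4          # words per shingle
K = 64         # number of seeded hashes in a sketch
EMPTY = 2**64  # sentinel minimum for documents with no shingles


def shingles(text: str, w: int = W) -> set[tuple[str, ...]]:
    """Set of contiguous w-word shingles (word n-grams) of a document."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: set, b: set) -> float:
    """Exact resemblance |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed: int, shingle: tuple[str, ...]) -> int:
    # Seeded 64-bit hash standing in for one random permutation of shingles.
    data = f"{seed}:{' '.join(shingle)}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(s: set, k: int = K) -> list[int]:
    """Min-hash sketch: the smallest hash of the set under each of k seeds."""
    return [min((_hash(seed, x) for x in s), default=EMPTY) for seed in range(k)]


def estimated_resemblance(sa: list[int], sb: list[int]) -> float:
    """Fraction of sketch positions whose minima agree; estimates resemblance."""
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)


def cluster(docs: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Union-find over all pairs whose estimated resemblance >= threshold."""
    parent = {d: d for d in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sketches = {d: sketch(shingles(t)) for d, t in docs.items()}
    # All-pairs comparison: fine for a toy example, infeasible at Web scale.
    for a, b in combinations(docs, 2):
        if estimated_resemblance(sketches[a], sketches[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for d in docs:
        groups.setdefault(find(d), set()).add(d)
    return list(groups.values())


if __name__ == "__main__":
    pages = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the lazy dog today",
        "c": "completely unrelated text about library catalog records",
    }
    print(resemblance(shingles(pages["a"]), shingles(pages["b"])))  # 6/7 ≈ 0.857
    print(cluster(pages))
```

Pages "a" and "b" share six of their seven distinct 4-word shingles, so they fall into one cluster while "c" stays alone. Because an all-pairs loop does not scale to millions of pages, a Web-scale system has to find candidate pairs some other way, for example by grouping documents that share sketch values.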
Footnote
Contribution to a special issue of papers from the 6th International World Wide Web Conference, held 7-11 Apr 1997, Santa Clara, California.
Theme
Internet

Similar documents (content)

  1. Salton, G.: Fast document classification in automatic information retrieval (1978) 0.09
    0.089647956 = sum of:
      0.089647956 = product of:
        0.56029975 = sum of:
          0.06341222 = weight(abstract_txt:similar in 2330) [ClassicSimilarity], result of:
            0.06341222 = score(doc=2330,freq=1.0), product of:
              0.1113612 = queryWeight, product of:
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.021390103 = queryNorm
              0.5694283 = fieldWeight in 2330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.206202 = idf(docFreq=661, maxDocs=44421)
                0.109375 = fieldNorm(doc=2330)
          0.11925537 = weight(abstract_txt:files in 2330) [ClassicSimilarity], result of:
            0.11925537 = score(doc=2330,freq=2.0), product of:
              0.1346668 = queryWeight, product of:
                1.0996724 = boost
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.021390103 = queryNorm
              0.8855588 = fieldWeight in 2330, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7251167 = idf(docFreq=393, maxDocs=44421)
                0.109375 = fieldNorm(doc=2330)
          0.16126838 = weight(abstract_txt:updating in 2330) [ClassicSimilarity], result of:
            0.16126838 = score(doc=2330,freq=1.0), product of:
              0.20748405 = queryWeight, product of:
                1.3649772 = boost
                7.1063476 = idf(docFreq=98, maxDocs=44421)
                0.021390103 = queryNorm
              0.7772568 = fieldWeight in 2330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1063476 = idf(docFreq=98, maxDocs=44421)
                0.109375 = fieldNorm(doc=2330)
          0.21636379 = weight(abstract_txt:clustering in 2330) [ClassicSimilarity], result of:
            0.21636379 = score(doc=2330,freq=1.0), product of:
              0.31799352 = queryWeight, product of:
                2.389776 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.021390103 = queryNorm
              0.6804031 = fieldWeight in 2330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.109375 = fieldNorm(doc=2330)
        0.16 = coord(4/25)
    
  2. Corridoni, J.M.; Bimbo, A. del; Vicario, E.: Image retrieval by color semantics with incomplete knowledge (1998) 0.08
    0.083344035 = sum of:
      0.083344035 = product of:
        0.41672018 = sum of:
          0.04523226 = weight(abstract_txt:widely in 1594) [ClassicSimilarity], result of:
            0.04523226 = score(doc=1594,freq=1.0), product of:
              0.12910493 = queryWeight, product of:
                1.0767242 = boost
                5.6056433 = idf(docFreq=443, maxDocs=44421)
                0.021390103 = queryNorm
              0.3503527 = fieldWeight in 1594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6056433 = idf(docFreq=443, maxDocs=44421)
                0.0625 = fieldNorm(doc=1594)
          0.07152215 = weight(abstract_txt:similarity in 1594) [ClassicSimilarity], result of:
            0.07152215 = score(doc=1594,freq=2.0), product of:
              0.13907881 = queryWeight, product of:
                1.1175412 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.021390103 = queryNorm
              0.51425624 = fieldWeight in 1594, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0625 = fieldNorm(doc=1594)
          0.0717206 = weight(abstract_txt:filtering in 1594) [ClassicSimilarity], result of:
            0.0717206 = score(doc=1594,freq=1.0), product of:
              0.17555231 = queryWeight, product of:
                1.2555567 = boost
                6.5366817 = idf(docFreq=174, maxDocs=44421)
                0.021390103 = queryNorm
              0.4085426 = fieldWeight in 1594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5366817 = idf(docFreq=174, maxDocs=44421)
                0.0625 = fieldNorm(doc=1594)
          0.085178785 = weight(abstract_txt:applies in 1594) [ClassicSimilarity], result of:
            0.085178785 = score(doc=1594,freq=1.0), product of:
              0.19687848 = queryWeight, product of:
                1.3296342 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.021390103 = queryNorm
              0.4326465 = fieldWeight in 1594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.0625 = fieldNorm(doc=1594)
          0.14306638 = weight(abstract_txt:syntactic in 1594) [ClassicSimilarity], result of:
            0.14306638 = score(doc=1594,freq=1.0), product of:
              0.35049272 = queryWeight, product of:
                2.5089242 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.021390103 = queryNorm
              0.40818647 = fieldWeight in 1594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.0625 = fieldNorm(doc=1594)
        0.2 = coord(5/25)
    
  3. Thompson, N.J.: Intellectual property materials online/CD-ROM : what and where (1992) 0.07
    0.07091497 = sum of:
      0.07091497 = product of:
        0.59095806 = sum of:
          0.15588039 = weight(abstract_txt:intellectual in 578) [ClassicSimilarity], result of:
            0.15588039 = score(doc=578,freq=4.0), product of:
              0.12777902 = queryWeight, product of:
                1.0711809 = boost
                5.576784 = idf(docFreq=456, maxDocs=44421)
                0.021390103 = queryNorm
              1.2199216 = fieldWeight in 578, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.576784 = idf(docFreq=456, maxDocs=44421)
                0.109375 = fieldNorm(doc=578)
          0.24528873 = weight(abstract_txt:property in 578) [ClassicSimilarity], result of:
            0.24528873 = score(doc=578,freq=4.0), product of:
              0.17286894 = queryWeight, product of:
                1.245924 = boost
                6.4865317 = idf(docFreq=183, maxDocs=44421)
                0.021390103 = queryNorm
              1.4189289 = fieldWeight in 578, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4865317 = idf(docFreq=183, maxDocs=44421)
                0.109375 = fieldNorm(doc=578)
          0.18978894 = weight(abstract_txt:rights in 578) [ClassicSimilarity], result of:
            0.18978894 = score(doc=578,freq=2.0), product of:
              0.1835647 = queryWeight, product of:
                1.2838894 = boost
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.021390103 = queryNorm
              1.0339077 = fieldWeight in 578, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.109375 = fieldNorm(doc=578)
        0.12 = coord(3/25)
    
  4. Seadle, M.: Copyright in a networked world : ethics and infringement (2004) 0.07
    0.067160755 = sum of:
      0.067160755 = product of:
        0.55967295 = sum of:
          0.12597036 = weight(abstract_txt:intellectual in 3833) [ClassicSimilarity], result of:
            0.12597036 = score(doc=3833,freq=2.0), product of:
              0.12777902 = queryWeight, product of:
                1.0711809 = boost
                5.576784 = idf(docFreq=456, maxDocs=44421)
                0.021390103 = queryNorm
              0.98584545 = fieldWeight in 3833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.576784 = idf(docFreq=456, maxDocs=44421)
                0.125 = fieldNorm(doc=3833)
          0.28032997 = weight(abstract_txt:property in 3833) [ClassicSimilarity], result of:
            0.28032997 = score(doc=3833,freq=4.0), product of:
              0.17286894 = queryWeight, product of:
                1.245924 = boost
                6.4865317 = idf(docFreq=183, maxDocs=44421)
                0.021390103 = queryNorm
              1.6216329 = fieldWeight in 3833, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4865317 = idf(docFreq=183, maxDocs=44421)
                0.125 = fieldNorm(doc=3833)
          0.15337262 = weight(abstract_txt:rights in 3833) [ClassicSimilarity], result of:
            0.15337262 = score(doc=3833,freq=1.0), product of:
              0.1835647 = queryWeight, product of:
                1.2838894 = boost
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.021390103 = queryNorm
              0.8355235 = fieldWeight in 3833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.125 = fieldNorm(doc=3833)
        0.12 = coord(3/25)
    
  5. Kaptelinin, V.: Distribution of cognition between minds and artifacts : augmentation or mediation? (1996) 0.06
    0.064395264 = sum of:
      0.064395264 = product of:
        0.4024704 = sum of:
          0.08022394 = weight(abstract_txt:identifying in 2261) [ClassicSimilarity], result of:
            0.08022394 = score(doc=2261,freq=1.0), product of:
              0.13026305 = queryWeight, product of:
                1.0815427 = boost
                5.6307297 = idf(docFreq=432, maxDocs=44421)
                0.021390103 = queryNorm
              0.61586106 = fieldWeight in 2261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6307297 = idf(docFreq=432, maxDocs=44421)
                0.109375 = fieldNorm(doc=2261)
          0.086171456 = weight(abstract_txt:distributed in 2261) [ClassicSimilarity], result of:
            0.086171456 = score(doc=2261,freq=1.0), product of:
              0.13662417 = queryWeight, product of:
                1.1076354 = boost
                5.7665734 = idf(docFreq=377, maxDocs=44421)
                0.021390103 = queryNorm
              0.63071895 = fieldWeight in 2261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7665734 = idf(docFreq=377, maxDocs=44421)
                0.109375 = fieldNorm(doc=2261)
          0.08701213 = weight(abstract_txt:efficient in 2261) [ClassicSimilarity], result of:
            0.08701213 = score(doc=2261,freq=1.0), product of:
              0.13751131 = queryWeight, product of:
                1.1112257 = boost
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.021390103 = queryNorm
              0.6327634 = fieldWeight in 2261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7852654 = idf(docFreq=370, maxDocs=44421)
                0.109375 = fieldNorm(doc=2261)
          0.14906287 = weight(abstract_txt:applies in 2261) [ClassicSimilarity], result of:
            0.14906287 = score(doc=2261,freq=1.0), product of:
              0.19687848 = queryWeight, product of:
                1.3296342 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.021390103 = queryNorm
              0.7571314 = fieldWeight in 2261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.109375 = fieldNorm(doc=2261)
        0.16 = coord(4/25)
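The scores in the list above are Lucene ClassicSimilarity explanations. Each matching term contributes queryWeight × fieldWeight, where queryWeight = boost · idf · queryNorm and fieldWeight = tf · idf · fieldNorm, with tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)); the sum over matching terms is then multiplied by coord, the fraction of the 25 query terms that matched. The Python sketch below recomputes entry 1 from the raw values printed in its explanation. The function and constant names are hypothetical, and because the printed inputs are rounded, the result agrees with the listed score only to about six significant digits.

```python
import math

QUERY_NORM = 0.021390103  # queryNorm printed in every explanation above
MAX_DOCS = 44421          # maxDocs printed in every explanation above
MAX_OVERLAP = 25          # denominator of coord(n/25)


def tf(freq: float) -> float:
    """ClassicSimilarity term frequency: square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float, boost: float = 1.0) -> float:
    """One weight(...) node of the explanation: queryWeight * fieldWeight."""
    w = idf(doc_freq)
    query_weight = boost * w * QUERY_NORM
    field_weight = tf(freq) * w * field_norm
    return query_weight * field_weight


def doc_score(terms: list[tuple[float, int, float]], field_norm: float) -> float:
    """Sum of matching term scores times coord(matched / MAX_OVERLAP).

    Each term is a (freq, docFreq, boost) triple as printed in the explanation.
    """
    total = sum(term_score(f, df, field_norm, b) for f, df, b in terms)
    return total * len(terms) / MAX_OVERLAP


# Entry 1 (Salton, doc 2330): similar, files, updating, clustering.
entry1 = [(1.0, 661, 1.0), (2.0, 393, 1.0996724), (1.0, 98, 1.3649772), (1.0, 239, 2.389776)]
print(round(doc_score(entry1, field_norm=0.109375), 4))  # 0.0896
```

The same function reproduces the other entries when given their triples and fieldNorm; entry 3, for instance, matches only three terms, so its term sum is scaled by coord(3/25) = 0.12.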