Document (#20672)

Broder, A.Z.
Syntactic clustering of the Web
Computer networks and ISDN systems. 29(1997) no.8, pp.1157-1166
Develops an efficient way to determine the syntactic similarity of files and applies it to every document on the WWW. Using this mechanism, builds a clustering of all the documents that are syntactically similar. Possible applications include a lost-and-found service, filtering the results of web searches, updating widely distributed web pages, and identifying violations of intellectual property rights.
Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
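
The abstract does not spell out how syntactic similarity is measured. As a rough illustration only, the sketch below uses one common formulation associated with this line of work: each document is reduced to its set of contiguous word sequences ("shingles"), and the resemblance of two documents is the size of the intersection of their shingle sets divided by the size of the union. The function names, the whitespace tokenization, and the default shingle width of 4 are assumptions made for this example and are not taken from the record.

```python
def shingles(text, w=4):
    """Return the set of contiguous w-word sequences in text.

    Tokenization (lowercase, split on whitespace) is an assumption of this sketch.
    """
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < w:
        # Documents shorter than w words contribute a single short shingle.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard coefficient of the two documents' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps"
    doc_b = "the quick brown fox sleeps"
    # The documents share three of five distinct 2-word shingles.
    print(resemblance(doc_a, doc_b, w=2))
```

Comparing full shingle sets for every pair of documents would not scale to a web-sized collection; this sketch shows only the similarity measure itself, not the efficiency techniques the abstract alludes to.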

Similar documents (content)

  1. Salton, G.: Fast document classification in automatic information retrieval (1978) 0.09
    0.08945779 = sum of:
      0.08945779 = product of:
        0.5591112 = sum of:
          0.063467555 = weight(abstract_txt:similar in 2331) [ClassicSimilarity], result of:
            0.063467555 = score(doc=2331,freq=1.0), product of:
              0.11136159 = queryWeight, product of:
                5.2107263 = idf(docFreq=655, maxDocs=44218)
                0.021371607 = queryNorm
              0.56992316 = fieldWeight in 2331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2107263 = idf(docFreq=655, maxDocs=44218)
                0.109375 = fieldNorm(doc=2331)
          0.11876328 = weight(abstract_txt:files in 2331) [ClassicSimilarity], result of:
            0.11876328 = score(doc=2331,freq=2.0), product of:
              0.1342185 = queryWeight, product of:
                1.0978385 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.021371607 = queryNorm
              0.8848503 = fieldWeight in 2331, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.109375 = fieldNorm(doc=2331)
          0.16136804 = weight(abstract_txt:updating in 2331) [ClassicSimilarity], result of:
            0.16136804 = score(doc=2331,freq=1.0), product of:
              0.2074496 = queryWeight, product of:
                1.3648615 = boost
                7.11192 = idf(docFreq=97, maxDocs=44218)
                0.021371607 = queryNorm
              0.77786624 = fieldWeight in 2331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.11192 = idf(docFreq=97, maxDocs=44218)
                0.109375 = fieldNorm(doc=2331)
          0.2155123 = weight(abstract_txt:clustering in 2331) [ClassicSimilarity], result of:
            0.2155123 = score(doc=2331,freq=1.0), product of:
              0.31697544 = queryWeight, product of:
                2.3859432 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.021371607 = queryNorm
              0.6799022 = fieldWeight in 2331, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.109375 = fieldNorm(doc=2331)
        0.16 = coord(4/25)
  2. Corridoni, J.M.; Bimbo, A. del; Vicario, E.: Image retrieval by color semantics with incomplete knowledge (1998) 0.08
    0.083131485 = sum of:
      0.083131485 = product of:
        0.4156574 = sum of:
          0.04520707 = weight(abstract_txt:widely in 594) [ClassicSimilarity], result of:
            0.04520707 = score(doc=594,freq=1.0), product of:
              0.12898242 = queryWeight, product of:
                1.0762113 = boost
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.021371607 = queryNorm
              0.35049015 = fieldWeight in 594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.0714353 = weight(abstract_txt:similarity in 594) [ClassicSimilarity], result of:
            0.0714353 = score(doc=594,freq=2.0), product of:
              0.13888592 = queryWeight, product of:
                1.116764 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.021371607 = queryNorm
              0.51434517 = fieldWeight in 594, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.07163416 = weight(abstract_txt:filtering in 594) [ClassicSimilarity], result of:
            0.07163416 = score(doc=594,freq=1.0), product of:
              0.17530988 = queryWeight, product of:
                1.2546872 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.021371607 = queryNorm
              0.4086145 = fieldWeight in 594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.08486255 = weight(abstract_txt:applies in 594) [ClassicSimilarity], result of:
            0.08486255 = score(doc=594,freq=1.0), product of:
              0.19627742 = queryWeight, product of:
                1.3276007 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.021371607 = queryNorm
              0.43236023 = fieldWeight in 594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.14251831 = weight(abstract_txt:syntactic in 594) [ClassicSimilarity], result of:
            0.14251831 = score(doc=594,freq=1.0), product of:
              0.34939504 = queryWeight, product of:
                2.504988 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.021371607 = queryNorm
              0.4079002 = fieldWeight in 594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
        0.2 = coord(5/25)
  3. Thompson, N.J.: Intellectual property materials online/CD-ROM : what and where (1992) 0.07
    0.07074904 = sum of:
      0.07074904 = product of:
        0.5895753 = sum of:
          0.15559405 = weight(abstract_txt:intellectual in 578) [ClassicSimilarity], result of:
            0.15559405 = score(doc=578,freq=4.0), product of:
              0.12754877 = queryWeight, product of:
                1.0702136 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.021371607 = queryNorm
              1.2198789 = fieldWeight in 578, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.109375 = fieldNorm(doc=578)
          0.24434546 = weight(abstract_txt:property in 578) [ClassicSimilarity], result of:
            0.24434546 = score(doc=578,freq=4.0), product of:
              0.17232586 = queryWeight, product of:
                1.2439631 = boost
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.021371607 = queryNorm
              1.4179268 = fieldWeight in 578, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.109375 = fieldNorm(doc=578)
          0.18963575 = weight(abstract_txt:rights in 578) [ClassicSimilarity], result of:
            0.18963575 = score(doc=578,freq=2.0), product of:
              0.1833599 = queryWeight, product of:
                1.2831708 = boost
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.021371607 = queryNorm
              1.0342269 = fieldWeight in 578, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.109375 = fieldNorm(doc=578)
        0.12 = coord(3/25)
  4. Seadle, M.: Copyright in a networked world : ethics and infringement (2004) 0.07
    0.06698877 = sum of:
      0.06698877 = product of:
        0.5582398 = sum of:
          0.125739 = weight(abstract_txt:intellectual in 2833) [ClassicSimilarity], result of:
            0.125739 = score(doc=2833,freq=2.0), product of:
              0.12754877 = queryWeight, product of:
                1.0702136 = boost
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.021371607 = queryNorm
              0.98581105 = fieldWeight in 2833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5765896 = idf(docFreq=454, maxDocs=44218)
                0.125 = fieldNorm(doc=2833)
          0.27925196 = weight(abstract_txt:property in 2833) [ClassicSimilarity], result of:
            0.27925196 = score(doc=2833,freq=4.0), product of:
              0.17232586 = queryWeight, product of:
                1.2439631 = boost
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.021371607 = queryNorm
              1.6204878 = fieldWeight in 2833, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.125 = fieldNorm(doc=2833)
          0.15324882 = weight(abstract_txt:rights in 2833) [ClassicSimilarity], result of:
            0.15324882 = score(doc=2833,freq=1.0), product of:
              0.1833599 = queryWeight, product of:
                1.2831708 = boost
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.021371607 = queryNorm
              0.8357815 = fieldWeight in 2833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.125 = fieldNorm(doc=2833)
        0.12 = coord(3/25)
  5. Kaptelinin, V.: Distribution of cognition between minds and artifacts : augmentation or mediation? (1996) 0.06
    0.06427376 = sum of:
      0.06427376 = product of:
        0.401711 = sum of:
          0.0804856 = weight(abstract_txt:identifying in 1261) [ClassicSimilarity], result of:
            0.0804856 = score(doc=1261,freq=1.0), product of:
              0.13047071 = queryWeight, product of:
                1.0824026 = boost
                5.6401033 = idf(docFreq=426, maxDocs=44218)
                0.021371607 = queryNorm
              0.6168863 = fieldWeight in 1261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6401033 = idf(docFreq=426, maxDocs=44218)
                0.109375 = fieldNorm(doc=1261)
          0.08581735 = weight(abstract_txt:distributed in 1261) [ClassicSimilarity], result of:
            0.08581735 = score(doc=1261,freq=1.0), product of:
              0.13617091 = queryWeight, product of:
                1.1057945 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.021371607 = queryNorm
              0.63021797 = fieldWeight in 1261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.109375 = fieldNorm(doc=1261)
          0.086898565 = weight(abstract_txt:efficient in 1261) [ClassicSimilarity], result of:
            0.086898565 = score(doc=1261,freq=1.0), product of:
              0.13731226 = queryWeight, product of:
                1.1104192 = boost
                5.7860904 = idf(docFreq=368, maxDocs=44218)
                0.021371607 = queryNorm
              0.6328536 = fieldWeight in 1261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7860904 = idf(docFreq=368, maxDocs=44218)
                0.109375 = fieldNorm(doc=1261)
          0.14850947 = weight(abstract_txt:applies in 1261) [ClassicSimilarity], result of:
            0.14850947 = score(doc=1261,freq=1.0), product of:
              0.19627742 = queryWeight, product of:
                1.3276007 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.021371607 = queryNorm
              0.7566304 = fieldWeight in 1261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.109375 = fieldNorm(doc=1261)
        0.16 = coord(4/25)
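
The score explanations above follow a regular pattern, so the arithmetic can be reproduced by hand. The sketch below recomputes the score of the first result (doc=2331) from the values printed in its explanation. It assumes the usual ClassicSimilarity definitions, which match the listed numbers: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and a final multiplication by coord (matched terms over query terms). The helper names are invented for this example, and a boost of 1.0 is assumed for the term "similar" because no boost line is printed for it.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the explanations."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the explanation for result 1 (doc=2331).
MAX_DOCS = 44218
QUERY_NORM = 0.021371607
FIELD_NORM = 0.109375

# (term, freq, docFreq, boost); the boost for "similar" is assumed to be 1.0.
TERMS = [
    ("similar", 1.0, 655, 1.0),
    ("files", 2.0, 393, 1.0978385),
    ("updating", 1.0, 97, 1.3648615),
    ("clustering", 1.0, 239, 2.3859432),
]

raw_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost)
    for _, freq, doc_freq, boost in TERMS
)
coord = 4 / 25  # 4 of the 25 query terms matched
final_score = raw_sum * coord

if __name__ == "__main__":
    print(f"sum of term scores: {raw_sum:.6f}")
    print(f"final score:        {final_score:.6f}")
```

Small differences in the last digits are expected, since the explanations were produced with single-precision floats while this sketch uses Python's double precision.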