Document (#21971)

Author
Brin, S.
Title
Extracting patterns and relations from the World Wide Web
Source
The World Wide Web and Databases: International Workshop WebDB'98, Valencia, Spain, March 27-28, 1998, Selected papers. Eds.: P. Atzeni et al.
Imprint
Berlin : Springer
Year
1999
Pages
pp. 172-183
Series
Lecture notes in computer science; vol.1590
Abstract
The WWW is a vast resource for information. At the same time, it is extremely distributed. A particular type of data, such as restaurant lists, may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique, we use it to extract a relation of (author, title) pairs from the WWW.
Theme
Internet
Object
WWW
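
Illustration (not part of the catalog record)
The abstract describes growing a relation from a small sample by alternating between patterns and tuples. The record does not say how the paper defines a pattern, so the following is only a toy sketch under stated assumptions: a "pattern" is assumed to be the short string found between a known title and its author on one line, and the names find_patterns, apply_patterns, grow_relation, PAGES, and SEED are hypothetical.

```python
import re

def find_patterns(pages, pairs):
    """Assumed pattern form: the short text between a known title and its author."""
    patterns = set()
    for page in pages:
        for author, title in pairs:
            # '.' does not match newlines, so a match stays within one line.
            m = re.search(re.escape(title) + r"(.{1,10}?)" + re.escape(author), page)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(pages, patterns):
    """Read every line that contains a pattern as 'title <pattern> author'."""
    pairs = set()
    for page in pages:
        for line in page.splitlines():
            for middle in patterns:
                if middle in line:
                    title, _, author = line.partition(middle)
                    if title.strip() and author.strip():
                        pairs.add((author.strip(), title.strip()))
    return pairs

def grow_relation(pages, seed, rounds=2):
    """Alternate between inducing patterns and extracting new (author, title) pairs."""
    relation = set(seed)
    for _ in range(rounds):
        patterns = find_patterns(pages, relation)
        relation |= apply_patterns(pages, patterns)
    return relation

# Made-up example data: two "pages" that format the same kind of list differently.
PAGES = [
    "Foundation by Isaac Asimov\nDune by Frank Herbert",
    "Emma :: Jane Austen\nDune :: Frank Herbert",
]
SEED = {("Isaac Asimov", "Foundation")}

if __name__ == "__main__":
    for author, title in sorted(grow_relation(PAGES, SEED)):
        print(author, "|", title)
```

In this toy run, the seed pair yields the pattern " by " on the first page, which adds the Herbert pair; that pair then yields " :: " on the second page, which adds the Austen pair. A real system would also need to filter out overly general patterns, which this sketch does not attempt.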

Similar documents (content)

  1. Tsuji, K.; Kageura, K.: Automatic generation of Japanese-English bilingual thesauri based on bilingual corpora (2006) 0.17
    0.1683001 = sum of:
      0.1683001 = product of:
        0.60107183 = sum of:
          0.04386141 = weight(abstract_txt:independent in 61) [ClassicSimilarity], result of:
            0.04386141 = score(doc=61,freq=1.0), product of:
              0.12079241 = queryWeight, product of:
                1.0152452 = boost
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.02047886 = queryNorm
              0.36311397 = fieldWeight in 61, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.09075201 = weight(abstract_txt:extract in 61) [ClassicSimilarity], result of:
            0.09075201 = score(doc=61,freq=2.0), product of:
              0.15567257 = queryWeight, product of:
                1.1525431 = boost
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.02047886 = queryNorm
              0.5829673 = fieldWeight in 61, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.13996042 = weight(abstract_txt:pairs in 61) [ClassicSimilarity], result of:
            0.13996042 = score(doc=61,freq=4.0), product of:
              0.16493066 = queryWeight, product of:
                1.1863198 = boost
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.02047886 = queryNorm
              0.8486016 = fieldWeight in 61, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.017909637 = weight(abstract_txt:such in 61) [ClassicSimilarity], result of:
            0.017909637 = score(doc=61,freq=1.0), product of:
              0.08376304 = queryWeight, product of:
                1.1956177 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.02047886 = queryNorm
              0.21381313 = fieldWeight in 61, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.06541713 = weight(abstract_txt:patterns in 61) [ClassicSimilarity], result of:
            0.06541713 = score(doc=61,freq=1.0), product of:
              0.1986654 = queryWeight, product of:
                1.8413113 = boost
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.02047886 = queryNorm
              0.32928297 = fieldWeight in 61, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2685275 = idf(docFreq=621, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.03255801 = weight(abstract_txt:from in 61) [ClassicSimilarity], result of:
            0.03255801 = score(doc=61,freq=3.0), product of:
              0.10899404 = queryWeight, product of:
                1.928779 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.02047886 = queryNorm
              0.29871368 = fieldWeight in 61, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
          0.21061322 = weight(abstract_txt:extracting in 61) [ClassicSimilarity], result of:
            0.21061322 = score(doc=61,freq=2.0), product of:
              0.34380195 = queryWeight, product of:
                2.4222572 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.02047886 = queryNorm
              0.61260045 = fieldWeight in 61, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=61)
        0.28 = coord(7/25)
    
  2. Collovini de Abreu, S.; Vieira, R.: RelP: Portuguese open relation extraction (2017) 0.16
    0.16179585 = sum of:
      0.16179585 = product of:
        0.57784235 = sum of:
          0.016521318 = weight(abstract_txt:data in 4621) [ClassicSimilarity], result of:
            0.016521318 = score(doc=4621,freq=1.0), product of:
              0.07937633 = queryWeight, product of:
                1.1638892 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.02047886 = queryNorm
              0.20813909 = fieldWeight in 4621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=4621)
          0.017909637 = weight(abstract_txt:such in 4621) [ClassicSimilarity], result of:
            0.017909637 = score(doc=4621,freq=1.0), product of:
              0.08376304 = queryWeight, product of:
                1.1956177 = boost
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.02047886 = queryNorm
              0.21381313 = fieldWeight in 4621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.42101 = idf(docFreq=3945, maxDocs=44421)
                0.0625 = fieldNorm(doc=4621)
          0.047729664 = weight(abstract_txt:sources in 4621) [ClassicSimilarity], result of:
            0.047729664 = score(doc=4621,freq=1.0), product of:
              0.16101024 = queryWeight, product of:
                1.6576501 = boost
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.02047886 = queryNorm
              0.2964387 = fieldWeight in 4621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.0625 = fieldNorm(doc=4621)
          0.026583504 = weight(abstract_txt:from in 4621) [ClassicSimilarity], result of:
            0.026583504 = score(doc=4621,freq=2.0), product of:
              0.10899404 = queryWeight, product of:
                1.928779 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.02047886 = queryNorm
              0.2438987 = fieldWeight in 4621, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=4621)
          0.07624444 = weight(abstract_txt:relations in 4621) [ClassicSimilarity], result of:
            0.07624444 = score(doc=4621,freq=1.0), product of:
              0.22002229 = queryWeight, product of:
                1.9377576 = boost
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.02047886 = queryNorm
              0.34653053 = fieldWeight in 4621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.0625 = fieldNorm(doc=4621)
          0.078323975 = weight(abstract_txt:technique in 4621) [ClassicSimilarity], result of:
            0.078323975 = score(doc=4621,freq=1.0), product of:
              0.224005 = queryWeight, product of:
                1.955217 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.02047886 = queryNorm
              0.3496528 = fieldWeight in 4621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.0625 = fieldNorm(doc=4621)
          0.31452984 = weight(abstract_txt:relation in 4621) [ClassicSimilarity], result of:
            0.31452984 = score(doc=4621,freq=9.0), product of:
              0.3114479 = queryWeight, product of:
                2.8236082 = boost
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.02047886 = queryNorm
              1.0098956 = fieldWeight in 4621, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.0625 = fieldNorm(doc=4621)
        0.28 = coord(7/25)
    
  3. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.15
    0.15107985 = sum of:
      0.15107985 = product of:
        0.7553992 = sum of:
          0.0802142 = weight(abstract_txt:extract in 2611) [ClassicSimilarity], result of:
            0.0802142 = score(doc=2611,freq=1.0), product of:
              0.15567257 = queryWeight, product of:
                1.1525431 = boost
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.02047886 = queryNorm
              0.5152751 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.03322938 = weight(abstract_txt:from in 2611) [ClassicSimilarity], result of:
            0.03322938 = score(doc=2611,freq=2.0), product of:
              0.10899404 = queryWeight, product of:
                1.928779 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.02047886 = queryNorm
              0.30487338 = fieldWeight in 2611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.13478239 = weight(abstract_txt:relations in 2611) [ClassicSimilarity], result of:
            0.13478239 = score(doc=2611,freq=2.0), product of:
              0.22002229 = queryWeight, product of:
                1.9377576 = boost
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.02047886 = queryNorm
              0.6125852 = fieldWeight in 2611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.18615755 = weight(abstract_txt:extracting in 2611) [ClassicSimilarity], result of:
            0.18615755 = score(doc=2611,freq=1.0), product of:
              0.34380195 = queryWeight, product of:
                2.4222572 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.02047886 = queryNorm
              0.5414674 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.3210157 = weight(abstract_txt:relation in 2611) [ClassicSimilarity], result of:
            0.3210157 = score(doc=2611,freq=6.0), product of:
              0.3114479 = queryWeight, product of:
                2.8236082 = boost
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.02047886 = queryNorm
              1.0307204 = fieldWeight in 2611, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
        0.2 = coord(5/25)
    
  4. Blanco, E.; Moldovan, D.: A model for composing semantic relations (2011) 0.15
    0.14872564 = sum of:
      0.14872564 = product of:
        0.7436282 = sum of:
          0.054826766 = weight(abstract_txt:independent in 762) [ClassicSimilarity], result of:
            0.054826766 = score(doc=762,freq=1.0), product of:
              0.12079241 = queryWeight, product of:
                1.0152452 = boost
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.02047886 = queryNorm
              0.45389247 = fieldWeight in 762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.078125 = fieldNorm(doc=762)
          0.023496723 = weight(abstract_txt:from in 762) [ClassicSimilarity], result of:
            0.023496723 = score(doc=762,freq=1.0), product of:
              0.10899404 = queryWeight, product of:
                1.928779 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.02047886 = queryNorm
              0.21557805 = fieldWeight in 762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=762)
          0.2521548 = weight(abstract_txt:relations in 762) [ClassicSimilarity], result of:
            0.2521548 = score(doc=762,freq=7.0), product of:
              0.22002229 = queryWeight, product of:
                1.9377576 = boost
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.02047886 = queryNorm
              1.146042 = fieldWeight in 762, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.5444884 = idf(docFreq=471, maxDocs=44421)
                0.078125 = fieldNorm(doc=762)
          0.18615755 = weight(abstract_txt:extracting in 762) [ClassicSimilarity], result of:
            0.18615755 = score(doc=762,freq=1.0), product of:
              0.34380195 = queryWeight, product of:
                2.4222572 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.02047886 = queryNorm
              0.5414674 = fieldWeight in 762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=762)
          0.22699237 = weight(abstract_txt:relation in 762) [ClassicSimilarity], result of:
            0.22699237 = score(doc=762,freq=3.0), product of:
              0.3114479 = queryWeight, product of:
                2.8236082 = boost
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.02047886 = queryNorm
              0.7288294 = fieldWeight in 762, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.078125 = fieldNorm(doc=762)
        0.2 = coord(5/25)
    
  5. Wang, P.; Hao, T.; Yan, J.; Jin, L.: Large-scale extraction of drug-disease pairs from the medical literature (2017) 0.15
    0.14771898 = sum of:
      0.14771898 = product of:
        0.7385949 = sum of:
          0.11114806 = weight(abstract_txt:extract in 4927) [ClassicSimilarity], result of:
            0.11114806 = score(doc=4927,freq=3.0), product of:
              0.15567257 = queryWeight, product of:
                1.1525431 = boost
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.02047886 = queryNorm
              0.71398616 = fieldWeight in 4927, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.595522 = idf(docFreq=164, maxDocs=44421)
                0.0625 = fieldNorm(doc=4927)
          0.2320981 = weight(abstract_txt:pairs in 4927) [ClassicSimilarity], result of:
            0.2320981 = score(doc=4927,freq=11.0), product of:
              0.16493066 = queryWeight, product of:
                1.1863198 = boost
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.02047886 = queryNorm
              1.4072466 = fieldWeight in 4927, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                6.7888126 = idf(docFreq=135, maxDocs=44421)
                0.0625 = fieldNorm(doc=4927)
          0.03255801 = weight(abstract_txt:from in 4927) [ClassicSimilarity], result of:
            0.03255801 = score(doc=4927,freq=3.0), product of:
              0.10899404 = queryWeight, product of:
                1.928779 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.02047886 = queryNorm
              0.29871368 = fieldWeight in 4927, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=4927)
          0.25794747 = weight(abstract_txt:extracting in 4927) [ClassicSimilarity], result of:
            0.25794747 = score(doc=4927,freq=3.0), product of:
              0.34380195 = queryWeight, product of:
                2.4222572 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.02047886 = queryNorm
              0.75027925 = fieldWeight in 4927, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=4927)
          0.10484328 = weight(abstract_txt:relation in 4927) [ClassicSimilarity], result of:
            0.10484328 = score(doc=4927,freq=1.0), product of:
              0.3114479 = queryWeight, product of:
                2.8236082 = boost
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.02047886 = queryNorm
              0.33663186 = fieldWeight in 4927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.0625 = fieldNorm(doc=4927)
        0.2 = coord(5/25)
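
Illustration (not part of the catalog record)
The score trees above can be recomputed from their leaf values. The sketch below follows the arithmetic visible in the listing: tf is the square root of termFreq, queryWeight is boost × idf × queryNorm, fieldWeight is tf × idf × fieldNorm, and the document score is the sum of the term scores times coord. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is an inference that reproduces the listed idf values; it is not stated in the listing itself. The function and variable names are hypothetical.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.02047886

def idf(doc_freq, max_docs=MAX_DOCS):
    # Inferred from the listing: 1 + ln(44421 / 362) is about 5.8098
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm,
               max_docs=MAX_DOCS, query_norm=QUERY_NORM):
    """score = queryWeight * fieldWeight for one matching term."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, matched, total, field_norm):
    """Sum of term scores scaled by coord(matched/total)."""
    total_weight = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return total_weight * matched / total

# (termFreq, docFreq, boost) for the five matching terms of doc 2611 (entry 3).
DOC_2611_TERMS = [
    (1.0, 164, 1.1525431),   # extract
    (2.0, 7646, 1.928779),   # from
    (2.0, 471, 1.9377576),   # relations
    (1.0, 117, 2.4222572),   # extracting
    (6.0, 552, 2.8236082),   # relation
]

if __name__ == "__main__":
    # Term "independent" in doc 61; the listing reports 0.04386141.
    print(term_score(1.0, 361, 1.0152452, 0.0625))
    # Entry 3 overall; the listing reports 0.15107985.
    print(document_score(DOC_2611_TERMS, 5, 25, 0.078125))
```

The results should agree with the listed values only up to rounding, since the listing shows single-precision floats while this sketch uses Python's double-precision arithmetic.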