Document (#20655)

Author
Lawson, M.
Title
Automatic extraction of citations from the text of English-language patents : an example of template mining
Source
Journal of information science. 22(1996) no.6, pp.423-436
Year
1996
Abstract
Describes and evaluates methods for automatically isolating and extracting bibliographic references from the full texts of patents, designed to facilitate the work of patent examiners who currently perform this task manually. These references include citations both to patents and to other bibliographic sources. Notes that patents are unusual as citing documents in that the citations occur mainly in the body of the text, rather than as footnotes or in separate sections. Describes the natural language processing technique of template mining, which extracts data directly from the text where either the data or the text surrounding the data form recognizable patterns. When text matches a template, the system extracts data according to instructions associated with that template. Examines the sublanguages of citations and the development of templates for the extraction of citations to patents. Reports results of running two reference extraction systems against a sample of 100 European Patent Office patent documents, with recall and precision data for patent and non-patent citations, and concludes with suggestions for future improvements.
Field
Patent information
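
The abstract describes template mining only in outline: text is matched against templates, and each template carries instructions for what to extract. The following is a minimal sketch of that idea in Python, not the system evaluated in the paper. The two regular expressions, the sample sentence, and all names (Template, build_patent, build_article, extract_citations) are assumptions made for illustration.

```python
import re
from typing import Callable, NamedTuple


class Template(NamedTuple):
    """A pattern plus the instructions for turning a match into a record."""
    name: str
    pattern: re.Pattern
    build: Callable[[re.Match], dict]


def build_patent(m: re.Match) -> dict:
    # Instruction for patent citations: keep the office code, strip
    # spaces and commas from the number.
    return {"type": "patent", "office": m["office"],
            "number": re.sub(r"\D", "", m["number"])}


def build_article(m: re.Match) -> dict:
    # Instruction for journal citations: copy the named groups.
    return {"type": "article", "author": m["author"],
            "journal": m["journal"].strip(), "volume": m["volume"],
            "year": m["year"], "page": m["page"]}


# Hypothetical templates; real patent text needs many more variants.
TEMPLATES = [
    Template(
        "patent",
        re.compile(r"\b(?P<office>US|EP|GB|DE|WO)(?:-[AB]\d?-|[ -])?"
                   r"(?P<number>\d{1,3}(?:[ ,]\d{3})+|\d{4,})"),
        build_patent,
    ),
    Template(
        "article",
        re.compile(r"(?P<author>[A-Z][a-z]+ et al\.), "
                   r"(?P<journal>(?:[A-Z][a-z]*\. ?)+)"
                   r"(?P<volume>\d+) \((?P<year>\d{4})\) (?P<page>\d+)"),
        build_article,
    ),
]


def extract_citations(text: str) -> list:
    """Run every template over the text; return records in text order."""
    found = []
    for template in TEMPLATES:
        for match in template.pattern.finditer(text):
            found.append((match.start(), template.build(match)))
    found.sort(key=lambda pair: pair[0])
    return [record for _, record in found]


# Invented sample sentence with citations embedded in running text.
SAMPLE = ("A similar device is described in US 4,567,890 and in "
          "EP-A-0 123 456. See also Smith et al., J. Org. Chem. 45 (1980) 1234.")

if __name__ == "__main__":
    for record in extract_citations(SAMPLE):
        print(record)
```

Keeping the pattern and the extraction instruction together in one Template object mirrors the description in the abstract: adding coverage for a new citation style means adding a template, not changing the matching loop.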

Similar documents (author)

  1. Lawson, V.L.: Using a computer-assisted-instruction program to replace the traditional library tour : an experimental study (1989) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:lawson in 6667) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 6667, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=6667)
    
  2. Lawson, G.T.: Software reviews : Microsoft Cinemania (1994) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:lawson in 1023) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 1023, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=1023)
    
  3. Lawson, D.: You've come a long way, Dewey! (2001) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:lawson in 6913) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 6913, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=6913)
    
  4. Lawson, A.E.: How do people learn? : and what does that imply about the nature of knowledge (2000) 5.94
    5.9401517 = sum of:
      5.9401517 = weight(author_txt:lawson in 139) [ClassicSimilarity], result of:
        5.9401517 = fieldWeight in 139, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.625 = fieldNorm(doc=139)
    
  5. Lawson, V.; Vasconcellos, M.: Forty ways to skin a cat : users report on machine translation (1994) 4.75
    4.7521214 = sum of:
      4.7521214 = weight(author_txt:lawson in 6955) [ClassicSimilarity], result of:
        4.7521214 = fieldWeight in 6955, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.5 = fieldNorm(doc=6955)
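
The author scores above can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes the fieldNorm values as given from the listing. The function names are illustrative, not part of any Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    # Entries 1 to 4 use fieldNorm 0.625; entry 5 uses 0.5.
    print(classic_idf(8, 44421))
    print(field_weight(1.0, 8, 44421, 0.625))
    print(field_weight(1.0, 8, 44421, 0.5))
```

The only difference between the first four entries and the fifth is the fieldNorm: the fifth record lists two authors, so its author field is longer and its norm is lower (0.5 instead of 0.625).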
    

Similar documents (content)

  1. Tseng, Y.-H.; Lin, C.-J.; Lin, Y.-I.: Text mining techniques for patent analysis (2007) 0.34
    0.34164876 = sum of:
      0.34164876 = product of:
        1.0676523 = sum of:
          0.04640639 = weight(abstract_txt:extracts in 1935) [ClassicSimilarity], result of:
            0.04640639 = score(doc=1935,freq=1.0), product of:
              0.09823624 = queryWeight, product of:
                1.0559026 = boost
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.012308974 = queryNorm
              0.4723958 = fieldWeight in 1935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.558333 = idf(docFreq=62, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
          0.012055944 = weight(abstract_txt:describes in 1935) [ClassicSimilarity], result of:
            0.012055944 = score(doc=1935,freq=1.0), product of:
              0.05039228 = queryWeight, product of:
                1.0695103 = boost
                3.82787 = idf(docFreq=2626, maxDocs=44421)
                0.012308974 = queryNorm
              0.23924187 = fieldWeight in 1935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.82787 = idf(docFreq=2626, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
          0.015068548 = weight(abstract_txt:documents in 1935) [ClassicSimilarity], result of:
            0.015068548 = score(doc=1935,freq=1.0), product of:
              0.058471486 = queryWeight, product of:
                1.1520599 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.012308974 = queryNorm
              0.25770763 = fieldWeight in 1935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
          0.050537717 = weight(abstract_txt:mining in 1935) [ClassicSimilarity], result of:
            0.050537717 = score(doc=1935,freq=1.0), product of:
              0.13101076 = queryWeight, product of:
                1.7244732 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.012308974 = queryNorm
              0.3857524 = fieldWeight in 1935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
          0.10825436 = weight(abstract_txt:extraction in 1935) [ClassicSimilarity], result of:
            0.10825436 = score(doc=1935,freq=2.0), product of:
              0.19779436 = queryWeight, product of:
                2.5951087 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.012308974 = queryNorm
              0.5473076 = fieldWeight in 1935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
          0.05014334 = weight(abstract_txt:text in 1935) [ClassicSimilarity], result of:
            0.05014334 = score(doc=1935,freq=2.0), product of:
              0.1403919 = queryWeight, product of:
                2.8225653 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012308974 = queryNorm
              0.3571669 = fieldWeight in 1935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
          0.17797059 = weight(abstract_txt:patents in 1935) [ClassicSimilarity], result of:
            0.17797059 = score(doc=1935,freq=1.0), product of:
              0.38206628 = queryWeight, product of:
                4.164735 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.012308974 = queryNorm
              0.46581078 = fieldWeight in 1935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
          0.60721546 = weight(abstract_txt:patent in 1935) [ClassicSimilarity], result of:
            0.60721546 = score(doc=1935,freq=8.0), product of:
              0.4956048 = queryWeight, product of:
                5.8094015 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.012308974 = queryNorm
              1.2252009 = fieldWeight in 1935, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.0625 = fieldNorm(doc=1935)
        0.32 = coord(8/25)
    
  2. Perez-Molina, E.: The role of patent citations as a footprint of technology (2018) 0.34
    0.3410932 = sum of:
      0.3410932 = product of:
        1.0659163 = sum of:
          0.0062681255 = weight(abstract_txt:with in 187) [ClassicSimilarity], result of:
            0.0062681255 = score(doc=187,freq=1.0), product of:
              0.032142434 = queryWeight, product of:
                1.0461357 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012308974 = queryNorm
              0.19501092 = fieldWeight in 187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
          0.018835684 = weight(abstract_txt:documents in 187) [ClassicSimilarity], result of:
            0.018835684 = score(doc=187,freq=1.0), product of:
              0.058471486 = queryWeight, product of:
                1.1520599 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.012308974 = queryNorm
              0.32213452 = fieldWeight in 187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
          0.011975382 = weight(abstract_txt:from in 187) [ClassicSimilarity], result of:
            0.011975382 = score(doc=187,freq=2.0), product of:
              0.039279856 = queryWeight, product of:
                1.1564678 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.012308974 = queryNorm
              0.30487338 = fieldWeight in 187, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
          0.047157403 = weight(abstract_txt:references in 187) [ClassicSimilarity], result of:
            0.047157403 = score(doc=187,freq=1.0), product of:
              0.10780936 = queryWeight, product of:
                1.5643402 = boost
                5.598909 = idf(docFreq=446, maxDocs=44421)
                0.012308974 = queryNorm
              0.43741477 = fieldWeight in 187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.598909 = idf(docFreq=446, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
          0.035084523 = weight(abstract_txt:data in 187) [ClassicSimilarity], result of:
            0.035084523 = score(doc=187,freq=2.0), product of:
              0.09535356 = queryWeight, product of:
                2.32617 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.012308974 = queryNorm
              0.3679414 = fieldWeight in 187, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
          0.44492647 = weight(abstract_txt:patents in 187) [ClassicSimilarity], result of:
            0.44492647 = score(doc=187,freq=4.0), product of:
              0.38206628 = queryWeight, product of:
                4.164735 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.012308974 = queryNorm
              1.1645269 = fieldWeight in 187, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
          0.12215901 = weight(abstract_txt:citations in 187) [ClassicSimilarity], result of:
            0.12215901 = score(doc=187,freq=1.0), product of:
              0.29327878 = queryWeight, product of:
                4.468934 = boost
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.012308974 = queryNorm
              0.41652864 = fieldWeight in 187, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
          0.37950966 = weight(abstract_txt:patent in 187) [ClassicSimilarity], result of:
            0.37950966 = score(doc=187,freq=2.0), product of:
              0.4956048 = queryWeight, product of:
                5.8094015 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.012308974 = queryNorm
              0.7657505 = fieldWeight in 187, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=187)
        0.32 = coord(8/25)
    
  3. Azagra-Caro, J.M.; Mattsson, P.; Perruchas, F.: Smoothing the lies : the distinctive effects of patent characteristics on examiner and applicant citations (2011) 0.28
    0.28001666 = sum of:
      0.28001666 = product of:
        1.4000833 = sum of:
          0.18488738 = weight(abstract_txt:examiners in 747) [ClassicSimilarity], result of:
            0.18488738 = score(doc=747,freq=2.0), product of:
              0.16886567 = queryWeight, product of:
                1.3843907 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.012308974 = queryNorm
              1.0948784 = fieldWeight in 747, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=747)
          0.047157403 = weight(abstract_txt:references in 747) [ClassicSimilarity], result of:
            0.047157403 = score(doc=747,freq=1.0), product of:
              0.10780936 = queryWeight, product of:
                1.5643402 = boost
                5.598909 = idf(docFreq=446, maxDocs=44421)
                0.012308974 = queryNorm
              0.43741477 = fieldWeight in 747, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.598909 = idf(docFreq=446, maxDocs=44421)
                0.078125 = fieldNorm(doc=747)
          0.22246324 = weight(abstract_txt:patents in 747) [ClassicSimilarity], result of:
            0.22246324 = score(doc=747,freq=1.0), product of:
              0.38206628 = queryWeight, product of:
                4.164735 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.012308974 = queryNorm
              0.58226347 = fieldWeight in 747, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=747)
          0.34551784 = weight(abstract_txt:citations in 747) [ClassicSimilarity], result of:
            0.34551784 = score(doc=747,freq=8.0), product of:
              0.29327878 = queryWeight, product of:
                4.468934 = boost
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.012308974 = queryNorm
              1.1781209 = fieldWeight in 747, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.078125 = fieldNorm(doc=747)
          0.6000575 = weight(abstract_txt:patent in 747) [ClassicSimilarity], result of:
            0.6000575 = score(doc=747,freq=5.0), product of:
              0.4956048 = queryWeight, product of:
                5.8094015 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.012308974 = queryNorm
              1.210758 = fieldWeight in 747, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=747)
        0.2 = coord(5/25)
    
  4. Karki, M.M.S.: Patent citation analysis : a policy analysis tool (1997) 0.25
    0.25286156 = sum of:
      0.25286156 = product of:
        1.5803847 = sum of:
          0.029836938 = weight(abstract_txt:describes in 3076) [ClassicSimilarity], result of:
            0.029836938 = score(doc=3076,freq=2.0), product of:
              0.05039228 = queryWeight, product of:
                1.0695103 = boost
                3.82787 = idf(docFreq=2626, maxDocs=44421)
                0.012308974 = queryNorm
              0.5920934 = fieldWeight in 3076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.82787 = idf(docFreq=2626, maxDocs=44421)
                0.109375 = fieldNorm(doc=3076)
          0.5394447 = weight(abstract_txt:patents in 3076) [ClassicSimilarity], result of:
            0.5394447 = score(doc=3076,freq=3.0), product of:
              0.38206628 = queryWeight, product of:
                4.164735 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.012308974 = queryNorm
              1.4119139 = fieldWeight in 3076, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.109375 = fieldNorm(doc=3076)
          0.17102262 = weight(abstract_txt:citations in 3076) [ClassicSimilarity], result of:
            0.17102262 = score(doc=3076,freq=1.0), product of:
              0.29327878 = queryWeight, product of:
                4.468934 = boost
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.012308974 = queryNorm
              0.58314013 = fieldWeight in 3076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.109375 = fieldNorm(doc=3076)
          0.84008044 = weight(abstract_txt:patent in 3076) [ClassicSimilarity], result of:
            0.84008044 = score(doc=3076,freq=5.0), product of:
              0.4956048 = queryWeight, product of:
                5.8094015 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.012308974 = queryNorm
              1.6950611 = fieldWeight in 3076, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.109375 = fieldNorm(doc=3076)
        0.16 = coord(4/25)
    
  5. Huang, M.-H.; Huang, W.-T.; Chang, C.-C.; Chen, D. Z.; Lin, C.-P.: The greater scattering phenomenon beyond Bradford's law in patent citation (2014) 0.24
    0.24198344 = sum of:
      0.24198344 = product of:
        1.2099172 = sum of:
          0.0062681255 = weight(abstract_txt:with in 2352) [ClassicSimilarity], result of:
            0.0062681255 = score(doc=2352,freq=1.0), product of:
              0.032142434 = queryWeight, product of:
                1.0461357 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012308974 = queryNorm
              0.19501092 = fieldWeight in 2352, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=2352)
          0.0084678745 = weight(abstract_txt:from in 2352) [ClassicSimilarity], result of:
            0.0084678745 = score(doc=2352,freq=1.0), product of:
              0.039279856 = queryWeight, product of:
                1.1564678 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.012308974 = queryNorm
              0.21557805 = fieldWeight in 2352, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=2352)
          0.38531762 = weight(abstract_txt:patents in 2352) [ClassicSimilarity], result of:
            0.38531762 = score(doc=2352,freq=3.0), product of:
              0.38206628 = queryWeight, product of:
                4.164735 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.012308974 = queryNorm
              1.0085099 = fieldWeight in 2352, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=2352)
          0.27315587 = weight(abstract_txt:citations in 2352) [ClassicSimilarity], result of:
            0.27315587 = score(doc=2352,freq=5.0), product of:
              0.29327878 = queryWeight, product of:
                4.468934 = boost
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.012308974 = queryNorm
              0.9313864 = fieldWeight in 2352, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.331567 = idf(docFreq=583, maxDocs=44421)
                0.078125 = fieldNorm(doc=2352)
          0.5367077 = weight(abstract_txt:patent in 2352) [ClassicSimilarity], result of:
            0.5367077 = score(doc=2352,freq=4.0), product of:
              0.4956048 = queryWeight, product of:
                5.8094015 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.012308974 = queryNorm
              1.0829349 = fieldWeight in 2352, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=2352)
        0.2 = coord(5/25)
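
The content scores combine per-term scores with a coordination factor. As a worked check, the sketch below recomputes the "extracts" term and the final score of the first content hit (doc 1935), assuming the ClassicSimilarity structure shown in the listing: queryWeight = boost * idf * queryNorm, term score = queryWeight * fieldWeight, and final score = coord * sum of term scores, where coord is the fraction of the 25 query terms that matched. Function names are illustrative; the boost, queryNorm, fieldNorm and term-score values are copied from the listing.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, query_norm: float, freq: float,
               doc_freq: int, max_docs: int, field_norm: float) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    fld_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * fld_weight


def final_score(term_scores: list, matched: int, query_terms: int) -> float:
    """Sum of term scores scaled by coord = matched / query_terms."""
    return sum(term_scores) * (matched / query_terms)


# The eight term scores listed for doc 1935, copied from the explain tree.
DOC_1935_TERM_SCORES = [
    0.04640639, 0.012055944, 0.015068548, 0.050537717,
    0.10825436, 0.05014334, 0.17797059, 0.60721546,
]

if __name__ == "__main__":
    # "extracts": boost 1.0559026, docFreq 62, freq 1, fieldNorm 0.0625.
    print(term_score(1.0559026, 0.012308974, 1.0, 62, 44421, 0.0625))
    print(final_score(DOC_1935_TERM_SCORES, 8, 25))
```

The coord factor explains why a document with fewer but stronger term matches can rank below one with more matches: hit 4 has the largest raw sum (1.5803847) but matches only 4 of 25 terms, so its final score is 0.25.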