Document (#43065)

Author
Yang, T.-H.
Hsieh, Y.-L.
Liu, S.-H.
Chang, Y.-C.
Hsu, W.-L.
Title
¬A flexible template generation and matching method with applications for publication reference metadata extraction
Source
Journal of the Association for Information Science and Technology. 72(2021) no.1, pp.32-45
Year
2021
Abstract
Conventional rule-based approaches use exact template matching to capture linguistic information and must therefore enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only delivers consistent performance on various unseen domains, but also surpasses hand-crafted knowledge (templates). We use four independent journal style test sets and one conference style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance on all datasets.
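The abstract does not give the details of the alignment step, so the following is only a generic illustration of what alignment-based (rather than exact) template matching can look like. It is a plain Needleman-Wunsch global alignment over field labels and is not the PBA implementation. The label names (AUTHOR, YEAR, ...), the function name align_score and the scoring constants are all assumptions made for this sketch.

```python
def align_score(template, labels, match=2, mismatch=-1, gap=-1):
    """Global alignment score between a template and a label sequence.

    Both arguments are lists of field labels. The constants are
    illustrative assumptions, not values taken from the paper.
    """
    rows, cols = len(template) + 1, len(labels) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per element.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = template[i - 1] == labels[j - 1]
            diag = dp[i - 1][j - 1] + (match if same else mismatch)
            # Best of: align the two elements, or skip one of them.
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


# Hypothetical template for a journal reference.
TEMPLATE = ["AUTHOR", "YEAR", "TITLE", "JOURNAL", "VOLUME", "PAGES"]

exact = align_score(TEMPLATE, TEMPLATE)
# Same reference style, but the volume is missing.
partial = align_score(TEMPLATE, ["AUTHOR", "YEAR", "TITLE", "JOURNAL", "PAGES"])
print(exact, partial)
```

An exact matcher would reject the second sequence outright, whereas the alignment score only drops by the cost of one gap. A score of this kind could then serve as an input feature to a classifier such as the logistic regression model the abstract mentions.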
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24391.
Theme
Automatic indexing
Metadata

Similar documents (author)

  1. Hsu, C.-N.; Chang, C.-H.; Hsieh, C.-H.; Lu, J.-J.; Chang, C.-C.: Reconfigurable Web wrapper agents for biological information integration (2005) 1.96
    1.96169 = sum of:
      1.96169 = product of:
        2.942535 = sum of:
          1.4900533 = weight(author_txt:chang in 263) [ClassicSimilarity], result of:
            1.4900533 = score(doc=263,freq=2.0), product of:
              0.55031055 = queryWeight, product of:
                1.0682881 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.06726366 = queryNorm
              2.707659 = fieldWeight in 263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.25 = fieldNorm(doc=263)
          1.4524815 = weight(author_txt:hsieh in 263) [ClassicSimilarity], result of:
            1.4524815 = score(doc=263,freq=1.0), product of:
              0.68164307 = queryWeight, product of:
                1.1889483 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.06726366 = queryNorm
              2.1308534 = fieldWeight in 263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.25 = fieldNorm(doc=263)
        0.6666667 = coord(2/3)
    
  2. Hsieh-Yee, I.: ¬The cataloging practices of special libraries and their relationship with OCLC (1996) 0.97
    0.968321 = sum of:
      0.968321 = product of:
        2.904963 = sum of:
          2.904963 = weight(author_txt:hsieh in 4988) [ClassicSimilarity], result of:
            2.904963 = score(doc=4988,freq=1.0), product of:
              0.68164307 = queryWeight, product of:
                1.1889483 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.06726366 = queryNorm
              4.261707 = fieldWeight in 4988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.5 = fieldNorm(doc=4988)
        0.33333334 = coord(1/3)
    
  3. Hsieh-Yee, I.: Student use of online catalogs and other information channels (1996) 0.97
    0.968321 = sum of:
      0.968321 = product of:
        2.904963 = sum of:
          2.904963 = weight(author_txt:hsieh in 5611) [ClassicSimilarity], result of:
            2.904963 = score(doc=5611,freq=1.0), product of:
              0.68164307 = queryWeight, product of:
                1.1889483 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.06726366 = queryNorm
              4.261707 = fieldWeight in 5611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.5 = fieldNorm(doc=5611)
        0.33333334 = coord(1/3)
    
  4. Hsieh-Yee, I.: ¬The retrieval power of selected search engines : how well do they address general reference questions and subject questions? (1998) 0.97
    0.968321 = sum of:
      0.968321 = product of:
        2.904963 = sum of:
          2.904963 = weight(author_txt:hsieh in 3186) [ClassicSimilarity], result of:
            2.904963 = score(doc=3186,freq=1.0), product of:
              0.68164307 = queryWeight, product of:
                1.1889483 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.06726366 = queryNorm
              4.261707 = fieldWeight in 3186, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.5 = fieldNorm(doc=3186)
        0.33333334 = coord(1/3)
    
  5. Hsieh-Yee, I.: Search tactics of Web users in searching for texts, graphics, known items and subjects : a search simulation study (1998) 0.97
    0.968321 = sum of:
      0.968321 = product of:
        2.904963 = sum of:
          2.904963 = weight(author_txt:hsieh in 3404) [ClassicSimilarity], result of:
            2.904963 = score(doc=3404,freq=1.0), product of:
              0.68164307 = queryWeight, product of:
                1.1889483 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.06726366 = queryNorm
              4.261707 = fieldWeight in 3404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.5 = fieldNorm(doc=3404)
        0.33333334 = coord(1/3)
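
The score trees above follow the classic Lucene TF-IDF pattern: each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor. As a rough check, the sketch below recomputes the first hit (doc 263) from the values printed in its tree. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption that happens to reproduce the listed idf values; the function names are made up for this example.

```python
import math

MAX_DOCS = 44421          # maxDocs as printed in the listing
QUERY_NORM = 0.06726366   # queryNorm of the author query


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed classic form; it reproduces 7.6584 (docFreq=56) and 8.5234 (docFreq=23).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    # tf is the square root of the raw term frequency.
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the tree for doc 263.
chang = term_score(freq=2.0, doc_freq=56, boost=1.0682881, field_norm=0.25)
hsieh = term_score(freq=1.0, doc_freq=23, boost=1.1889483, field_norm=0.25)

# Two of the three author query terms matched, hence coord(2/3).
total = (chang + hsieh) * (2 / 3)
print(round(chang, 4), round(hsieh, 4), round(total, 4))
```

The results should agree with the listed 1.4900533, 1.4524815 and 1.96169 up to small rounding differences, since the listing uses single-precision floats.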
    

Similar documents (content)

  1. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.15
    0.14772242 = sum of:
      0.14772242 = product of:
        0.5275801 = sum of:
          0.043964796 = weight(abstract_txt:learning in 2611) [ClassicSimilarity], result of:
            0.043964796 = score(doc=2611,freq=2.0), product of:
              0.083913565 = queryWeight, product of:
                1.0380473 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.017046968 = queryNorm
              0.52392954 = fieldWeight in 2611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.039631933 = weight(abstract_txt:propose in 2611) [ClassicSimilarity], result of:
            0.039631933 = score(doc=2611,freq=1.0), product of:
              0.09865875 = queryWeight, product of:
                1.1255605 = boost
                5.1418524 = idf(docFreq=705, maxDocs=44421)
                0.017046968 = queryNorm
              0.40170723 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1418524 = idf(docFreq=705, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.04401309 = weight(abstract_txt:experiments in 2611) [ClassicSimilarity], result of:
            0.04401309 = score(doc=2611,freq=1.0), product of:
              0.10580187 = queryWeight, product of:
                1.1655952 = boost
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.017046968 = queryNorm
              0.4159954 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.324741 = idf(docFreq=587, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.1198827 = weight(abstract_txt:extraction in 2611) [ClassicSimilarity], result of:
            0.1198827 = score(doc=2611,freq=3.0), product of:
              0.14307685 = queryWeight, product of:
                1.3554571 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.017046968 = queryNorm
              0.83789027 = fieldWeight in 2611, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.074632406 = weight(abstract_txt:capture in 2611) [ClassicSimilarity], result of:
            0.074632406 = score(doc=2611,freq=1.0), product of:
              0.15044938 = queryWeight, product of:
                1.3899407 = boost
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.017046968 = queryNorm
              0.49606323 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3496094 = idf(docFreq=210, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.040712498 = weight(abstract_txt:based in 2611) [ClassicSimilarity], result of:
            0.040712498 = score(doc=2611,freq=3.0), product of:
              0.09452141 = queryWeight, product of:
                1.741952 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.017046968 = queryNorm
              0.4307225 = fieldWeight in 2611, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
          0.16474263 = weight(abstract_txt:templates in 2611) [ClassicSimilarity], result of:
            0.16474263 = score(doc=2611,freq=1.0), product of:
              0.25506026 = queryWeight, product of:
                1.8097662 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.017046968 = queryNorm
              0.6458969 = fieldWeight in 2611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.078125 = fieldNorm(doc=2611)
        0.28 = coord(7/25)
    
  2. Yim, W.-w.; Kwan, S.W.; Yetisgen, M.: Classifying tumor event attributes in radiology reports (2017) 0.14
    0.14287816 = sum of:
      0.14287816 = product of:
        0.71439075 = sum of:
          0.024870247 = weight(abstract_txt:learning in 4929) [ClassicSimilarity], result of:
            0.024870247 = score(doc=4929,freq=1.0), product of:
              0.083913565 = queryWeight, product of:
                1.0380473 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.017046968 = queryNorm
              0.29637933 = fieldWeight in 4929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=4929)
          0.07830705 = weight(abstract_txt:extraction in 4929) [ClassicSimilarity], result of:
            0.07830705 = score(doc=4929,freq=2.0), product of:
              0.14307685 = queryWeight, product of:
                1.3554571 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.017046968 = queryNorm
              0.5473076 = fieldWeight in 4929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=4929)
          0.018804297 = weight(abstract_txt:based in 4929) [ClassicSimilarity], result of:
            0.018804297 = score(doc=4929,freq=1.0), product of:
              0.09452141 = queryWeight, product of:
                1.741952 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.017046968 = queryNorm
              0.1989422 = fieldWeight in 4929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=4929)
          0.13179411 = weight(abstract_txt:templates in 4929) [ClassicSimilarity], result of:
            0.13179411 = score(doc=4929,freq=1.0), product of:
              0.25506026 = queryWeight, product of:
                1.8097662 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.017046968 = queryNorm
              0.51671755 = fieldWeight in 4929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0625 = fieldNorm(doc=4929)
          0.460615 = weight(abstract_txt:template in 4929) [ClassicSimilarity], result of:
            0.460615 = score(doc=4929,freq=2.0), product of:
              0.6327627 = queryWeight, product of:
                4.507041 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.017046968 = queryNorm
              0.72794276 = fieldWeight in 4929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=4929)
        0.2 = coord(5/25)
    
  3. Taniguchi, S.: ¬A system for analyzing cataloguing rules : a feasibility study (1996) 0.13
    0.13151948 = sum of:
      0.13151948 = product of:
        0.8219968 = sum of:
          0.20299442 = weight(abstract_txt:rule in 4266) [ClassicSimilarity], result of:
            0.20299442 = score(doc=4266,freq=9.0), product of:
              0.16353875 = queryWeight, product of:
                1.4491435 = boost
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.017046968 = queryNorm
              1.2412618 = fieldWeight in 4266, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.6200633 = idf(docFreq=160, maxDocs=44421)
                0.0625 = fieldNorm(doc=4266)
          0.02659329 = weight(abstract_txt:based in 4266) [ClassicSimilarity], result of:
            0.02659329 = score(doc=4266,freq=2.0), product of:
              0.09452141 = queryWeight, product of:
                1.741952 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.017046968 = queryNorm
              0.28134674 = fieldWeight in 4266, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=4266)
          0.13179411 = weight(abstract_txt:templates in 4266) [ClassicSimilarity], result of:
            0.13179411 = score(doc=4266,freq=1.0), product of:
              0.25506026 = queryWeight, product of:
                1.8097662 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.017046968 = queryNorm
              0.51671755 = fieldWeight in 4266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0625 = fieldNorm(doc=4266)
          0.460615 = weight(abstract_txt:template in 4266) [ClassicSimilarity], result of:
            0.460615 = score(doc=4266,freq=2.0), product of:
              0.6327627 = queryWeight, product of:
                4.507041 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.017046968 = queryNorm
              0.72794276 = fieldWeight in 4266, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0625 = fieldNorm(doc=4266)
        0.16 = coord(4/25)
    
  4. Kuikka, E.; Salminen, A.: Two-dimensional filters for structured text (1997) 0.12
    0.11755995 = sum of:
      0.11755995 = product of:
        0.97966623 = sum of:
          0.16474263 = weight(abstract_txt:templates in 320) [ClassicSimilarity], result of:
            0.16474263 = score(doc=320,freq=1.0), product of:
              0.25506026 = queryWeight, product of:
                1.8097662 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.017046968 = queryNorm
              0.6458969 = fieldWeight in 320, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.078125 = fieldNorm(doc=320)
          0.109753825 = weight(abstract_txt:flexible in 320) [ClassicSimilarity], result of:
            0.109753825 = score(doc=320,freq=1.0), product of:
              0.22271474 = queryWeight, product of:
                2.0711958 = boost
                6.30784 = idf(docFreq=219, maxDocs=44421)
                0.017046968 = queryNorm
              0.4928 = fieldWeight in 320, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.30784 = idf(docFreq=219, maxDocs=44421)
                0.078125 = fieldNorm(doc=320)
          0.7051698 = weight(abstract_txt:template in 320) [ClassicSimilarity], result of:
            0.7051698 = score(doc=320,freq=3.0), product of:
              0.6327627 = queryWeight, product of:
                4.507041 = boost
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.017046968 = queryNorm
              1.1144302 = fieldWeight in 320, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.078125 = fieldNorm(doc=320)
        0.12 = coord(3/25)
    
  5. Cheng, Y.-Y.; Xia, Y.: ¬A systematic review of methods for aligning, mapping, merging taxonomies in information sciences (2023) 0.11
    0.10538458 = sum of:
      0.10538458 = product of:
        0.5269229 = sum of:
          0.08893826 = weight(abstract_txt:threefold in 2031) [ClassicSimilarity], result of:
            0.08893826 = score(doc=2031,freq=1.0), product of:
              0.15574993 = queryWeight, product of:
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.017046968 = queryNorm
              0.5710324 = fieldWeight in 2031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.022673814 = weight(abstract_txt:approaches in 2031) [ClassicSimilarity], result of:
            0.022673814 = score(doc=2031,freq=1.0), product of:
              0.078897245 = queryWeight, product of:
                1.0065422 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.017046968 = queryNorm
              0.2873841 = fieldWeight in 2031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.24321835 = weight(abstract_txt:alignment in 2031) [ClassicSimilarity], result of:
            0.24321835 = score(doc=2031,freq=7.0), product of:
              0.20060588 = queryWeight, product of:
                1.604992 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.017046968 = queryNorm
              1.2124188 = fieldWeight in 2031, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.02659329 = weight(abstract_txt:based in 2031) [ClassicSimilarity], result of:
            0.02659329 = score(doc=2031,freq=2.0), product of:
              0.09452141 = queryWeight, product of:
                1.741952 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.017046968 = queryNorm
              0.28134674 = fieldWeight in 2031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.14549915 = weight(abstract_txt:matching in 2031) [ClassicSimilarity], result of:
            0.14549915 = score(doc=2031,freq=2.0), product of:
              0.27244934 = queryWeight, product of:
                2.6452026 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.017046968 = queryNorm
              0.5340411 = fieldWeight in 2031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
        0.2 = coord(5/25)
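
In the content-based list, the final score is the sum of the term scores multiplied by coord(matched/25), so the number of matching query terms affects the ranking as much as the raw sum does. The sketch below recomputes the five final scores from the sums and coord values printed above and compares the listed order with the order the raw sums alone would give. The short labels and variable names are chosen for this example only.

```python
QUERY_TERMS = 25  # denominator of every coord(n/25) in the listing

# (label, raw sum of term scores, number of matched query terms)
hits = [
    ("Li et al. 2008", 0.5275801, 7),
    ("Yim et al. 2017", 0.71439075, 5),
    ("Taniguchi 1996", 0.8219968, 4),
    ("Kuikka & Salminen 1997", 0.97966623, 3),
    ("Cheng & Xia 2023", 0.5269229, 5),
]


def final_score(raw_sum, matched, total=QUERY_TERMS):
    # coord rewards documents that match more of the query terms.
    return raw_sum * matched / total


by_final = sorted(hits, key=lambda h: final_score(h[1], h[2]), reverse=True)
by_raw = sorted(hits, key=lambda h: h[1], reverse=True)

for label, raw_sum, matched in by_final:
    print(f"{label}: {final_score(raw_sum, matched):.4f} (raw {raw_sum:.4f})")
print("highest raw sum:", by_raw[0][0])
```

With these numbers, the hit with the largest raw sum (Kuikka and Salminen, 3 of 25 terms matched) ends up fourth, while the top hit wins mainly because it matches 7 of the 25 terms.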