Document (#44142)

Author
Chou, C.
Chu, T.
Title
¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg
Source
Cataloging and classification quarterly. 60(2022) no.8, p.807-835
Year
2022
Abstract
In light of AI (artificial intelligence) and NLP (natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used for machine-assisted indexing in the Project Gutenberg collection by suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
Content
Cf.: https://www.tandfonline.com/doi/full/10.1080/01639374.2022.2138666.
Theme
Computational linguistics
Automatic indexing
Object
BERT
Project Gutenberg
LCSH
LCC

Similar documents (author)

  1. Chou, D.D.: Developing an Intranet : tool selection and management issues (1998) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:chou in 3425) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 3425, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=3425)
    
  2. Chou, L.: Informativ, interaktiv, kollaborativ und selbstbestimmt : Mit digitalen Lernumgebungen verändern sich die Lernprozesse (2000) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:chou in 6211) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 6211, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=6211)
    
  3. Chou, C.: Purpose-driven assessment of cataloging and metadata services : transforming broken links into linked data (2019) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:chou in 280) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 280, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=280)
    
  4. Chou, S.W.; Tsai, Y.H.: Knowledge creation : individual and organizational perspectives (2005) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:chou in 5648) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 5648, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=5648)
    
  5. Kalczynski, P.J.; Chou, A.: Temporal Document Retrieval Model for business news archives (2005) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:chou in 2030) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 2030, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=2030)
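
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the tf and idf formulas commonly documented for Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which match the values listed here; other Lucene versions may use a slightly different idf variant. The function names are hypothetical and are not part of any library. The fieldNorm value is copied from the explain output, since it is a stored, quantized length norm rather than something recomputed at query time.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the explain output (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the 'product of' lines above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author_txt:chou entries above.
weight_doc_3425 = field_weight(freq=1.0, doc_freq=9, max_docs=44421, field_norm=0.625)
weight_doc_5648 = field_weight(freq=1.0, doc_freq=9, max_docs=44421, field_norm=0.5)

print(round(weight_doc_3425, 4))
print(round(weight_doc_5648, 4))
```

The only difference between the 5.87 and 4.70 results is fieldNorm: records with a single author have a shorter author field (0.625) than records with two authors (0.5), so the same match counts for more.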
    

Similar documents (content)

  1. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.24
    0.24323498 = sum of:
      0.24323498 = product of:
        1.0134791 = sum of:
          0.034846846 = weight(abstract_txt:representations in 1721) [ClassicSimilarity], result of:
            0.034846846 = score(doc=1721,freq=1.0), product of:
              0.07436863 = queryWeight, product of:
                1.0129986 = boost
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.012240448 = queryNorm
              0.46856913 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.03837657 = weight(abstract_txt:project in 1721) [ClassicSimilarity], result of:
            0.03837657 = score(doc=1721,freq=2.0), product of:
              0.07930945 = queryWeight, product of:
                1.4794197 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.012240448 = queryNorm
              0.48388395 = fieldWeight in 1721, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.12021085 = weight(abstract_txt:bidirectional in 1721) [ClassicSimilarity], result of:
            0.12021085 = score(doc=1721,freq=1.0), product of:
              0.16978915 = queryWeight, product of:
                1.5306253 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.012240448 = queryNorm
              0.7080008 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.12645014 = weight(abstract_txt:encoder in 1721) [ClassicSimilarity], result of:
            0.12645014 = score(doc=1721,freq=1.0), product of:
              0.17561449 = queryWeight, product of:
                1.5566612 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012240448 = queryNorm
              0.72004384 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.1571792 = weight(abstract_txt:transformers in 1721) [ClassicSimilarity], result of:
            0.1571792 = score(doc=1721,freq=1.0), product of:
              0.20302252 = queryWeight, product of:
                1.6737325 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.012240448 = queryNorm
              0.7741959 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.53641546 = weight(abstract_txt:bert in 1721) [ClassicSimilarity], result of:
            0.53641546 = score(doc=1721,freq=1.0), product of:
              0.73052484 = queryWeight, product of:
                6.34982 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012240448 = queryNorm
              0.73428774 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
        0.24 = coord(6/25)
    
  2. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.21
    0.21121079 = sum of:
      0.21121079 = product of:
        0.75432426 = sum of:
          0.024392793 = weight(abstract_txt:representations in 1393) [ClassicSimilarity], result of:
            0.024392793 = score(doc=1393,freq=1.0), product of:
              0.07436863 = queryWeight, product of:
                1.0129986 = boost
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.012240448 = queryNorm
              0.3279984 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.997685 = idf(docFreq=299, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.00855603 = weight(abstract_txt:used in 1393) [ClassicSimilarity], result of:
            0.00855603 = score(doc=1393,freq=1.0), product of:
              0.04660226 = queryWeight, product of:
                1.1340506 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.012240448 = queryNorm
              0.18359688 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.0841476 = weight(abstract_txt:bidirectional in 1393) [ClassicSimilarity], result of:
            0.0841476 = score(doc=1393,freq=1.0), product of:
              0.16978915 = queryWeight, product of:
                1.5306253 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.012240448 = queryNorm
              0.49560058 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.088515095 = weight(abstract_txt:encoder in 1393) [ClassicSimilarity], result of:
            0.088515095 = score(doc=1393,freq=1.0), product of:
              0.17561449 = queryWeight, product of:
                1.5566612 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012240448 = queryNorm
              0.5040307 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.11002545 = weight(abstract_txt:transformers in 1393) [ClassicSimilarity], result of:
            0.11002545 = score(doc=1393,freq=1.0), product of:
              0.20302252 = queryWeight, product of:
                1.6737325 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.012240448 = queryNorm
              0.5419372 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.06319646 = weight(abstract_txt:models in 1393) [ClassicSimilarity], result of:
            0.06319646 = score(doc=1393,freq=2.0), product of:
              0.17674777 = queryWeight, product of:
                3.1233518 = boost
                4.623126 = idf(docFreq=1185, maxDocs=44421)
                0.012240448 = queryNorm
              0.35755166 = fieldWeight in 1393, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.623126 = idf(docFreq=1185, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.3754908 = weight(abstract_txt:bert in 1393) [ClassicSimilarity], result of:
            0.3754908 = score(doc=1393,freq=1.0), product of:
              0.73052484 = queryWeight, product of:
                6.34982 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012240448 = queryNorm
              0.5140014 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
        0.28 = coord(7/25)
    
  3. Meng, K.; Ba, Z.; Ma, Y.; Li, G.: ¬A network coupling approach to detecting hierarchical linkages between science and technology (2024) 0.12
    0.12035272 = sum of:
      0.12035272 = product of:
        0.75220454 = sum of:
          0.09616868 = weight(abstract_txt:bidirectional in 2207) [ClassicSimilarity], result of:
            0.09616868 = score(doc=2207,freq=1.0), product of:
              0.16978915 = queryWeight, product of:
                1.5306253 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.012240448 = queryNorm
              0.56640065 = fieldWeight in 2207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=2207)
          0.10116011 = weight(abstract_txt:encoder in 2207) [ClassicSimilarity], result of:
            0.10116011 = score(doc=2207,freq=1.0), product of:
              0.17561449 = queryWeight, product of:
                1.5566612 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012240448 = queryNorm
              0.5760351 = fieldWeight in 2207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=2207)
          0.12574337 = weight(abstract_txt:transformers in 2207) [ClassicSimilarity], result of:
            0.12574337 = score(doc=2207,freq=1.0), product of:
              0.20302252 = queryWeight, product of:
                1.6737325 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.012240448 = queryNorm
              0.61935675 = fieldWeight in 2207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=2207)
          0.42913234 = weight(abstract_txt:bert in 2207) [ClassicSimilarity], result of:
            0.42913234 = score(doc=2207,freq=1.0), product of:
              0.73052484 = queryWeight, product of:
                6.34982 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012240448 = queryNorm
              0.5874302 = fieldWeight in 2207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=2207)
        0.16 = coord(4/25)
    
  4. Chen, K.; Zhao, Y.; Song, N.; Han, Y.; Peng, J.; Wang, J.: You are not alone : characterizing users' relationship-layer identities in online health communities (2024) 0.12
    0.12035272 = sum of:
      0.12035272 = product of:
        0.75220454 = sum of:
          0.09616868 = weight(abstract_txt:bidirectional in 2300) [ClassicSimilarity], result of:
            0.09616868 = score(doc=2300,freq=1.0), product of:
              0.16978915 = queryWeight, product of:
                1.5306253 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.012240448 = queryNorm
              0.56640065 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=2300)
          0.10116011 = weight(abstract_txt:encoder in 2300) [ClassicSimilarity], result of:
            0.10116011 = score(doc=2300,freq=1.0), product of:
              0.17561449 = queryWeight, product of:
                1.5566612 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.012240448 = queryNorm
              0.5760351 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.0625 = fieldNorm(doc=2300)
          0.12574337 = weight(abstract_txt:transformers in 2300) [ClassicSimilarity], result of:
            0.12574337 = score(doc=2300,freq=1.0), product of:
              0.20302252 = queryWeight, product of:
                1.6737325 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.012240448 = queryNorm
              0.61935675 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=2300)
          0.42913234 = weight(abstract_txt:bert in 2300) [ClassicSimilarity], result of:
            0.42913234 = score(doc=2300,freq=1.0), product of:
              0.73052484 = queryWeight, product of:
                6.34982 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012240448 = queryNorm
              0.5874302 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.0625 = fieldNorm(doc=2300)
        0.16 = coord(4/25)
    
  5. Humphrey, S.M.: Use and management of classification systems for knowledge-based indexing (1992) 0.09
    0.08808122 = sum of:
      0.08808122 = product of:
        0.44040608 = sum of:
          0.049874622 = weight(abstract_txt:artificial in 2093) [ClassicSimilarity], result of:
            0.049874622 = score(doc=2093,freq=1.0), product of:
              0.075471304 = queryWeight, product of:
                1.0204809 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.012240448 = queryNorm
              0.6608422 = fieldWeight in 2093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.109375 = fieldNorm(doc=2093)
          0.0537272 = weight(abstract_txt:project in 2093) [ClassicSimilarity], result of:
            0.0537272 = score(doc=2093,freq=2.0), product of:
              0.07930945 = queryWeight, product of:
                1.4794197 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.012240448 = queryNorm
              0.67743754 = fieldWeight in 2093, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.109375 = fieldNorm(doc=2093)
          0.021997757 = weight(abstract_txt:library in 2093) [ClassicSimilarity], result of:
            0.021997757 = score(doc=2093,freq=1.0), product of:
              0.063069806 = queryWeight, product of:
                1.6157914 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.012240448 = queryNorm
              0.3487843 = fieldWeight in 2093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.109375 = fieldNorm(doc=2093)
          0.16587353 = weight(abstract_txt:assisted in 2093) [ClassicSimilarity], result of:
            0.16587353 = score(doc=2093,freq=1.0), product of:
              0.21186385 = queryWeight, product of:
                2.418006 = boost
                7.1581726 = idf(docFreq=93, maxDocs=44421)
                0.012240448 = queryNorm
              0.7829251 = fieldWeight in 2093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1581726 = idf(docFreq=93, maxDocs=44421)
                0.109375 = fieldNorm(doc=2093)
          0.14893301 = weight(abstract_txt:indexing in 2093) [ClassicSimilarity], result of:
            0.14893301 = score(doc=2093,freq=4.0), product of:
              0.15650304 = queryWeight, product of:
                2.9390388 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.012240448 = queryNorm
              0.9516302 = fieldWeight in 2093, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.109375 = fieldNorm(doc=2093)
        0.2 = coord(5/25)
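
The content scores above combine three factors per matched term (queryWeight, fieldWeight, and a final coord factor). The following is a minimal sketch of that arithmetic, assuming the same ClassicSimilarity formulas that reproduce the idf values listed here; the function and variable names are hypothetical. The boost, queryNorm, and fieldNorm values are copied from the explain output rather than derived.

```python
import math

MAX_DOCS = 44421          # maxDocs from the explain output
QUERY_NORM = 0.012240448  # queryNorm from the explain output


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matched term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(matched_terms, field_norm: float, total_query_terms: int) -> float:
    """Sum the per-term scores, then scale by coord = matched / total query terms."""
    raw_sum = sum(term_score(b, f, df, field_norm) for b, f, df in matched_terms)
    coord = len(matched_terms) / total_query_terms
    return raw_sum * coord


# (boost, freq, docFreq) for each term matched in doc 2093 (Humphrey 1992).
DOC_2093_TERMS = [
    (1.0204809, 1.0, 286),   # artificial
    (1.4794197, 2.0, 1512),  # project
    (1.6157914, 1.0, 4976),  # library
    (2.418006, 1.0, 93),     # assisted
    (2.9390388, 4.0, 1557),  # indexing
]

score_2093 = document_score(DOC_2093_TERMS, field_norm=0.109375, total_query_terms=25)
print(round(score_2093, 4))
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "bert" (docFreq=9) dominate the ranking, while coord penalizes documents that match only a few of the 25 query terms.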