Document (#43720)

Author
Golub, K.
Title
Automated subject indexing : an overview
Source
Cataloging and classification quarterly. 59(2021) no.8, p.702-719
Year
2021
Abstract
In the face of the ever-increasing document volume, libraries around the globe are more and more exploring (semi-) automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.
Content
Cf.: https://doi.org/10.1080/01639374.2021.2012311.
Footnote
Part of a special issue: Artificial intelligence (AI) and automated processes for subject access
Theme
Automatic indexing

Similar documents (author)

  1. Golub, K.: Automated subject classification of textual web documents (2006) 5.28
    5.277107 = sum of:
      5.277107 = weight(author_txt:golub in 600) [ClassicSimilarity], result of:
        5.277107 = fieldWeight in 600, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.443371 = idf(docFreq=25, maxDocs=44421)
          0.625 = fieldNorm(doc=600)
    
  2. Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 5.28
    5.277107 = sum of:
      5.277107 = weight(author_txt:golub in 897) [ClassicSimilarity], result of:
        5.277107 = fieldWeight in 897, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.443371 = idf(docFreq=25, maxDocs=44421)
          0.625 = fieldNorm(doc=897)
    
  3. Golub, K.: Subject access to information : an interdisciplinary approach (2015) 5.28
    5.277107 = sum of:
      5.277107 = weight(author_txt:golub in 1134) [ClassicSimilarity], result of:
        5.277107 = fieldWeight in 1134, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.443371 = idf(docFreq=25, maxDocs=44421)
          0.625 = fieldNorm(doc=1134)
    
  4. Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 5.28
    5.277107 = sum of:
      5.277107 = weight(author_txt:golub in 558) [ClassicSimilarity], result of:
        5.277107 = fieldWeight in 558, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.443371 = idf(docFreq=25, maxDocs=44421)
          0.625 = fieldNorm(doc=558)
    
  5. Golub, K.: Subject access in Swedish discovery services (2018) 5.28
    5.277107 = sum of:
      5.277107 = weight(author_txt:golub in 379) [ClassicSimilarity], result of:
        5.277107 = fieldWeight in 379, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.443371 = idf(docFreq=25, maxDocs=44421)
          0.625 = fieldNorm(doc=379)
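
The author matches above all follow the same pattern: fieldWeight = tf x idf x fieldNorm. The sketch below is an illustration only, not the catalogue's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which appear to reproduce the values in these traces; other Lucene versions may define idf slightly differently. The function names are hypothetical.

```python
import math

def classic_tf(freq):
    # Assumed term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumed inverse document frequency; matches idf(docFreq=25, maxDocs=44421).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight as listed in the trace: tf * idf * fieldNorm.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:golub trace.
author_score = field_weight(freq=1.0, doc_freq=25, max_docs=44421, field_norm=0.625)
print(round(author_score, 4))  # 5.2771, in line with the 5.277107 shown above
```

Because every author match has freq=1.0 and the same fieldNorm, all five records receive the same score, and only the internal document number differs.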
    

Similar documents (content)

  1. Golub, K.: Automatic subject indexing of text (2019) 0.25
    0.2525572 = sum of:
      0.2525572 = product of:
        0.70154774 = sum of:
          0.054967303 = weight(abstract_txt:establish in 268) [ClassicSimilarity], result of:
            0.054967303 = score(doc=268,freq=1.0), product of:
              0.14021999 = queryWeight, product of:
                1.0604527 = boost
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.021081628 = queryNorm
              0.39200762 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.058899775 = weight(abstract_txt:connections in 268) [ClassicSimilarity], result of:
            0.058899775 = score(doc=268,freq=1.0), product of:
              0.14683042 = queryWeight, product of:
                1.0851614 = boost
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.021081628 = queryNorm
              0.4011415 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.03536983 = weight(abstract_txt:systems in 268) [ClassicSimilarity], result of:
            0.03536983 = score(doc=268,freq=4.0), product of:
              0.08295049 = queryWeight, product of:
                1.153482 = boost
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.021081628 = queryNorm
              0.42639688 = fieldWeight in 268, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.08826604 = weight(abstract_txt:enrich in 268) [ClassicSimilarity], result of:
            0.08826604 = score(doc=268,freq=1.0), product of:
              0.19228086 = queryWeight, product of:
                1.2418077 = boost
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.021081628 = queryNorm
              0.45904744 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.13794364 = weight(abstract_txt:operative in 268) [ClassicSimilarity], result of:
            0.13794364 = score(doc=268,freq=1.0), product of:
              0.25894535 = queryWeight, product of:
                1.4410876 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.021081628 = queryNorm
              0.53271335 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.03702446 = weight(abstract_txt:more in 268) [ClassicSimilarity], result of:
            0.03702446 = score(doc=268,freq=2.0), product of:
              0.12333791 = queryWeight, product of:
                1.7226428 = boost
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.021081628 = queryNorm
              0.3001872 = fieldWeight in 268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.08932626 = weight(abstract_txt:subject in 268) [ClassicSimilarity], result of:
            0.08932626 = score(doc=268,freq=5.0), product of:
              0.16347204 = queryWeight, product of:
                1.98321 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021081628 = queryNorm
              0.5464314 = fieldWeight in 268, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.13477777 = weight(abstract_txt:indexing in 268) [ClassicSimilarity], result of:
            0.13477777 = score(doc=268,freq=6.0), product of:
              0.2023683 = queryWeight, product of:
                2.2065725 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021081628 = queryNorm
              0.6660024 = fieldWeight in 268, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
          0.06497262 = weight(abstract_txt:approaches in 268) [ClassicSimilarity], result of:
            0.06497262 = score(doc=268,freq=1.0), product of:
              0.22608286 = queryWeight, product of:
                2.3322804 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.021081628 = queryNorm
              0.2873841 = fieldWeight in 268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0625 = fieldNorm(doc=268)
        0.36 = coord(9/25)
    
  2. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.19
    0.19015066 = sum of:
      0.19015066 = product of:
        0.5942208 = sum of:
          0.054967303 = weight(abstract_txt:establish in 3300) [ClassicSimilarity], result of:
            0.054967303 = score(doc=3300,freq=1.0), product of:
              0.14021999 = queryWeight, product of:
                1.0604527 = boost
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.021081628 = queryNorm
              0.39200762 = fieldWeight in 3300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
          0.058899775 = weight(abstract_txt:connections in 3300) [ClassicSimilarity], result of:
            0.058899775 = score(doc=3300,freq=1.0), product of:
              0.14683042 = queryWeight, product of:
                1.0851614 = boost
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.021081628 = queryNorm
              0.4011415 = fieldWeight in 3300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.418264 = idf(docFreq=196, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
          0.08826604 = weight(abstract_txt:enrich in 3300) [ClassicSimilarity], result of:
            0.08826604 = score(doc=3300,freq=1.0), product of:
              0.19228086 = queryWeight, product of:
                1.2418077 = boost
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.021081628 = queryNorm
              0.45904744 = fieldWeight in 3300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
          0.02618025 = weight(abstract_txt:more in 3300) [ClassicSimilarity], result of:
            0.02618025 = score(doc=3300,freq=1.0), product of:
              0.12333791 = queryWeight, product of:
                1.7226428 = boost
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.021081628 = queryNorm
              0.21226442 = fieldWeight in 3300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
          0.08932626 = weight(abstract_txt:subject in 3300) [ClassicSimilarity], result of:
            0.08932626 = score(doc=3300,freq=5.0), product of:
              0.16347204 = queryWeight, product of:
                1.98321 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021081628 = queryNorm
              0.5464314 = fieldWeight in 3300, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
          0.055022795 = weight(abstract_txt:indexing in 3300) [ClassicSimilarity], result of:
            0.055022795 = score(doc=3300,freq=1.0), product of:
              0.2023683 = queryWeight, product of:
                2.2065725 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021081628 = queryNorm
              0.27189434 = fieldWeight in 3300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
          0.06497262 = weight(abstract_txt:approaches in 3300) [ClassicSimilarity], result of:
            0.06497262 = score(doc=3300,freq=1.0), product of:
              0.22608286 = queryWeight, product of:
                2.3322804 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.021081628 = queryNorm
              0.2873841 = fieldWeight in 3300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
          0.15658575 = weight(abstract_txt:automated in 3300) [ClassicSimilarity], result of:
            0.15658575 = score(doc=3300,freq=1.0), product of:
              0.4472961 = queryWeight, product of:
                3.7880342 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.021081628 = queryNorm
              0.3500718 = fieldWeight in 3300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.0625 = fieldNorm(doc=3300)
        0.32 = coord(8/25)
    
  3. Thiel, T.J.: Automated indexing of document image management systems (1992) 0.16
    0.16168436 = sum of:
      0.16168436 = product of:
        0.6736849 = sum of:
          0.09777703 = weight(abstract_txt:effectively in 3048) [ClassicSimilarity], result of:
            0.09777703 = score(doc=3048,freq=2.0), product of:
              0.124688774 = queryWeight, product of:
                5.9145703 = idf(docFreq=325, maxDocs=44421)
                0.021081628 = queryNorm
              0.78416866 = fieldWeight in 3048, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9145703 = idf(docFreq=325, maxDocs=44421)
                0.09375 = fieldNorm(doc=3048)
          0.09074371 = weight(abstract_txt:ever in 3048) [ClassicSimilarity], result of:
            0.09074371 = score(doc=3048,freq=1.0), product of:
              0.14947107 = queryWeight, product of:
                1.0948759 = boost
                6.475721 = idf(docFreq=185, maxDocs=44421)
                0.021081628 = queryNorm
              0.6070988 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.475721 = idf(docFreq=185, maxDocs=44421)
                0.09375 = fieldNorm(doc=3048)
          0.045946755 = weight(abstract_txt:systems in 3048) [ClassicSimilarity], result of:
            0.045946755 = score(doc=3048,freq=3.0), product of:
              0.08295049 = queryWeight, product of:
                1.153482 = boost
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.021081628 = queryNorm
              0.5539058 = fieldWeight in 3048, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.09375 = fieldNorm(doc=3048)
          0.039270375 = weight(abstract_txt:more in 3048) [ClassicSimilarity], result of:
            0.039270375 = score(doc=3048,freq=1.0), product of:
              0.12333791 = queryWeight, product of:
                1.7226428 = boost
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.021081628 = queryNorm
              0.31839663 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.09375 = fieldNorm(doc=3048)
          0.16506839 = weight(abstract_txt:indexing in 3048) [ClassicSimilarity], result of:
            0.16506839 = score(doc=3048,freq=4.0), product of:
              0.2023683 = queryWeight, product of:
                2.2065725 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021081628 = queryNorm
              0.815683 = fieldWeight in 3048, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=3048)
          0.23487863 = weight(abstract_txt:automated in 3048) [ClassicSimilarity], result of:
            0.23487863 = score(doc=3048,freq=1.0), product of:
              0.4472961 = queryWeight, product of:
                3.7880342 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.021081628 = queryNorm
              0.5251077 = fieldWeight in 3048, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.09375 = fieldNorm(doc=3048)
        0.24 = coord(6/25)
    
  4. Rugg, G.: The future of smart systems in information science (1993) 0.12
    0.12034901 = sum of:
      0.12034901 = product of:
        0.50145423 = sum of:
          0.068853684 = weight(abstract_txt:face in 6712) [ClassicSimilarity], result of:
            0.068853684 = score(doc=6712,freq=1.0), product of:
              0.14041659 = queryWeight, product of:
                1.0611959 = boost
                6.2765174 = idf(docFreq=226, maxDocs=44421)
                0.021081628 = queryNorm
              0.49035293 = fieldWeight in 6712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2765174 = idf(docFreq=226, maxDocs=44421)
                0.078125 = fieldNorm(doc=6712)
          0.05414877 = weight(abstract_txt:systems in 6712) [ClassicSimilarity], result of:
            0.05414877 = score(doc=6712,freq=6.0), product of:
              0.08295049 = queryWeight, product of:
                1.153482 = boost
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.021081628 = queryNorm
              0.6527842 = fieldWeight in 6712, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.411175 = idf(docFreq=3984, maxDocs=44421)
                0.078125 = fieldNorm(doc=6712)
          0.03272531 = weight(abstract_txt:more in 6712) [ClassicSimilarity], result of:
            0.03272531 = score(doc=6712,freq=1.0), product of:
              0.12333791 = queryWeight, product of:
                1.7226428 = boost
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.021081628 = queryNorm
              0.26533052 = fieldWeight in 6712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.078125 = fieldNorm(doc=6712)
          0.06877849 = weight(abstract_txt:indexing in 6712) [ClassicSimilarity], result of:
            0.06877849 = score(doc=6712,freq=1.0), product of:
              0.2023683 = queryWeight, product of:
                2.2065725 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021081628 = queryNorm
              0.33986792 = fieldWeight in 6712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=6712)
          0.08121577 = weight(abstract_txt:approaches in 6712) [ClassicSimilarity], result of:
            0.08121577 = score(doc=6712,freq=1.0), product of:
              0.22608286 = queryWeight, product of:
                2.3322804 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.021081628 = queryNorm
              0.3592301 = fieldWeight in 6712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.078125 = fieldNorm(doc=6712)
          0.19573219 = weight(abstract_txt:automated in 6712) [ClassicSimilarity], result of:
            0.19573219 = score(doc=6712,freq=1.0), product of:
              0.4472961 = queryWeight, product of:
                3.7880342 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.021081628 = queryNorm
              0.43758973 = fieldWeight in 6712, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.078125 = fieldNorm(doc=6712)
        0.24 = coord(6/25)
    
  5. Hahn, J.: Semi-automated methods for BIBFRAME work entity description (2021) 0.11
    0.112918854 = sum of:
      0.112918854 = product of:
        0.70574284 = sum of:
          0.13164458 = weight(abstract_txt:semi in 1726) [ClassicSimilarity], result of:
            0.13164458 = score(doc=1726,freq=2.0), product of:
              0.15203309 = queryWeight, product of:
                1.1042194 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.021081628 = queryNorm
              0.8658942 = fieldWeight in 1726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.09375 = fieldNorm(doc=1726)
          0.08474232 = weight(abstract_txt:subject in 1726) [ClassicSimilarity], result of:
            0.08474232 = score(doc=1726,freq=2.0), product of:
              0.16347204 = queryWeight, product of:
                1.98321 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021081628 = queryNorm
              0.5183903 = fieldWeight in 1726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.09375 = fieldNorm(doc=1726)
          0.082534194 = weight(abstract_txt:indexing in 1726) [ClassicSimilarity], result of:
            0.082534194 = score(doc=1726,freq=1.0), product of:
              0.2023683 = queryWeight, product of:
                2.2065725 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021081628 = queryNorm
              0.4078415 = fieldWeight in 1726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=1726)
          0.40682173 = weight(abstract_txt:automated in 1726) [ClassicSimilarity], result of:
            0.40682173 = score(doc=1726,freq=3.0), product of:
              0.4472961 = queryWeight, product of:
                3.7880342 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.021081628 = queryNorm
              0.90951324 = fieldWeight in 1726, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.09375 = fieldNorm(doc=1726)
        0.16 = coord(4/25)
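
The content matches combine several term scores. Each term contributes queryWeight x fieldWeight, and the sum is scaled by coord, the share of query terms that matched. The sketch below recomputes the last result (doc=1726) under those assumptions. The boosts, queryNorm, fieldNorm and frequencies are copied from the trace; the idf formula is the same assumed one, 1 + ln(maxDocs / (docFreq + 1)); and all names are hypothetical rather than taken from the catalogue software.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.021081628  # queryNorm, copied from the trace
QUERY_TERMS = 25          # denominator of coord(4/25)
FIELD_NORM = 0.09375      # fieldNorm(doc=1726)

# (term, freq in abstract_txt, docFreq, boost), copied from the trace
MATCHES = [
    ("semi", 2.0, 175, 1.1042194),
    ("subject", 2.0, 2419, 1.98321),
    ("indexing", 1.0, 1557, 2.2065725),
    ("automated", 3.0, 445, 3.7880342),
]

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf formula; reproduces the idf values listed in the trace.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, doc_freq, boost):
    # queryWeight = boost * idf * queryNorm; fieldWeight = tf * idf * fieldNorm.
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight

def document_score(matches):
    # coord = matched query terms / total query terms.
    coord = len(matches) / QUERY_TERMS
    return coord * sum(term_score(f, df, b) for _, f, df, b in matches)

print(round(document_score(MATCHES), 4))  # 0.1129, in line with 0.112918854
```

The coord factor explains why this record ranks last despite having the largest raw sum (0.70574284): only 4 of the 25 query terms matched, against 9 of 25 for the top result.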