Document (#43725)

Author
Asula, M.
Makke, J.
Freienthal, L.
Kuulmets, H.-A.
Sirel, R.
Title
Kratt: developing an automatic subject indexing tool for the National Library of Estonia
Source
Cataloging and classification quarterly. 59(2021) no.8, p.775-793
Year
2021
Abstract
Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects is affected by the cataloger's knowledge of the specific topics contained in the book. To address these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book, independent of its extent and genre, with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, making it 10-15 times faster than humans. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.
Content
Cf.: https://doi.org/10.1080/01639374.2021.1998283.
Footnote
Part of a special issue: Artificial intelligence (AI) and automated processes for subject access
Theme
Automatic indexing
Location
Estonia

Similar documents (content)

  1. Fugmann, R.: Book indexing : the classificatory approach (1994) 0.15
    0.152532 = sum of:
      0.152532 = product of:
        0.63555 = sum of:
          0.019495757 = weight(abstract_txt:more in 6919) [ClassicSimilarity], result of:
            0.019495757 = score(doc=6919,freq=1.0), product of:
              0.07347725 = queryWeight, product of:
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.021634942 = queryNorm
              0.26533052 = fieldWeight in 6919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.078125 = fieldNorm(doc=6919)
          0.1125494 = weight(abstract_txt:careful in 6919) [ClassicSimilarity], result of:
            0.1125494 = score(doc=6919,freq=1.0), product of:
              0.18767725 = queryWeight, product of:
                1.1300935 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.021634942 = queryNorm
              0.5996966 = fieldWeight in 6919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.078125 = fieldNorm(doc=6919)
          0.075415544 = weight(abstract_txt:index in 6919) [ClassicSimilarity], result of:
            0.075415544 = score(doc=6919,freq=2.0), product of:
              0.14371102 = queryWeight, product of:
                1.3985196 = boost
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.021634942 = queryNorm
              0.52477217 = fieldWeight in 6919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.078125 = fieldNorm(doc=6919)
          0.10645374 = weight(abstract_txt:indexing in 6919) [ClassicSimilarity], result of:
            0.10645374 = score(doc=6919,freq=3.0), product of:
              0.18083817 = queryWeight, product of:
                1.9213842 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021634942 = queryNorm
              0.5886685 = fieldWeight in 6919, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=6919)
          0.085516684 = weight(abstract_txt:book in 6919) [ClassicSimilarity], result of:
            0.085516684 = score(doc=6919,freq=1.0), product of:
              0.22538438 = queryWeight, product of:
                2.1450188 = boost
                4.8566523 = idf(docFreq=938, maxDocs=44421)
                0.021634942 = queryNorm
              0.37942594 = fieldWeight in 6919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8566523 = idf(docFreq=938, maxDocs=44421)
                0.078125 = fieldNorm(doc=6919)
          0.23611887 = weight(abstract_txt:subject in 6919) [ClassicSimilarity], result of:
            0.23611887 = score(doc=6919,freq=7.0), product of:
              0.29216018 = queryWeight, product of:
                3.4537802 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021634942 = queryNorm
              0.8081829 = fieldWeight in 6919, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.078125 = fieldNorm(doc=6919)
        0.24 = coord(6/25)
    
  2. Collier, H.: Cool, cool searching (1996) 0.13
    0.1346434 = sum of:
      0.1346434 = product of:
        0.5610142 = sum of:
          0.098692335 = weight(abstract_txt:humans in 4604) [ClassicSimilarity], result of:
            0.098692335 = score(doc=4604,freq=1.0), product of:
              0.15225945 = queryWeight, product of:
                1.0178896 = boost
                6.9139757 = idf(docFreq=119, maxDocs=44421)
                0.021634942 = queryNorm
              0.64818525 = fieldWeight in 4604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9139757 = idf(docFreq=119, maxDocs=44421)
                0.09375 = fieldNorm(doc=4604)
          0.06399221 = weight(abstract_txt:index in 4604) [ClassicSimilarity], result of:
            0.06399221 = score(doc=4604,freq=1.0), product of:
              0.14371102 = queryWeight, product of:
                1.3985196 = boost
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.021634942 = queryNorm
              0.44528395 = fieldWeight in 4604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7496953 = idf(docFreq=1044, maxDocs=44421)
                0.09375 = fieldNorm(doc=4604)
          0.07239925 = weight(abstract_txt:tool in 4604) [ClassicSimilarity], result of:
            0.07239925 = score(doc=4604,freq=1.0), product of:
              0.15603717 = queryWeight, product of:
                1.4572618 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.021634942 = queryNorm
              0.46398723 = fieldWeight in 4604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.09375 = fieldNorm(doc=4604)
          0.14508365 = weight(abstract_txt:automatic in 4604) [ClassicSimilarity], result of:
            0.14508365 = score(doc=4604,freq=3.0), product of:
              0.17196652 = queryWeight, product of:
                1.5298382 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.021634942 = queryNorm
              0.8436738 = fieldWeight in 4604, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=4604)
          0.07375331 = weight(abstract_txt:indexing in 4604) [ClassicSimilarity], result of:
            0.07375331 = score(doc=4604,freq=1.0), product of:
              0.18083817 = queryWeight, product of:
                1.9213842 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021634942 = queryNorm
              0.4078415 = fieldWeight in 4604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=4604)
          0.10709345 = weight(abstract_txt:subject in 4604) [ClassicSimilarity], result of:
            0.10709345 = score(doc=4604,freq=1.0), product of:
              0.29216018 = queryWeight, product of:
                3.4537802 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021634942 = queryNorm
              0.36655733 = fieldWeight in 4604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.09375 = fieldNorm(doc=4604)
        0.24 = coord(6/25)
    
  3. Taylor, A.G.: Enhancing subject access in online systems : the year's work in subject analysis, 1991 (1992) 0.12
    0.12139147 = sum of:
      0.12139147 = product of:
        0.6069573 = sum of:
          0.023394909 = weight(abstract_txt:more in 1503) [ClassicSimilarity], result of:
            0.023394909 = score(doc=1503,freq=1.0), product of:
              0.07347725 = queryWeight, product of:
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.021634942 = queryNorm
              0.31839663 = fieldWeight in 1503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3962307 = idf(docFreq=4044, maxDocs=44421)
                0.09375 = fieldNorm(doc=1503)
          0.119572714 = weight(abstract_txt:promise in 1503) [ClassicSimilarity], result of:
            0.119572714 = score(doc=1503,freq=1.0), product of:
              0.17304142 = queryWeight, product of:
                1.0851345 = boost
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.021634942 = queryNorm
              0.6910063 = fieldWeight in 1503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.09375 = fieldNorm(doc=1503)
          0.12021849 = weight(abstract_txt:trying in 1503) [ClassicSimilarity], result of:
            0.12021849 = score(doc=1503,freq=1.0), product of:
              0.1736639 = queryWeight, product of:
                1.0870845 = boost
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.021634942 = queryNorm
              0.69224805 = fieldWeight in 1503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3839793 = idf(docFreq=74, maxDocs=44421)
                0.09375 = fieldNorm(doc=1503)
          0.10430293 = weight(abstract_txt:indexing in 1503) [ClassicSimilarity], result of:
            0.10430293 = score(doc=1503,freq=2.0), product of:
              0.18083817 = queryWeight, product of:
                1.9213842 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021634942 = queryNorm
              0.57677495 = fieldWeight in 1503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=1503)
          0.23946826 = weight(abstract_txt:subject in 1503) [ClassicSimilarity], result of:
            0.23946826 = score(doc=1503,freq=5.0), product of:
              0.29216018 = queryWeight, product of:
                3.4537802 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021634942 = queryNorm
              0.81964713 = fieldWeight in 1503, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.09375 = fieldNorm(doc=1503)
        0.2 = coord(5/25)
    
  4. Langridge, D.W.: Subject analysis : principles and procedures (1989) 0.11
    0.10658558 = sum of:
      0.10658558 = product of:
        0.66615987 = sum of:
          0.111685455 = weight(abstract_txt:automatic in 3021) [ClassicSimilarity], result of:
            0.111685455 = score(doc=3021,freq=1.0), product of:
              0.17196652 = queryWeight, product of:
                1.5298382 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.021634942 = queryNorm
              0.64946043 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.125 = fieldNorm(doc=3021)
          0.17032598 = weight(abstract_txt:indexing in 3021) [ClassicSimilarity], result of:
            0.17032598 = score(doc=3021,freq=3.0), product of:
              0.18083817 = queryWeight, product of:
                1.9213842 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.021634942 = queryNorm
              0.9418696 = fieldWeight in 3021, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.125 = fieldNorm(doc=3021)
          0.1368267 = weight(abstract_txt:book in 3021) [ClassicSimilarity], result of:
            0.1368267 = score(doc=3021,freq=1.0), product of:
              0.22538438 = queryWeight, product of:
                2.1450188 = boost
                4.8566523 = idf(docFreq=938, maxDocs=44421)
                0.021634942 = queryNorm
              0.60708153 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8566523 = idf(docFreq=938, maxDocs=44421)
                0.125 = fieldNorm(doc=3021)
          0.24732174 = weight(abstract_txt:subject in 3021) [ClassicSimilarity], result of:
            0.24732174 = score(doc=3021,freq=3.0), product of:
              0.29216018 = queryWeight, product of:
                3.4537802 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021634942 = queryNorm
              0.8465279 = fieldWeight in 3021, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.125 = fieldNorm(doc=3021)
        0.16 = coord(4/25)
    
  5. Academic research on the Internet : options for scholars & librarians (2001) 0.10
    0.098528825 = sum of:
      0.098528825 = product of:
        0.61580515 = sum of:
          0.23965481 = weight(abstract_txt:minute in 1686) [ClassicSimilarity], result of:
            0.23965481 = score(doc=1686,freq=1.0), product of:
              0.22707027 = queryWeight, product of:
                1.2430502 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.021634942 = queryNorm
              1.0554214 = fieldWeight in 1686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.125 = fieldNorm(doc=1686)
          0.09653234 = weight(abstract_txt:tool in 1686) [ClassicSimilarity], result of:
            0.09653234 = score(doc=1686,freq=1.0), product of:
              0.15603717 = queryWeight, product of:
                1.4572618 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.021634942 = queryNorm
              0.61864966 = fieldWeight in 1686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.125 = fieldNorm(doc=1686)
          0.1368267 = weight(abstract_txt:book in 1686) [ClassicSimilarity], result of:
            0.1368267 = score(doc=1686,freq=1.0), product of:
              0.22538438 = queryWeight, product of:
                2.1450188 = boost
                4.8566523 = idf(docFreq=938, maxDocs=44421)
                0.021634942 = queryNorm
              0.60708153 = fieldWeight in 1686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8566523 = idf(docFreq=938, maxDocs=44421)
                0.125 = fieldNorm(doc=1686)
          0.14279127 = weight(abstract_txt:subject in 1686) [ClassicSimilarity], result of:
            0.14279127 = score(doc=1686,freq=1.0), product of:
              0.29216018 = queryWeight, product of:
                3.4537802 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.021634942 = queryNorm
              0.4887431 = fieldWeight in 1686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.125 = fieldNorm(doc=1686)
        0.16 = coord(4/25)
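
The score explanations above all follow the same arithmetic, which can be reproduced from the printed factors. The sketch below is not the search system's own code. It assumes the usual ClassicSimilarity definitions, which match the printed values: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final coord factor of matched terms over 25 query terms. The function names are illustrative only; the constants are copied from the listing.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 44421          # maxDocs
QUERY_NORM = 0.021634942  # queryNorm
QUERY_TERMS = 25          # denominator of coord(n/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as printed in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm):
    """Sum the term scores, then scale by coord(matched / QUERY_TERMS)."""
    total = sum(term_score(freq, doc_freq, field_norm, boost)
                for freq, doc_freq, boost in terms)
    return total * len(terms) / QUERY_TERMS


# (freq, docFreq, boost) for the terms matched in doc 3021 (entry 4).
doc_3021 = [
    (1.0, 668, 1.5298382),   # automatic
    (3.0, 1557, 1.9213842),  # indexing
    (1.0, 938, 2.1450188),   # book
    (3.0, 2419, 3.4537802),  # subject
]

# (freq, docFreq, boost) for the terms matched in doc 1686 (entry 5).
doc_1686 = [
    (1.0, 25, 1.2430502),    # minute
    (1.0, 855, 1.4572618),   # tool
    (1.0, 938, 2.1450188),   # book
    (1.0, 2419, 3.4537802),  # subject
]

score_3021 = document_score(doc_3021, field_norm=0.125)
score_1686 = document_score(doc_1686, field_norm=0.125)

# The listing reports 0.10658558 and 0.098528825 for these two documents.
print(score_3021, score_1686)
```

Small differences in the last digits are to be expected, since the listing was presumably computed in single precision while Python uses double precision.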