Document (#42237)

Author
Wolfe, E.W.
Title
Natural language processing in the humanities : a case study in automated metadata enhancement
Source
Code4Lib journal. Issue 46(2019), [http://journal.code4lib.org]
Year
2019
Abstract
The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser known writers and a goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, each novel is analyzed for a variety of elements such as literary style, targeted content analysis, historical context, and other areas. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
Content
Cf.: https://journal.code4lib.org/articles/14834.
Theme
Metadata
Automatic indexing
Field
Humanities

Similar documents (content)

  1. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.15
    0.14902262 = sum of:
      0.14902262 = product of:
        0.4656957 = sum of:
          0.07079173 = weight(abstract_txt:worked in 3953) [ClassicSimilarity], result of:
            0.07079173 = score(doc=3953,freq=1.0), product of:
              0.17804147 = queryWeight, product of:
                1.0231206 = boost
                7.270651 = idf(docFreq=83, maxDocs=44421)
                0.023934316 = queryNorm
              0.3976137 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.270651 = idf(docFreq=83, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.072599486 = weight(abstract_txt:disambiguation in 3953) [ClassicSimilarity], result of:
            0.072599486 = score(doc=3953,freq=1.0), product of:
              0.18105972 = queryWeight, product of:
                1.0317564 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.023934316 = queryNorm
              0.40096983 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.024306431 = weight(abstract_txt:text in 3953) [ClassicSimilarity], result of:
            0.024306431 = score(doc=3953,freq=1.0), product of:
              0.10999095 = queryWeight, product of:
                1.13726 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.023934316 = queryNorm
              0.22098574 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.05346128 = weight(abstract_txt:language in 3953) [ClassicSimilarity], result of:
            0.05346128 = score(doc=3953,freq=4.0), product of:
              0.1171878 = queryWeight, product of:
                1.1738766 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.023934316 = queryNorm
              0.45620176 = fieldWeight in 3953, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.062232308 = weight(abstract_txt:processing in 3953) [ClassicSimilarity], result of:
            0.062232308 = score(doc=3953,freq=2.0), product of:
              0.16338421 = queryWeight, product of:
                1.3860737 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.023934316 = queryNorm
              0.38089547 = fieldWeight in 3953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.067872554 = weight(abstract_txt:natural in 3953) [ClassicSimilarity], result of:
            0.067872554 = score(doc=3953,freq=2.0), product of:
              0.1731127 = queryWeight, product of:
                1.4267429 = boost
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.023934316 = queryNorm
              0.39207146 = fieldWeight in 3953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.05024472 = weight(abstract_txt:variety in 3953) [ClassicSimilarity], result of:
            0.05024472 = score(doc=3953,freq=1.0), product of:
              0.17848556 = queryWeight, product of:
                1.4487145 = boost
                5.1475344 = idf(docFreq=701, maxDocs=44421)
                0.023934316 = queryNorm
              0.2815058 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1475344 = idf(docFreq=701, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
          0.06418721 = weight(abstract_txt:metadata in 3953) [ClassicSimilarity], result of:
            0.06418721 = score(doc=3953,freq=1.0), product of:
              0.2405501 = queryWeight, product of:
                2.0598218 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.023934316 = queryNorm
              0.2668351 = fieldWeight in 3953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3953)
        0.32 = coord(8/25)
    
  2. Christel, M.G.: Automated metadata in multimedia information systems : creation, refinement, use in surrogates, and evaluation (2009) 0.14
    0.14128718 = sum of:
      0.14128718 = product of:
        0.44152242 = sum of:
          0.020399453 = weight(abstract_txt:analysis in 73) [ClassicSimilarity], result of:
            0.020399453 = score(doc=73,freq=1.0), product of:
              0.08952834 = queryWeight, product of:
                1.0260334 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.023934316 = queryNorm
              0.2278547 = fieldWeight in 73, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
          0.019643215 = weight(abstract_txt:with in 73) [ClassicSimilarity], result of:
            0.019643215 = score(doc=73,freq=4.0), product of:
              0.0629555 = queryWeight, product of:
                1.0537649 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.023934316 = queryNorm
              0.31201747 = fieldWeight in 73, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
          0.027778778 = weight(abstract_txt:text in 73) [ClassicSimilarity], result of:
            0.027778778 = score(doc=73,freq=1.0), product of:
              0.10999095 = queryWeight, product of:
                1.13726 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.023934316 = queryNorm
              0.25255513 = fieldWeight in 73, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
          0.030549303 = weight(abstract_txt:language in 73) [ClassicSimilarity], result of:
            0.030549303 = score(doc=73,freq=1.0), product of:
              0.1171878 = queryWeight, product of:
                1.1738766 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.023934316 = queryNorm
              0.26068673 = fieldWeight in 73, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
          0.0502913 = weight(abstract_txt:processing in 73) [ClassicSimilarity], result of:
            0.0502913 = score(doc=73,freq=1.0), product of:
              0.16338421 = queryWeight, product of:
                1.3860737 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.023934316 = queryNorm
              0.30781004 = fieldWeight in 73, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
          0.054849308 = weight(abstract_txt:natural in 73) [ClassicSimilarity], result of:
            0.054849308 = score(doc=73,freq=1.0), product of:
              0.1731127 = queryWeight, product of:
                1.4267429 = boost
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.023934316 = queryNorm
              0.3168416 = fieldWeight in 73, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
          0.073980264 = weight(abstract_txt:automated in 73) [ClassicSimilarity], result of:
            0.073980264 = score(doc=73,freq=1.0), product of:
              0.21132885 = queryWeight, product of:
                1.5763791 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.023934316 = queryNorm
              0.3500718 = fieldWeight in 73, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
          0.1640308 = weight(abstract_txt:metadata in 73) [ClassicSimilarity], result of:
            0.1640308 = score(doc=73,freq=5.0), product of:
              0.2405501 = queryWeight, product of:
                2.0598218 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.023934316 = queryNorm
              0.6818987 = fieldWeight in 73, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0625 = fieldNorm(doc=73)
        0.32 = coord(8/25)
    
  3. Jurafsky, D.; Martin, J.H.: Speech and language processing : an introduction to natural language processing, computational linguistics and speech recognition (2009) 0.12
    0.124741115 = sum of:
      0.124741115 = product of:
        0.51975465 = sum of:
          0.017362313 = weight(abstract_txt:with in 2081) [ClassicSimilarity], result of:
            0.017362313 = score(doc=2081,freq=2.0), product of:
              0.0629555 = queryWeight, product of:
                1.0537649 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.023934316 = queryNorm
              0.2757871 = fieldWeight in 2081, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=2081)
          0.03472347 = weight(abstract_txt:text in 2081) [ClassicSimilarity], result of:
            0.03472347 = score(doc=2081,freq=1.0), product of:
              0.10999095 = queryWeight, product of:
                1.13726 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.023934316 = queryNorm
              0.3156939 = fieldWeight in 2081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=2081)
          0.108008094 = weight(abstract_txt:language in 2081) [ClassicSimilarity], result of:
            0.108008094 = score(doc=2081,freq=8.0), product of:
              0.1171878 = queryWeight, product of:
                1.1738766 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.023934316 = queryNorm
              0.92166674 = fieldWeight in 2081, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=2081)
          0.15398504 = weight(abstract_txt:processing in 2081) [ClassicSimilarity], result of:
            0.15398504 = score(doc=2081,freq=6.0), product of:
              0.16338421 = queryWeight, product of:
                1.3860737 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.023934316 = queryNorm
              0.9424719 = fieldWeight in 2081, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.078125 = fieldNorm(doc=2081)
          0.09696079 = weight(abstract_txt:natural in 2081) [ClassicSimilarity], result of:
            0.09696079 = score(doc=2081,freq=2.0), product of:
              0.1731127 = queryWeight, product of:
                1.4267429 = boost
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.023934316 = queryNorm
              0.5601021 = fieldWeight in 2081, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.078125 = fieldNorm(doc=2081)
          0.10871497 = weight(abstract_txt:emphasis in 2081) [ClassicSimilarity], result of:
            0.10871497 = score(doc=2081,freq=1.0), product of:
              0.23539709 = queryWeight, product of:
                1.663726 = boost
                5.9115076 = idf(docFreq=326, maxDocs=44421)
                0.023934316 = queryNorm
              0.46183652 = fieldWeight in 2081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9115076 = idf(docFreq=326, maxDocs=44421)
                0.078125 = fieldNorm(doc=2081)
        0.24 = coord(6/25)
    
  4. Taylor, S.L.: Integrating natural language understanding with document structure analysis (1994) 0.12
    0.11753834 = sum of:
      0.11753834 = product of:
        0.48974308 = sum of:
          0.05299933 = weight(abstract_txt:analysis in 1862) [ClassicSimilarity], result of:
            0.05299933 = score(doc=1862,freq=3.0), product of:
              0.08952834 = queryWeight, product of:
                1.0260334 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.023934316 = queryNorm
              0.59198385 = fieldWeight in 1862, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.09375 = fieldNorm(doc=1862)
          0.014732412 = weight(abstract_txt:with in 1862) [ClassicSimilarity], result of:
            0.014732412 = score(doc=1862,freq=1.0), product of:
              0.0629555 = queryWeight, product of:
                1.0537649 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.023934316 = queryNorm
              0.23401311 = fieldWeight in 1862, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.09375 = fieldNorm(doc=1862)
          0.07217138 = weight(abstract_txt:text in 1862) [ClassicSimilarity], result of:
            0.07217138 = score(doc=1862,freq=3.0), product of:
              0.10999095 = queryWeight, product of:
                1.13726 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.023934316 = queryNorm
              0.6561575 = fieldWeight in 1862, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=1862)
          0.06480486 = weight(abstract_txt:language in 1862) [ClassicSimilarity], result of:
            0.06480486 = score(doc=1862,freq=2.0), product of:
              0.1171878 = queryWeight, product of:
                1.1738766 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.023934316 = queryNorm
              0.5530001 = fieldWeight in 1862, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.09375 = fieldNorm(doc=1862)
          0.16868214 = weight(abstract_txt:processing in 1862) [ClassicSimilarity], result of:
            0.16868214 = score(doc=1862,freq=5.0), product of:
              0.16338421 = queryWeight, product of:
                1.3860737 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.023934316 = queryNorm
              1.0324262 = fieldWeight in 1862, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.09375 = fieldNorm(doc=1862)
          0.11635294 = weight(abstract_txt:natural in 1862) [ClassicSimilarity], result of:
            0.11635294 = score(doc=1862,freq=2.0), product of:
              0.1731127 = queryWeight, product of:
                1.4267429 = boost
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.023934316 = queryNorm
              0.6721225 = fieldWeight in 1862, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.09375 = fieldNorm(doc=1862)
        0.24 = coord(6/25)
    
  5. Vledutz-Stokolov, N.: Concept recognition in an automatic text-processing system for the life sciences (1987) 0.11
    0.11354714 = sum of:
      0.11354714 = product of:
        0.4055255 = sum of:
          0.020399453 = weight(abstract_txt:analysis in 2848) [ClassicSimilarity], result of:
            0.020399453 = score(doc=2848,freq=1.0), product of:
              0.08952834 = queryWeight, product of:
                1.0260334 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.023934316 = queryNorm
              0.2278547 = fieldWeight in 2848, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0625 = fieldNorm(doc=2848)
          0.117338486 = weight(abstract_txt:disambiguation in 2848) [ClassicSimilarity], result of:
            0.117338486 = score(doc=2848,freq=2.0), product of:
              0.18105972 = queryWeight, product of:
                1.0317564 = boost
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.023934316 = queryNorm
              0.6480651 = fieldWeight in 2848, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.33202 = idf(docFreq=78, maxDocs=44421)
                0.0625 = fieldNorm(doc=2848)
          0.01388985 = weight(abstract_txt:with in 2848) [ClassicSimilarity], result of:
            0.01388985 = score(doc=2848,freq=2.0), product of:
              0.0629555 = queryWeight, product of:
                1.0537649 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.023934316 = queryNorm
              0.22062966 = fieldWeight in 2848, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=2848)
          0.027778778 = weight(abstract_txt:text in 2848) [ClassicSimilarity], result of:
            0.027778778 = score(doc=2848,freq=1.0), product of:
              0.10999095 = queryWeight, product of:
                1.13726 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.023934316 = queryNorm
              0.25255513 = fieldWeight in 2848, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=2848)
          0.08082586 = weight(abstract_txt:language in 2848) [ClassicSimilarity], result of:
            0.08082586 = score(doc=2848,freq=7.0), product of:
              0.1171878 = queryWeight, product of:
                1.1738766 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.023934316 = queryNorm
              0.6897122 = fieldWeight in 2848, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=2848)
          0.0502913 = weight(abstract_txt:processing in 2848) [ClassicSimilarity], result of:
            0.0502913 = score(doc=2848,freq=1.0), product of:
              0.16338421 = queryWeight, product of:
                1.3860737 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.023934316 = queryNorm
              0.30781004 = fieldWeight in 2848, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0625 = fieldNorm(doc=2848)
          0.09500179 = weight(abstract_txt:natural in 2848) [ClassicSimilarity], result of:
            0.09500179 = score(doc=2848,freq=3.0), product of:
              0.1731127 = queryWeight, product of:
                1.4267429 = boost
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.023934316 = queryNorm
              0.54878575 = fieldWeight in 2848, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0694656 = idf(docFreq=758, maxDocs=44421)
                0.0625 = fieldNorm(doc=2848)
        0.28 = coord(7/25)
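
Note on reading the score trees: the arithmetic in the listings above can be reproduced by hand. The sketch below is not the search engine's own code. It is a minimal Python re-computation of entry 1 (doc 3953), assuming the classic TF-IDF relations that the listing itself displays: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function and variable names are illustrative. boost, queryNorm and fieldNorm are copied from the listing rather than derived, and small differences in the last digits are expected because the listing uses single-precision floats.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency in the form that matches the listing's values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm          # query side
    field_weight = math.sqrt(freq) * term_idf * field_norm  # document side
    return query_weight * field_weight


# Constants copied from the explain tree of entry 1 (doc 3953).
MAX_DOCS = 44421
QUERY_NORM = 0.023934316
FIELD_NORM = 0.0546875

# (term, freq, docFreq, boost) for the eight matching abstract terms.
terms = [
    ("worked", 1.0, 83, 1.0231206),
    ("disambiguation", 1.0, 78, 1.0317564),
    ("text", 1.0, 2122, 1.13726),
    ("language", 4.0, 1863, 1.1738766),
    ("processing", 2.0, 876, 1.3860737),
    ("natural", 2.0, 758, 1.4267429),
    ("variety", 1.0, 701, 1.4487145),
    ("metadata", 1.0, 917, 2.0598218),
]

scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for term, freq, doc_freq, boost in terms
}

raw_sum = sum(scores.values())  # the "sum of" line
coord = 8 / 25                  # 8 of the 25 query terms matched
total = raw_sum * coord         # the "product of" line, i.e. the entry's score

for term, value in scores.items():
    print(f"{term:15s} {value:.8f}")
print(f"sum   = {raw_sum:.7f}")
print(f"total = {total:.8f}")
```

The same function applies to the other entries once their freq, fieldNorm and coord values are substituted. The coord factor explains why a document that matches fewer query terms (6/25 for entries 3 and 4) can rank below one with a smaller raw sum.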