Document (#42482)

Author
Short, M.
Title
Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels
Source
Cataloging and Classification Quarterly. 57(2019) no.5, pp.315-336
Year
2019
Abstract
This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
Content
See: https://doi.org/10.1080/01639374.2019.1653413.
Theme
Fiction
Automatic indexing
Data mining
Content analysis
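The abstract names keyword extraction as one route to candidate subject terms for dime novels. The following is a rough illustration only, not the NIU workflow (the article describes that). It is a plain TF-IDF keyword ranker written in standard-library Python. The stopword list, the sample sentences, and the function names are all hypothetical. Mapping the extracted terms to authorized subject headings would be a separate step, and it is not shown here.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "his", "her",
             "he", "she", "was", "on", "with", "at", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring terms per document.

    score(term, doc) = (count in doc / doc length) * ln(N / docFreq);
    ties are broken alphabetically so the output is deterministic.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        results.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return results


# Hypothetical plot snippets, not drawn from the NIU collection.
sample = [
    "The outlaw rode hard; the outlaw robbed the stage and the outlaw fled to the hills.",
    "The detective studied the clue; the detective followed the clue to the docks.",
    "The heiress fled the city and hid in the hills.",
]
for doc_terms in tfidf_keywords(sample):
    print(doc_terms)
```

Terms that occur in every document get an idf of zero, so they are never suggested. This is the same intuition that makes rare terms dominate the similarity scores listed below.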

Similar documents (content)

  1. Wolfe, E.W.: A case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.16
    0.1626242 = sum of:
      0.1626242 = product of:
        0.6776008 = sum of:
          0.028429527 = weight(abstract_txt:analysis in 236) [ClassicSimilarity], result of:
            0.028429527 = score(doc=236,freq=2.0), product of:
              0.07058081 = queryWeight, product of:
                1.1625013 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.016653871 = queryNorm
              0.402794 = fieldWeight in 236, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.078125 = fieldNorm(doc=236)
          0.038713664 = weight(abstract_txt:text in 236) [ClassicSimilarity], result of:
            0.038713664 = score(doc=236,freq=2.0), product of:
              0.08671277 = queryWeight, product of:
                1.2885215 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016653871 = queryNorm
              0.4464586 = fieldWeight in 236, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=236)
          0.09754536 = weight(abstract_txt:mining in 236) [ClassicSimilarity], result of:
            0.09754536 = score(doc=236,freq=1.0), product of:
              0.20229632 = queryWeight, product of:
                1.9680862 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.016653871 = queryNorm
              0.48219052 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.078125 = fieldNorm(doc=236)
          0.098498635 = weight(abstract_txt:extraction in 236) [ClassicSimilarity], result of:
            0.098498635 = score(doc=236,freq=1.0), product of:
              0.20361215 = queryWeight, product of:
                1.9744766 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.016653871 = queryNorm
              0.48375618 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.078125 = fieldNorm(doc=236)
          0.04959783 = weight(abstract_txt:subject in 236) [ClassicSimilarity], result of:
            0.04959783 = score(doc=236,freq=1.0), product of:
              0.16236858 = queryWeight, product of:
                2.4935389 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.016653871 = queryNorm
              0.30546445 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.078125 = fieldNorm(doc=236)
          0.36481577 = weight(abstract_txt:novels in 236) [ClassicSimilarity], result of:
            0.36481577 = score(doc=236,freq=1.0), product of:
              0.5579514 = queryWeight, product of:
                4.003077 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.016653871 = queryNorm
              0.65384865 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.078125 = fieldNorm(doc=236)
        0.24 = coord(6/25)
    
  2. Sauperl, A.: Four views of a novel : characteristics of novels as described by publishers, librarians, literary theorists, and readers (2013) 0.15
    0.14803156 = sum of:
      0.14803156 = product of:
        0.92519724 = sum of:
          0.06813107 = weight(abstract_txt:headings in 2952) [ClassicSimilarity], result of:
            0.06813107 = score(doc=2952,freq=1.0), product of:
              0.14102393 = queryWeight, product of:
                1.6432233 = boost
                5.1532483 = idf(docFreq=697, maxDocs=44421)
                0.016653871 = queryNorm
              0.48311704 = fieldWeight in 2952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1532483 = idf(docFreq=697, maxDocs=44421)
                0.09375 = fieldNorm(doc=2952)
          0.15378304 = weight(abstract_txt:fiction in 2952) [ClassicSimilarity], result of:
            0.15378304 = score(doc=2952,freq=1.0), product of:
              0.24266194 = queryWeight, product of:
                2.1555147 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.016653871 = queryNorm
              0.63373363 = fieldWeight in 2952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.09375 = fieldNorm(doc=2952)
          0.0841703 = weight(abstract_txt:subject in 2952) [ClassicSimilarity], result of:
            0.0841703 = score(doc=2952,freq=2.0), product of:
              0.16236858 = queryWeight, product of:
                2.4935389 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.016653871 = queryNorm
              0.5183903 = fieldWeight in 2952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.09375 = fieldNorm(doc=2952)
          0.61911285 = weight(abstract_txt:novels in 2952) [ClassicSimilarity], result of:
            0.61911285 = score(doc=2952,freq=2.0), product of:
              0.5579514 = queryWeight, product of:
                4.003077 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.016653871 = queryNorm
              1.109618 = fieldWeight in 2952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.09375 = fieldNorm(doc=2952)
        0.16 = coord(4/25)
    
  3. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.15
    0.1454167 = sum of:
      0.1454167 = product of:
        0.60590297 = sum of:
          0.024123253 = weight(abstract_txt:analysis in 1711) [ClassicSimilarity], result of:
            0.024123253 = score(doc=1711,freq=1.0), product of:
              0.07058081 = queryWeight, product of:
                1.1625013 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.016653871 = queryNorm
              0.34178203 = fieldWeight in 1711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.09375 = fieldNorm(doc=1711)
          0.046456397 = weight(abstract_txt:text in 1711) [ClassicSimilarity], result of:
            0.046456397 = score(doc=1711,freq=2.0), product of:
              0.08671277 = queryWeight, product of:
                1.2885215 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016653871 = queryNorm
              0.5357503 = fieldWeight in 1711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.09375 = fieldNorm(doc=1711)
          0.06813107 = weight(abstract_txt:headings in 1711) [ClassicSimilarity], result of:
            0.06813107 = score(doc=1711,freq=1.0), product of:
              0.14102393 = queryWeight, product of:
                1.6432233 = boost
                5.1532483 = idf(docFreq=697, maxDocs=44421)
                0.016653871 = queryNorm
              0.48311704 = fieldWeight in 1711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1532483 = idf(docFreq=697, maxDocs=44421)
                0.09375 = fieldNorm(doc=1711)
          0.16553997 = weight(abstract_txt:mining in 1711) [ClassicSimilarity], result of:
            0.16553997 = score(doc=1711,freq=2.0), product of:
              0.20229632 = queryWeight, product of:
                1.9680862 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.016653871 = queryNorm
              0.8183044 = fieldWeight in 1711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.09375 = fieldNorm(doc=1711)
          0.21748203 = weight(abstract_txt:fiction in 1711) [ClassicSimilarity], result of:
            0.21748203 = score(doc=1711,freq=2.0), product of:
              0.24266194 = queryWeight, product of:
                2.1555147 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.016653871 = queryNorm
              0.89623463 = fieldWeight in 1711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.09375 = fieldNorm(doc=1711)
          0.0841703 = weight(abstract_txt:subject in 1711) [ClassicSimilarity], result of:
            0.0841703 = score(doc=1711,freq=2.0), product of:
              0.16236858 = queryWeight, product of:
                2.4935389 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.016653871 = queryNorm
              0.5183903 = fieldWeight in 1711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.09375 = fieldNorm(doc=1711)
        0.24 = coord(6/25)
    
  4. Becnel, K.; Moeller, R.A.: Graphic novels in the school library : questions of cataloging, classification, and arrangement (2022) 0.10
    0.10046774 = sum of:
      0.10046774 = product of:
        0.8372312 = sum of:
          0.12686452 = weight(abstract_txt:fiction in 2109) [ClassicSimilarity], result of:
            0.12686452 = score(doc=2109,freq=2.0), product of:
              0.24266194 = queryWeight, product of:
                2.1555147 = boost
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.016653871 = queryNorm
              0.52280354 = fieldWeight in 2109, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.759825 = idf(docFreq=139, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2109)
          0.03471848 = weight(abstract_txt:subject in 2109) [ClassicSimilarity], result of:
            0.03471848 = score(doc=2109,freq=1.0), product of:
              0.16236858 = queryWeight, product of:
                2.4935389 = boost
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.016653871 = queryNorm
              0.2138251 = fieldWeight in 2109, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9099448 = idf(docFreq=2419, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2109)
          0.6756482 = weight(abstract_txt:novels in 2109) [ClassicSimilarity], result of:
            0.6756482 = score(doc=2109,freq=7.0), product of:
              0.5579514 = queryWeight, product of:
                4.003077 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.016653871 = queryNorm
              1.2109445 = fieldWeight in 2109, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2109)
        0.12 = coord(3/25)
    
  5. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.10
    0.096945934 = sum of:
      0.096945934 = product of:
        0.48472965 = sum of:
          0.09317677 = weight(abstract_txt:workflows in 1721) [ClassicSimilarity], result of:
            0.09317677 = score(doc=1721,freq=1.0), product of:
              0.15573229 = queryWeight, product of:
                1.2210248 = boost
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.016653871 = queryNorm
              0.59831375 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6584163 = idf(docFreq=56, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.027374694 = weight(abstract_txt:text in 1721) [ClassicSimilarity], result of:
            0.027374694 = score(doc=1721,freq=1.0), product of:
              0.08671277 = queryWeight, product of:
                1.2885215 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.016653871 = queryNorm
              0.3156939 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.1446789 = weight(abstract_txt:discoverability in 1721) [ClassicSimilarity], result of:
            0.1446789 = score(doc=1721,freq=1.0), product of:
              0.20882237 = queryWeight, product of:
                1.4139162 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.016653871 = queryNorm
              0.6928324 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.05054577 = weight(abstract_txt:improve in 1721) [ClassicSimilarity], result of:
            0.05054577 = score(doc=1721,freq=1.0), product of:
              0.13050888 = queryWeight, product of:
                1.5807755 = boost
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.016653871 = queryNorm
              0.38729754 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9574084 = idf(docFreq=848, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
          0.16895352 = weight(abstract_txt:mining in 1721) [ClassicSimilarity], result of:
            0.16895352 = score(doc=1721,freq=3.0), product of:
              0.20229632 = queryWeight, product of:
                1.9680862 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.016653871 = queryNorm
              0.83517843 = fieldWeight in 1721, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.078125 = fieldNorm(doc=1721)
        0.2 = coord(5/25)
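Each score above is an explanation from Lucene's ClassicSimilarity (TF-IDF). For each term, a term weight is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm). The matching weights are summed, and the sum is scaled by coord(matched/25). The sketch below recomputes those numbers from the printed inputs. It uses the classic idf formula 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), and both reproduce the values shown. The `quantize_norm` helper is a labeled assumption. It approximates Lucene's lossy one-byte norm encoding by truncating 1/sqrt(length) to three significant bits. That is consistent with the fieldNorm values seen here (0.0546875, 0.078125, 0.09375), but it is not a verified reimplementation.

```python
import math

MAX_DOCS = 44421     # maxDocs printed in every idf line above
MAX_OVERLAP = 25     # denominator of every coord(x/25) factor above
QUERY_NORM = 0.016653871


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Classic TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, boost: float,
                query_norm: float, field_norm: float) -> float:
    """One weight(abstract_txt:term in doc) branch: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm            # query-side factor
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(term_weights: list[float], max_overlap: int = MAX_OVERLAP) -> float:
    """Sum the matching term weights, then scale by coord(matched / max_overlap).

    Assumes term_weights contains only the terms that matched the document.
    """
    return sum(term_weights) * len(term_weights) / max_overlap


def quantize_norm(value: float) -> float:
    """Hypothetical helper: truncate a positive float to 3 significant bits,
    approximating Lucene's one-byte norm encoding (not verified bit-for-bit)."""
    if value <= 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)  # value = mantissa * 2**exponent, 0.5 <= mantissa < 1
    return math.ldexp(math.floor(mantissa * 8) / 8, exponent)


# Document 236: (freq, docFreq, boost) per matching term; fieldNorm is 0.078125.
doc_236 = {
    "analysis":   (2.0, 3151, 1.1625013),
    "text":       (2.0, 2122, 1.2885215),
    "mining":     (1.0, 251, 1.9680862),
    "extraction": (1.0, 246, 1.9744766),
    "subject":    (1.0, 2419, 2.4935389),
    "novels":     (1.0, 27, 4.003077),
}
weights_236 = [term_weight(f, df, b, QUERY_NORM, 0.078125) for f, df, b in doc_236.values()]
print(f"doc 236 score: {document_score(weights_236):.7f}")

# Rare terms (low docFreq) get the largest idf, which is why "novels" dominates.
for term, (_, df, _) in sorted(doc_236.items(), key=lambda kv: kv[1][1]):
    print(f"{term:11s} docFreq={df:5d} idf={classic_idf(df):.4f}")
```

The same helpers can be checked against the other trees. For example, the per-term weights of document 2952 with `MAX_OVERLAP = 25` reproduce its 0.148 total. Under the norm assumption, scanning `quantize_norm(1 / math.sqrt(n))` over n shows which abstract lengths map to a given fieldNorm. Index-time boosts, if the index used any, would shift that mapping.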