Document (#42750)

Author
Morris, V.
Title
Automated language identification of bibliographic resources
Source
Cataloging and classification quarterly. 58(2020) no.1, pp.1-27
Year
2020
Abstract
This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
Content
Cf.: https://doi.org/10.1080/01639374.2019.1700201.
Theme
Descriptive cataloging
Computational linguistics
Location
GB

Similar documents (author)

  1. Morris, L.R.: The frequency of use of Library of Congress Classification numbers and Dewey Decimal Classification numbers in the MARC file in the field of library science (1991) 4.96
    4.9626675 = sum of:
      4.9626675 = weight(author_txt:morris in 2307) [ClassicSimilarity], result of:
        4.9626675 = fieldWeight in 2307, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.9402676 = idf(docFreq=42, maxDocs=44421)
          0.625 = fieldNorm(doc=2307)
    
  2. Morris, S.: Metadata and rights (2000) 4.96
    4.9626675 = sum of:
      4.9626675 = weight(author_txt:morris in 2627) [ClassicSimilarity], result of:
        4.9626675 = fieldWeight in 2627, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.9402676 = idf(docFreq=42, maxDocs=44421)
          0.625 = fieldNorm(doc=2627)
    
  3. Morris, K.: Software reviews: RediReference Plus (1990) 4.96
    4.9626675 = sum of:
      4.9626675 = weight(author_txt:morris in 3225) [ClassicSimilarity], result of:
        4.9626675 = fieldWeight in 3225, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.9402676 = idf(docFreq=42, maxDocs=44421)
          0.625 = fieldNorm(doc=3225)
    
  4. Morris, L.R.: Choosing a bibliographic utility (1989) 4.96
    4.9626675 = sum of:
      4.9626675 = weight(author_txt:morris in 3726) [ClassicSimilarity], result of:
        4.9626675 = fieldWeight in 3726, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.9402676 = idf(docFreq=42, maxDocs=44421)
          0.625 = fieldNorm(doc=3726)
    
  5. Morris, S.A.: Mapping research specialties (2008) 4.96
    4.9626675 = sum of:
      4.9626675 = weight(author_txt:morris in 3961) [ClassicSimilarity], result of:
        4.9626675 = fieldWeight in 3961, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.9402676 = idf(docFreq=42, maxDocs=44421)
          0.625 = fieldNorm(doc=3961)
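
The five author matches above share one computation: a single hit on "morris" in the author field, weighted by the rarity of that term and by the field's length norm. A minimal sketch of the arithmetic in Python follows. It assumes the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the values listed. The function names are illustrative and are not Lucene API calls, and fieldNorm is copied from the listing rather than derived.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed classic form, see lead-in)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation tree."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: termFreq=1.0, docFreq=42, maxDocs=44421, fieldNorm=0.625
author_score = field_weight(freq=1.0, doc_freq=42, max_docs=44421, field_norm=0.625)
print(round(author_score, 4))  # approximately 4.9627, shown as 4.96 in the list
```

Because every author entry has the same termFreq, docFreq and fieldNorm, all five receive the same score.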
    

Similar documents (content)

  1. Bardenheier, P.; Wilkinson, E.H.; Dale, H.: Ki te Tika te Hanga, Ka Pakari te Kete : with the right structure we weave a strong basket (2015) 0.17
    0.165629 = sum of:
      0.165629 = product of:
        0.6901208 = sum of:
          0.03352743 = weight(abstract_txt:project in 3176) [ClassicSimilarity], result of:
            0.03352743 = score(doc=3176,freq=1.0), product of:
              0.08165688 = queryWeight, product of:
                1.0322702 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.01806189 = queryNorm
              0.41058916 = fieldWeight in 3176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.09375 = fieldNorm(doc=3176)
          0.049186543 = weight(abstract_txt:described in 3176) [ClassicSimilarity], result of:
            0.049186543 = score(doc=3176,freq=1.0), product of:
              0.105428204 = queryWeight, product of:
                1.172939 = boost
                4.9764338 = idf(docFreq=832, maxDocs=44421)
                0.01806189 = queryNorm
              0.46654066 = fieldWeight in 3176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9764338 = idf(docFreq=832, maxDocs=44421)
                0.09375 = fieldNorm(doc=3176)
          0.143229 = weight(abstract_txt:enhancement in 3176) [ClassicSimilarity], result of:
            0.143229 = score(doc=3176,freq=1.0), product of:
              0.21498752 = queryWeight, product of:
                1.6749568 = boost
                7.1063476 = idf(docFreq=98, maxDocs=44421)
                0.01806189 = queryNorm
              0.66622007 = fieldWeight in 3176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1063476 = idf(docFreq=98, maxDocs=44421)
                0.09375 = fieldNorm(doc=3176)
          0.15570845 = weight(abstract_txt:assign in 3176) [ClassicSimilarity], result of:
            0.15570845 = score(doc=3176,freq=1.0), product of:
              0.22730067 = queryWeight, product of:
                1.7222546 = boost
                7.3070183 = idf(docFreq=80, maxDocs=44421)
                0.01806189 = queryNorm
              0.68503296 = fieldWeight in 3176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3070183 = idf(docFreq=80, maxDocs=44421)
                0.09375 = fieldNorm(doc=3176)
          0.1036867 = weight(abstract_txt:records in 3176) [ClassicSimilarity], result of:
            0.1036867 = score(doc=3176,freq=1.0), product of:
              0.24998564 = queryWeight, product of:
                3.1283486 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.01806189 = queryNorm
              0.41477063 = fieldWeight in 3176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.09375 = fieldNorm(doc=3176)
          0.20478274 = weight(abstract_txt:language in 3176) [ClassicSimilarity], result of:
            0.20478274 = score(doc=3176,freq=2.0), product of:
              0.3703123 = queryWeight, product of:
                4.9154816 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.01806189 = queryNorm
              0.5530001 = fieldWeight in 3176, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.09375 = fieldNorm(doc=3176)
        0.24 = coord(6/25)
    
  2. Parka, A.L.; Panchyshyn, R.S.: The path to an RDA hybridized catalog : lessons from the Kent State University Libraries' RDA enrichment project (2016) 0.16
    0.15782638 = sum of:
      0.15782638 = product of:
        0.65760994 = sum of:
          0.03048036 = weight(abstract_txt:over in 3632) [ClassicSimilarity], result of:
            0.03048036 = score(doc=3632,freq=1.0), product of:
              0.07663126 = queryWeight, product of:
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.01806189 = queryNorm
              0.3977536 = fieldWeight in 3632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.09375 = fieldNorm(doc=3632)
          0.05807121 = weight(abstract_txt:project in 3632) [ClassicSimilarity], result of:
            0.05807121 = score(doc=3632,freq=3.0), product of:
              0.08165688 = queryWeight, product of:
                1.0322702 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.01806189 = queryNorm
              0.71116126 = fieldWeight in 3632, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.09375 = fieldNorm(doc=3632)
          0.17125852 = weight(abstract_txt:legacy in 3632) [ClassicSimilarity], result of:
            0.17125852 = score(doc=3632,freq=1.0), product of:
              0.24219252 = queryWeight, product of:
                1.7777773 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.01806189 = queryNorm
              0.7071173 = fieldWeight in 3632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.09375 = fieldNorm(doc=3632)
          0.060152173 = weight(abstract_txt:resources in 3632) [ClassicSimilarity], result of:
            0.060152173 = score(doc=3632,freq=1.0), product of:
              0.15190433 = queryWeight, product of:
                1.9911183 = boost
                4.2238636 = idf(docFreq=1767, maxDocs=44421)
                0.01806189 = queryNorm
              0.3959872 = fieldWeight in 3632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2238636 = idf(docFreq=1767, maxDocs=44421)
                0.09375 = fieldNorm(doc=3632)
          0.19101258 = weight(abstract_txt:million in 3632) [ClassicSimilarity], result of:
            0.19101258 = score(doc=3632,freq=1.0), product of:
              0.32817885 = queryWeight, product of:
                2.9266264 = boost
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.01806189 = queryNorm
              0.58203804 = fieldWeight in 3632, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.09375 = fieldNorm(doc=3632)
          0.14663514 = weight(abstract_txt:records in 3632) [ClassicSimilarity], result of:
            0.14663514 = score(doc=3632,freq=2.0), product of:
              0.24998564 = queryWeight, product of:
                3.1283486 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.01806189 = queryNorm
              0.58657426 = fieldWeight in 3632, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.09375 = fieldNorm(doc=3632)
        0.24 = coord(6/25)
    
  3. Fischer, T.; Neuroth, H.: SSG-FI - special subject gateways to high quality Internet resources for scientific users (2000) 0.13
    0.13448054 = sum of:
      0.13448054 = product of:
        0.4802876 = sum of:
          0.0254003 = weight(abstract_txt:over in 5873) [ClassicSimilarity], result of:
            0.0254003 = score(doc=5873,freq=1.0), product of:
              0.07663126 = queryWeight, product of:
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.01806189 = queryNorm
              0.3314613 = fieldWeight in 5873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.078125 = fieldNorm(doc=5873)
          0.048392676 = weight(abstract_txt:project in 5873) [ClassicSimilarity], result of:
            0.048392676 = score(doc=5873,freq=3.0), product of:
              0.08165688 = queryWeight, product of:
                1.0322702 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.01806189 = queryNorm
              0.5926344 = fieldWeight in 5873, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.078125 = fieldNorm(doc=5873)
          0.040988784 = weight(abstract_txt:described in 5873) [ClassicSimilarity], result of:
            0.040988784 = score(doc=5873,freq=1.0), product of:
              0.105428204 = queryWeight, product of:
                1.172939 = boost
                4.9764338 = idf(docFreq=832, maxDocs=44421)
                0.01806189 = queryNorm
              0.38878387 = fieldWeight in 5873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9764338 = idf(docFreq=832, maxDocs=44421)
                0.078125 = fieldNorm(doc=5873)
          0.0697963 = weight(abstract_txt:contribute in 5873) [ClassicSimilarity], result of:
            0.0697963 = score(doc=5873,freq=1.0), product of:
              0.15033786 = queryWeight, product of:
                1.400655 = boost
                5.942566 = idf(docFreq=316, maxDocs=44421)
                0.01806189 = queryNorm
              0.46426296 = fieldWeight in 5873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.942566 = idf(docFreq=316, maxDocs=44421)
                0.078125 = fieldNorm(doc=5873)
          0.05012681 = weight(abstract_txt:resources in 5873) [ClassicSimilarity], result of:
            0.05012681 = score(doc=5873,freq=1.0), product of:
              0.15190433 = queryWeight, product of:
                1.9911183 = boost
                4.2238636 = idf(docFreq=1767, maxDocs=44421)
                0.01806189 = queryNorm
              0.32998934 = fieldWeight in 5873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2238636 = idf(docFreq=1767, maxDocs=44421)
                0.078125 = fieldNorm(doc=5873)
          0.15917715 = weight(abstract_txt:million in 5873) [ClassicSimilarity], result of:
            0.15917715 = score(doc=5873,freq=1.0), product of:
              0.32817885 = queryWeight, product of:
                2.9266264 = boost
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.01806189 = queryNorm
              0.48503172 = fieldWeight in 5873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.208406 = idf(docFreq=242, maxDocs=44421)
                0.078125 = fieldNorm(doc=5873)
          0.08640559 = weight(abstract_txt:records in 5873) [ClassicSimilarity], result of:
            0.08640559 = score(doc=5873,freq=1.0), product of:
              0.24998564 = queryWeight, product of:
                3.1283486 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.01806189 = queryNorm
              0.3456422 = fieldWeight in 5873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.078125 = fieldNorm(doc=5873)
        0.28 = coord(7/25)
    
  4. Kent, C.; Deliot, C.; Martyn, C.: Management information from classification : methods of collection analysis using DDC (2008) 0.13
    0.12866756 = sum of:
      0.12866756 = product of:
        0.45952702 = sum of:
          0.0254003 = weight(abstract_txt:over in 3165) [ClassicSimilarity], result of:
            0.0254003 = score(doc=3165,freq=1.0), product of:
              0.07663126 = queryWeight, product of:
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.01806189 = queryNorm
              0.3314613 = fieldWeight in 3165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.078125 = fieldNorm(doc=3165)
          0.027939525 = weight(abstract_txt:project in 3165) [ClassicSimilarity], result of:
            0.027939525 = score(doc=3165,freq=1.0), product of:
              0.08165688 = queryWeight, product of:
                1.0322702 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.01806189 = queryNorm
              0.34215763 = fieldWeight in 3165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.078125 = fieldNorm(doc=3165)
          0.04881717 = weight(abstract_txt:machine in 3165) [ClassicSimilarity], result of:
            0.04881717 = score(doc=3165,freq=1.0), product of:
              0.1184573 = queryWeight, product of:
                1.2433057 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.01806189 = queryNorm
              0.41210774 = fieldWeight in 3165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.078125 = fieldNorm(doc=3165)
          0.069685385 = weight(abstract_txt:british in 3165) [ClassicSimilarity], result of:
            0.069685385 = score(doc=3165,freq=1.0), product of:
              0.15017855 = queryWeight, product of:
                1.3999127 = boost
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.01806189 = queryNorm
              0.4640169 = fieldWeight in 3165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9394164 = idf(docFreq=317, maxDocs=44421)
                0.078125 = fieldNorm(doc=3165)
          0.05012681 = weight(abstract_txt:resources in 3165) [ClassicSimilarity], result of:
            0.05012681 = score(doc=3165,freq=1.0), product of:
              0.15190433 = queryWeight, product of:
                1.9911183 = boost
                4.2238636 = idf(docFreq=1767, maxDocs=44421)
                0.01806189 = queryNorm
              0.32998934 = fieldWeight in 3165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2238636 = idf(docFreq=1767, maxDocs=44421)
                0.078125 = fieldNorm(doc=3165)
          0.11688846 = weight(abstract_txt:automated in 3165) [ClassicSimilarity], result of:
            0.11688846 = score(doc=3165,freq=1.0), product of:
              0.26711884 = queryWeight, product of:
                2.6403668 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.01806189 = queryNorm
              0.43758973 = fieldWeight in 3165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.078125 = fieldNorm(doc=3165)
          0.12066938 = weight(abstract_txt:language in 3165) [ClassicSimilarity], result of:
            0.12066938 = score(doc=3165,freq=1.0), product of:
              0.3703123 = queryWeight, product of:
                4.9154816 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.01806189 = queryNorm
              0.3258584 = fieldWeight in 3165, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=3165)
        0.28 = coord(7/25)
    
  5. Hodges, D.W.; Schlottmann, K.: Reporting from the archives : better archival migration outcomes with Python and the Google Sheets API (2019) 0.13
    0.12648325 = sum of:
      0.12648325 = product of:
        0.4517259 = sum of:
          0.02032024 = weight(abstract_txt:over in 444) [ClassicSimilarity], result of:
            0.02032024 = score(doc=444,freq=1.0), product of:
              0.07663126 = queryWeight, product of:
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.01806189 = queryNorm
              0.26516905 = fieldWeight in 444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.242705 = idf(docFreq=1734, maxDocs=44421)
                0.0625 = fieldNorm(doc=444)
          0.03871414 = weight(abstract_txt:project in 444) [ClassicSimilarity], result of:
            0.03871414 = score(doc=444,freq=3.0), product of:
              0.08165688 = queryWeight, product of:
                1.0322702 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.01806189 = queryNorm
              0.4741075 = fieldWeight in 444, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.0625 = fieldNorm(doc=444)
          0.023426171 = weight(abstract_txt:tools in 444) [ClassicSimilarity], result of:
            0.023426171 = score(doc=444,freq=1.0), product of:
              0.084253445 = queryWeight, product of:
                1.0485541 = boost
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.01806189 = queryNorm
              0.27804407 = fieldWeight in 444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.0625 = fieldNorm(doc=444)
          0.09245776 = weight(abstract_txt:phase in 444) [ClassicSimilarity], result of:
            0.09245776 = score(doc=444,freq=2.0), product of:
              0.16700867 = queryWeight, product of:
                1.4762725 = boost
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.01806189 = queryNorm
              0.5536105 = fieldWeight in 444, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.263388 = idf(docFreq=229, maxDocs=44421)
                0.0625 = fieldNorm(doc=444)
          0.11417235 = weight(abstract_txt:legacy in 444) [ClassicSimilarity], result of:
            0.11417235 = score(doc=444,freq=1.0), product of:
              0.24219252 = queryWeight, product of:
                1.7777773 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.01806189 = queryNorm
              0.47141153 = fieldWeight in 444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.0625 = fieldNorm(doc=444)
          0.09351077 = weight(abstract_txt:automated in 444) [ClassicSimilarity], result of:
            0.09351077 = score(doc=444,freq=1.0), product of:
              0.26711884 = queryWeight, product of:
                2.6403668 = boost
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.01806189 = queryNorm
              0.3500718 = fieldWeight in 444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6011486 = idf(docFreq=445, maxDocs=44421)
                0.0625 = fieldNorm(doc=444)
          0.06912447 = weight(abstract_txt:records in 444) [ClassicSimilarity], result of:
            0.06912447 = score(doc=444,freq=1.0), product of:
              0.24998564 = queryWeight, product of:
                3.1283486 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.01806189 = queryNorm
              0.27651376 = fieldWeight in 444, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.0625 = fieldNorm(doc=444)
        0.28 = coord(7/25)
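
The content scores above combine three steps: a per-term score (queryWeight times fieldWeight), a sum over the matching terms, and a coord factor for the share of query terms that matched. The sketch below retraces those steps in Python for the first content match (doc 3176). It assumes the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); boost, queryNorm and fieldNorm are copied from the listing rather than derived, and the function names are illustrative, not Lucene API calls.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed classic form, see lead-in)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Per-term score = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Term "language" in doc 3176, with values copied from the listing.
language_score = term_score(
    freq=2.0, doc_freq=1863, max_docs=44421,
    boost=4.9154816, query_norm=0.01806189, field_norm=0.09375,
)

# The six per-term scores listed for doc 3176
# (project, described, enhancement, assign, records, language).
listed_scores = [0.03352743, 0.049186543, 0.143229, 0.15570845, 0.1036867, 0.20478274]

# coord(6/25): 6 of the 25 query terms matched this document.
coord = 6 / 25
total = coord * sum(listed_scores)

print(round(language_score, 4))  # close to the listed 0.20478274
print(round(total, 4))           # close to the listed 0.165629, shown as 0.17
```

The coord factor explains why a document that matches seven weak terms can outrank one that matches five stronger ones: the summed term scores are scaled by the fraction of query terms found.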