Document (#42721)

Author
Lynch, J.D.
Gibson, J.
Han, M.-J.
Title
Analyzing and normalizing type metadata for a large aggregated digital library
Source
Code4Lib journal. Issue 47(2020), [http://journal.code4lib.org]
Year
2020
Abstract
The Illinois Digital Heritage Hub (IDHH) gathers and enhances metadata from contributing institutions around the state of Illinois and provides this metadata to the Digital Public Library of America (DPLA) for greater access. The IDHH helps contributors shape their metadata to the standards recommended and required by the DPLA in part by analyzing and enhancing aggregated metadata. In late 2018, the IDHH undertook a project to address a particularly problematic field, Type metadata. This paper walks through the project, detailing the process of gathering and analyzing metadata using the DPLA API and OpenRefine, data remediation through XSL transformations in conjunction with local improvements by contributing institutions, and the DPLA ingestion system's quality controls.
Content
See: https://journal.code4lib.org/articles/14995.
Theme
Metadata

Similar documents (author)

  1. Gibson, J.J.: Wahrnehmung und Umwelt : der ökologische Ansatz in der visuellen Wahrnehmung (1982) 2.24
    2.2414484 = sum of:
      2.2414484 = product of:
        4.482897 = sum of:
          4.482897 = weight(author_txt:gibson in 2558) [ClassicSimilarity], result of:
            4.482897 = score(doc=2558,freq=1.0), product of:
              0.79147106 = queryWeight, product of:
                1.1379508 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.07674813 = queryNorm
              5.664006 = fieldWeight in 2558, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.625 = fieldNorm(doc=2558)
        0.5 = coord(1/2)
    
  2. Gibson, P.: Professionals' perfect Web world in sight : users want more information on the Web, and vendors attempt to provide (1998) 2.24
    2.2414484 = sum of:
      2.2414484 = product of:
        4.482897 = sum of:
          4.482897 = weight(author_txt:gibson in 2656) [ClassicSimilarity], result of:
            4.482897 = score(doc=2656,freq=1.0), product of:
              0.79147106 = queryWeight, product of:
                1.1379508 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.07674813 = queryNorm
              5.664006 = fieldWeight in 2656, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.625 = fieldNorm(doc=2656)
        0.5 = coord(1/2)
    
  3. Gibson, P.: Navigating the Internet road to riches (1998) 2.24
    2.2414484 = sum of:
      2.2414484 = product of:
        4.482897 = sum of:
          4.482897 = weight(author_txt:gibson in 4521) [ClassicSimilarity], result of:
            4.482897 = score(doc=4521,freq=1.0), product of:
              0.79147106 = queryWeight, product of:
                1.1379508 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.07674813 = queryNorm
              5.664006 = fieldWeight in 4521, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.625 = fieldNorm(doc=4521)
        0.5 = coord(1/2)
    
  4. Gibson, P.: HotBot's future is in Lycos' hands : users hope that the search engine won't be hobbled by an acquisition (1999) 2.24
    2.2414484 = sum of:
      2.2414484 = product of:
        4.482897 = sum of:
          4.482897 = weight(author_txt:gibson in 6195) [ClassicSimilarity], result of:
            4.482897 = score(doc=6195,freq=1.0), product of:
              0.79147106 = queryWeight, product of:
                1.1379508 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.07674813 = queryNorm
              5.664006 = fieldWeight in 6195, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.625 = fieldNorm(doc=6195)
        0.5 = coord(1/2)
    
  5. Gibson, R.; Ward, S.: A proposed methodology for studying the function and effectiveness of party and candidate Web sites (2000) 1.79
    1.7931589 = sum of:
      1.7931589 = product of:
        3.5863178 = sum of:
          3.5863178 = weight(author_txt:gibson in 4335) [ClassicSimilarity], result of:
            3.5863178 = score(doc=4335,freq=1.0), product of:
              0.79147106 = queryWeight, product of:
                1.1379508 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.07674813 = queryNorm
              4.531205 = fieldWeight in 4335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.5 = fieldNorm(doc=4335)
        0.5 = coord(1/2)
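
The arithmetic in the explain trees above can be retraced by hand. The following is a minimal sketch, assuming the usual TF-IDF shape of ClassicSimilarity: tf is the square root of the term frequency, and idf is taken as 1 + ln(maxDocs / (docFreq + 1)), which is the form that reproduces the idf values printed in this listing (other Lucene versions may use a slightly different variant). The function names `classic_idf` and `classic_term_score` are hypothetical and are not part of any library; all numeric inputs are copied from the listing.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it matches idf(docFreq=13, maxDocs=44421) = 9.06241 above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the "product of" lines above.
    tf = math.sqrt(freq)                     # tf(freq=1.0) = 1.0
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm  # listed as 0.79147106
    field_weight = tf * idf * field_norm     # listed as 5.664006
    return query_weight * field_weight


# Values copied from the tree for result 1 (doc 2558), term author_txt:gibson.
term_score = classic_term_score(
    freq=1.0,
    doc_freq=13,
    max_docs=44421,
    boost=1.1379508,
    query_norm=0.07674813,
    field_norm=0.625,
)

# coord(1/2): one of the two query clauses matched, so the sum is halved.
final_score = term_score * 0.5

print(term_score, final_score)  # compare with 4.482897 and 2.2414484 above
```

Results 1 to 4 share every input, which is why they tie at 2.24. Result 5 differs only in its fieldNorm (0.5 instead of 0.625), which appears to account for its lower score of 1.79.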
    

Similar documents (content)

  1. Shreeves, S.L.; Kaczmarek, J.S.; Cole, T.W.: Harvesting cultural heritage metadata using OAI Protocol (2003) 0.19
    0.19281785 = sum of:
      0.19281785 = product of:
        0.9640893 = sum of:
          0.12234387 = weight(abstract_txt:undertook in 5775) [ClassicSimilarity], result of:
            0.12234387 = score(doc=5775,freq=1.0), product of:
              0.18281654 = queryWeight, product of:
                1.2219439 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.017465761 = queryNorm
              0.66921663 = fieldWeight in 5775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.065406464 = weight(abstract_txt:project in 5775) [ClassicSimilarity], result of:
            0.065406464 = score(doc=5775,freq=4.0), product of:
              0.09557943 = queryWeight, product of:
                1.2495129 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.017465761 = queryNorm
              0.68431526 = fieldWeight in 5775, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.06692512 = weight(abstract_txt:digital in 5775) [ClassicSimilarity], result of:
            0.06692512 = score(doc=5775,freq=2.0), product of:
              0.13997503 = queryWeight, product of:
                1.8519508 = boost
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.017465761 = queryNorm
              0.4781219 = fieldWeight in 5775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.2617421 = weight(abstract_txt:illinois in 5775) [ClassicSimilarity], result of:
            0.2617421 = score(doc=5775,freq=3.0), product of:
              0.26516283 = queryWeight, product of:
                2.081205 = boost
                7.2947483 = idf(docFreq=81, maxDocs=44421)
                0.017465761 = queryNorm
              0.9870995 = fieldWeight in 5775, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2947483 = idf(docFreq=81, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.44767177 = weight(abstract_txt:metadata in 5775) [ClassicSimilarity], result of:
            0.44767177 = score(doc=5775,freq=8.0), product of:
              0.41521195 = queryWeight, product of:
                4.8722267 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.017465761 = queryNorm
              1.0781765 = fieldWeight in 5775, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
        0.2 = coord(5/25)
    
  2. Isaac, A.; Raemy, J.A.; Meijers, E.; Valk, S. De; Freire, N.: Metadata aggregation via linked data : results of the Europeana Common Culture project (2020) 0.15
    0.15384741 = sum of:
      0.15384741 = product of:
        0.6410309 = sum of:
          0.024911614 = weight(abstract_txt:through in 1040) [ClassicSimilarity], result of:
            0.024911614 = score(doc=1040,freq=1.0), product of:
              0.07972085 = queryWeight, product of:
                1.1411545 = boost
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.017465761 = queryNorm
              0.31248558 = fieldWeight in 1040, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.078125 = fieldNorm(doc=1040)
          0.032703232 = weight(abstract_txt:project in 1040) [ClassicSimilarity], result of:
            0.032703232 = score(doc=1040,freq=1.0), product of:
              0.09557943 = queryWeight, product of:
                1.2495129 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.017465761 = queryNorm
              0.34215763 = fieldWeight in 1040, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.078125 = fieldNorm(doc=1040)
          0.086460516 = weight(abstract_txt:institutions in 1040) [ClassicSimilarity], result of:
            0.086460516 = score(doc=1040,freq=2.0), product of:
              0.14504604 = queryWeight, product of:
                1.5392581 = boost
                5.395192 = idf(docFreq=547, maxDocs=44421)
                0.017465761 = queryNorm
              0.59609014 = fieldWeight in 1040, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.395192 = idf(docFreq=547, maxDocs=44421)
                0.078125 = fieldNorm(doc=1040)
          0.06692512 = weight(abstract_txt:digital in 1040) [ClassicSimilarity], result of:
            0.06692512 = score(doc=1040,freq=2.0), product of:
              0.13997503 = queryWeight, product of:
                1.8519508 = boost
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.017465761 = queryNorm
              0.4781219 = fieldWeight in 1040, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.078125 = fieldNorm(doc=1040)
          0.1558886 = weight(abstract_txt:aggregated in 1040) [ClassicSimilarity], result of:
            0.1558886 = score(doc=1040,freq=1.0), product of:
              0.27071577 = queryWeight, product of:
                2.1028838 = boost
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.017465761 = queryNorm
              0.5758386 = fieldWeight in 1040, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.370734 = idf(docFreq=75, maxDocs=44421)
                0.078125 = fieldNorm(doc=1040)
          0.27414185 = weight(abstract_txt:metadata in 1040) [ClassicSimilarity], result of:
            0.27414185 = score(doc=1040,freq=3.0), product of:
              0.41521195 = queryWeight, product of:
                4.8722267 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.017465761 = queryNorm
              0.66024554 = fieldWeight in 1040, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.078125 = fieldNorm(doc=1040)
        0.24 = coord(6/25)
    
  3. McElfresh, L.K.: Creator name standardization using faceted vocabularies in the BTAA geoportal : Michigan State University libraries digital repository case study (2023) 0.12
    0.12411932 = sum of:
      0.12411932 = product of:
        0.77574575 = sum of:
          0.055499226 = weight(abstract_txt:project in 2180) [ClassicSimilarity], result of:
            0.055499226 = score(doc=2180,freq=2.0), product of:
              0.09557943 = queryWeight, product of:
                1.2495129 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.017465761 = queryNorm
              0.58066076 = fieldWeight in 2180, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.09375 = fieldNorm(doc=2180)
          0.28359655 = weight(abstract_txt:openrefine in 2180) [ClassicSimilarity], result of:
            0.28359655 = score(doc=2180,freq=2.0), product of:
              0.22505939 = queryWeight, product of:
                1.355789 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.017465761 = queryNorm
              1.2600964 = fieldWeight in 2180, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.09375 = fieldNorm(doc=2180)
          0.056787856 = weight(abstract_txt:digital in 2180) [ClassicSimilarity], result of:
            0.056787856 = score(doc=2180,freq=1.0), product of:
              0.13997503 = queryWeight, product of:
                1.8519508 = boost
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.017465761 = queryNorm
              0.4056999 = fieldWeight in 2180, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.09375 = fieldNorm(doc=2180)
          0.3798621 = weight(abstract_txt:metadata in 2180) [ClassicSimilarity], result of:
            0.3798621 = score(doc=2180,freq=4.0), product of:
              0.41521195 = queryWeight, product of:
                4.8722267 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.017465761 = queryNorm
              0.9148631 = fieldWeight in 2180, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.09375 = fieldNorm(doc=2180)
        0.16 = coord(4/25)
    
  4. Stevens, G.: New metadata recipes for old cookbooks : creating and analyzing a digital collection using the HathiTrust Research Center Portal (2017) 0.12
    0.122086264 = sum of:
      0.122086264 = product of:
        0.6104313 = sum of:
          0.045314927 = weight(abstract_txt:project in 4897) [ClassicSimilarity], result of:
            0.045314927 = score(doc=4897,freq=3.0), product of:
              0.09557943 = queryWeight, product of:
                1.2495129 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.017465761 = queryNorm
              0.4741075 = fieldWeight in 4897, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.0625 = fieldNorm(doc=4897)
          0.13368869 = weight(abstract_txt:openrefine in 4897) [ClassicSimilarity], result of:
            0.13368869 = score(doc=4897,freq=1.0), product of:
              0.22505939 = queryWeight, product of:
                1.355789 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.017465761 = queryNorm
              0.5940152 = fieldWeight in 4897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0625 = fieldNorm(doc=4897)
          0.075717136 = weight(abstract_txt:digital in 4897) [ClassicSimilarity], result of:
            0.075717136 = score(doc=4897,freq=4.0), product of:
              0.13997503 = queryWeight, product of:
                1.8519508 = boost
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.017465761 = queryNorm
              0.5409332 = fieldWeight in 4897, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.0625 = fieldNorm(doc=4897)
          0.17664188 = weight(abstract_txt:analyzing in 4897) [ClassicSimilarity], result of:
            0.17664188 = score(doc=4897,freq=3.0), product of:
              0.27099615 = queryWeight, product of:
                2.5768297 = boost
                6.021295 = idf(docFreq=292, maxDocs=44421)
                0.017465761 = queryNorm
              0.6518243 = fieldWeight in 4897, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.021295 = idf(docFreq=292, maxDocs=44421)
                0.0625 = fieldNorm(doc=4897)
          0.1790687 = weight(abstract_txt:metadata in 4897) [ClassicSimilarity], result of:
            0.1790687 = score(doc=4897,freq=2.0), product of:
              0.41521195 = queryWeight, product of:
                4.8722267 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.017465761 = queryNorm
              0.4312706 = fieldWeight in 4897, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0625 = fieldNorm(doc=4897)
        0.2 = coord(5/25)
    
  5. Valentino, M.L.: Integrating metadata creation into catalog workflow (2010) 0.11
    0.114013985 = sum of:
      0.114013985 = product of:
        0.5700699 = sum of:
          0.029893937 = weight(abstract_txt:through in 160) [ClassicSimilarity], result of:
            0.029893937 = score(doc=160,freq=1.0), product of:
              0.07972085 = queryWeight, product of:
                1.1411545 = boost
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.017465761 = queryNorm
              0.37498268 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9998152 = idf(docFreq=2211, maxDocs=44421)
                0.09375 = fieldNorm(doc=160)
          0.14681265 = weight(abstract_txt:undertook in 160) [ClassicSimilarity], result of:
            0.14681265 = score(doc=160,freq=1.0), product of:
              0.18281654 = queryWeight, product of:
                1.2219439 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.017465761 = queryNorm
              0.80306 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.09375 = fieldNorm(doc=160)
          0.067972384 = weight(abstract_txt:project in 160) [ClassicSimilarity], result of:
            0.067972384 = score(doc=160,freq=3.0), product of:
              0.09557943 = queryWeight, product of:
                1.2495129 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.017465761 = queryNorm
              0.71116126 = fieldWeight in 160, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.09375 = fieldNorm(doc=160)
          0.056787856 = weight(abstract_txt:digital in 160) [ClassicSimilarity], result of:
            0.056787856 = score(doc=160,freq=1.0), product of:
              0.13997503 = queryWeight, product of:
                1.8519508 = boost
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.017465761 = queryNorm
              0.4056999 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.09375 = fieldNorm(doc=160)
          0.26860306 = weight(abstract_txt:metadata in 160) [ClassicSimilarity], result of:
            0.26860306 = score(doc=160,freq=2.0), product of:
              0.41521195 = queryWeight, product of:
                4.8722267 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.017465761 = queryNorm
              0.6469059 = fieldWeight in 160, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.09375 = fieldNorm(doc=160)
        0.2 = coord(5/25)
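
For the content-based matches, each tree sums several per-term weights and then scales the sum by a coord factor. The sketch below retraces that last step for result 1 in the content list (doc 5775), assuming coord is simply the number of matched query terms divided by the total number of query terms (25 here, read from `coord(5/25)`). The name `coord_score` is hypothetical; the weights are copied from the listing rather than recomputed.

```python
# Per-term weights copied from the explain tree for doc 5775.
term_weights = {
    "undertook": 0.12234387,
    "project": 0.065406464,
    "digital": 0.06692512,
    "illinois": 0.2617421,
    "metadata": 0.44767177,
}

QUERY_TERMS = 25  # denominator of coord(5/25) in the listing


def coord_score(weights, query_terms):
    # Sum the matched term weights, then scale by matched / total query terms.
    matched = len(weights)
    return sum(weights.values()) * (matched / query_terms)


score = coord_score(term_weights, QUERY_TERMS)
print(score)  # compare with 0.19281785 in the listing
```

Under this reading, a document that matches more of the 25 query terms gets a larger coord multiplier. Result 2 has a coord of 6/25, but its per-term weights sum to less, so it still ranks below result 1.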