Document (#41551)

Author
Roy, W.
Gray, C.
Title
Preparing existing metadata for repository batch import : a recipe for a fickle food
Source
Code4Lib journal. Issue 42(2018), [http://journal.code4lib.org]
Year
2018
Abstract
In 2016, the University of Waterloo began offering a mediated copyright review and deposit service to support the growth of our institutional repository UWSpace. This resulted in the need to batch import large lists of published works into the institutional repository quickly and accurately. A range of methods have been proposed for harvesting publications metadata en masse, but many technological solutions can easily become detached from a workflow that is both reproducible for support staff and applicable to a range of situations. Many repositories offer the capacity for batch upload via CSV, so our method provides a template Python script that leverages the Habanero library for populating CSV files with existing metadata retrieved from the CrossRef API. In our case, we have combined this with useful metadata contained in a TSV file downloaded from Web of Science in order to enrich our metadata as well. The appeal of this 'low-maintenance' method is that it provides more robust options for gathering metadata semi-automatically, and only requires the user's ability to access Web of Science and the Python program, while still remaining flexible enough for local customizations.
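As a rough illustration of the workflow the abstract describes, the sketch below flattens one CrossRef record into a CSV row. It is not the authors' template script. The Dublin Core style column names, the `||` multi-value separator, and the helper names `fetch_work`, `work_to_row` and `rows_to_csv` are all assumptions chosen for illustration. Only the `Crossref().works(ids=...)` call comes from the Habanero library.

```python
import csv
import io


def fetch_work(doi):
    """Fetch one CrossRef record via Habanero (needs network access and habanero installed)."""
    from habanero import Crossref  # imported lazily so the helpers below run offline
    return Crossref().works(ids=doi)["message"]


def work_to_row(work):
    """Flatten a CrossRef 'message' dict into one CSV row (column names are assumed)."""
    authors = [
        ", ".join(part for part in (a.get("family"), a.get("given")) if part)
        for a in work.get("author", [])
    ]
    date_parts = work.get("issued", {}).get("date-parts", [[None]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    return {
        "dc.identifier.doi": work.get("DOI", ""),
        "dc.title": (work.get("title") or [""])[0],
        "dc.contributor.author": "||".join(authors),  # assumed multi-value separator
        "dc.date.issued": "" if year is None else str(year),
        "dc.source": (work.get("container-title") or [""])[0],
    }


def rows_to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Under these assumptions, `rows_to_csv([work_to_row(fetch_work(doi)) for doi in dois])` would yield the text of an import file for a list of DOIs. Fields taken from a Web of Science TSV export could be merged into each row before writing.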
Content
Cf.: https://journal.code4lib.org/articles/13895.
Theme
Metadata

Similar documents (author)

  1. Gray, R.A.: Classification schemes as cognitive maps (1984) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:gray in 1735) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 1735, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=1735)
    
  2. Gray, J.: Accessing electronic resources via the library catalogue at Monash University Library (1998) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:gray in 4719) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 4719, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=4719)
    
  3. Gray, J.: Symbols and suggestions : Communication of mathematics in print (2001) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:gray in 6893) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 6893, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=6893)
    
  4. Gray, B.: Cataloging the special collections of Allegheny college (2005) 5.51
    5.506935 = sum of:
      5.506935 = weight(author_txt:gray in 252) [ClassicSimilarity], result of:
        5.506935 = fieldWeight in 252, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.625 = fieldNorm(doc=252)
    
  5. Gray, W.A.; Harley, A.J.: Computer assisted indexing (1971) 4.41
    4.405548 = sum of:
      4.405548 = weight(author_txt:gray in 4345) [ClassicSimilarity], result of:
        4.405548 = fieldWeight in 4345, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.811096 = idf(docFreq=17, maxDocs=44421)
          0.5 = fieldNorm(doc=4345)
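
The author scores above can be reproduced by hand. The sketch below assumes the classic Lucene TF-IDF definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the numbers in the listing. The function names are illustrative, and because Lucene works in single precision the last digits may differ slightly.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author term "gray", docFreq=17, maxDocs=44421
top_score = field_weight(1.0, 17, 44421, 0.625)  # entries 1 to 4
fifth_score = field_weight(1.0, 17, 44421, 0.5)  # entry 5, lower fieldNorm
```

Entry 5 scores lower only because its fieldNorm is 0.5 rather than 0.625, which is consistent with its author field holding two names instead of one.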
    

Similar documents (content)

  1. Chapman, J.W.; Reynolds, D.; Shreeves, S.A.: Repository metadata : approaches and challenges (2009) 0.18
    0.17759834 = sum of:
      0.17759834 = product of:
        0.7399931 = sum of:
          0.02300423 = weight(abstract_txt:many in 3980) [ClassicSimilarity], result of:
            0.02300423 = score(doc=3980,freq=1.0), product of:
              0.07217398 = queryWeight, product of:
                1.0475156 = boost
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.016888192 = queryNorm
              0.318733 = fieldWeight in 3980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.078125 = fieldNorm(doc=3980)
          0.015098938 = weight(abstract_txt:from in 3980) [ClassicSimilarity], result of:
            0.015098938 = score(doc=3980,freq=2.0), product of:
              0.049525276 = queryWeight, product of:
                1.0627455 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.016888192 = queryNorm
              0.30487338 = fieldWeight in 3980, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=3980)
          0.04379403 = weight(abstract_txt:range in 3980) [ClassicSimilarity], result of:
            0.04379403 = score(doc=3980,freq=1.0), product of:
              0.1108627 = queryWeight, product of:
                1.2982637 = boost
                5.0563765 = idf(docFreq=768, maxDocs=44421)
                0.016888192 = queryNorm
              0.39502943 = fieldWeight in 3980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0563765 = idf(docFreq=768, maxDocs=44421)
                0.078125 = fieldNorm(doc=3980)
          0.082049005 = weight(abstract_txt:institutional in 3980) [ClassicSimilarity], result of:
            0.082049005 = score(doc=3980,freq=1.0), product of:
              0.16848364 = queryWeight, product of:
                1.6004755 = boost
                6.2334075 = idf(docFreq=236, maxDocs=44421)
                0.016888192 = queryNorm
              0.48698497 = fieldWeight in 3980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2334075 = idf(docFreq=236, maxDocs=44421)
                0.078125 = fieldNorm(doc=3980)
          0.2868736 = weight(abstract_txt:repository in 3980) [ClassicSimilarity], result of:
            0.2868736 = score(doc=3980,freq=4.0), product of:
              0.27988505 = queryWeight, product of:
                2.5264206 = boost
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.016888192 = queryNorm
              1.0249693 = fieldWeight in 3980, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.078125 = fieldNorm(doc=3980)
          0.28917328 = weight(abstract_txt:metadata in 3980) [ClassicSimilarity], result of:
            0.28917328 = score(doc=3980,freq=6.0), product of:
              0.30969745 = queryWeight, product of:
                3.7583706 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.016888192 = queryNorm
              0.9337283 = fieldWeight in 3980, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.078125 = fieldNorm(doc=3980)
        0.24 = coord(6/25)
    
  2. Nichols, D.M.; Paynter, G.W.; Chan, C.-H.; Bainbridge, D.; McKay, D.; Twidale, M.B.; Blandford, A.: Experiences in deploying metadata analysis tools for institutional repositories (2009) 0.18
    0.17591946 = sum of:
      0.17591946 = product of:
        0.6282838 = sum of:
          0.018403385 = weight(abstract_txt:many in 3986) [ClassicSimilarity], result of:
            0.018403385 = score(doc=3986,freq=1.0), product of:
              0.07217398 = queryWeight, product of:
                1.0475156 = boost
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.016888192 = queryNorm
              0.2549864 = fieldWeight in 3986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.0625 = fieldNorm(doc=3986)
          0.0147938775 = weight(abstract_txt:from in 3986) [ClassicSimilarity], result of:
            0.0147938775 = score(doc=3986,freq=3.0), product of:
              0.049525276 = queryWeight, product of:
                1.0627455 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.016888192 = queryNorm
              0.29871368 = fieldWeight in 3986, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=3986)
          0.020243991 = weight(abstract_txt:provides in 3986) [ClassicSimilarity], result of:
            0.020243991 = score(doc=3986,freq=1.0), product of:
              0.07690944 = queryWeight, product of:
                1.0813344 = boost
                4.211497 = idf(docFreq=1789, maxDocs=44421)
                0.016888192 = queryNorm
              0.26321855 = fieldWeight in 3986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.211497 = idf(docFreq=1789, maxDocs=44421)
                0.0625 = fieldNorm(doc=3986)
          0.027120715 = weight(abstract_txt:existing in 3986) [ClassicSimilarity], result of:
            0.027120715 = score(doc=3986,freq=1.0), product of:
              0.093465135 = queryWeight, product of:
                1.1920514 = boost
                4.6427093 = idf(docFreq=1162, maxDocs=44421)
                0.016888192 = queryNorm
              0.29016933 = fieldWeight in 3986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6427093 = idf(docFreq=1162, maxDocs=44421)
                0.0625 = fieldNorm(doc=3986)
          0.0656392 = weight(abstract_txt:institutional in 3986) [ClassicSimilarity], result of:
            0.0656392 = score(doc=3986,freq=1.0), product of:
              0.16848364 = queryWeight, product of:
                1.6004755 = boost
                6.2334075 = idf(docFreq=236, maxDocs=44421)
                0.016888192 = queryNorm
              0.38958797 = fieldWeight in 3986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2334075 = idf(docFreq=236, maxDocs=44421)
                0.0625 = fieldNorm(doc=3986)
          0.19875187 = weight(abstract_txt:repository in 3986) [ClassicSimilarity], result of:
            0.19875187 = score(doc=3986,freq=3.0), product of:
              0.27988505 = queryWeight, product of:
                2.5264206 = boost
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.016888192 = queryNorm
              0.7101196 = fieldWeight in 3986, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.0625 = fieldNorm(doc=3986)
          0.28333077 = weight(abstract_txt:metadata in 3986) [ClassicSimilarity], result of:
            0.28333077 = score(doc=3986,freq=9.0), product of:
              0.30969745 = queryWeight, product of:
                3.7583706 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.016888192 = queryNorm
              0.9148631 = fieldWeight in 3986, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0625 = fieldNorm(doc=3986)
        0.28 = coord(7/25)
    
  3. Salo, D.: Name authority control in institutional repositories (2009) 0.15
    0.14917085 = sum of:
      0.14917085 = product of:
        0.6215452 = sum of:
          0.09758376 = weight(abstract_txt:deposit in 3976) [ClassicSimilarity], result of:
            0.09758376 = score(doc=3976,freq=1.0), product of:
              0.13293207 = queryWeight, product of:
                1.0052407 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.016888192 = queryNorm
              0.73408747 = fieldWeight in 3976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.09375 = fieldNorm(doc=3976)
          0.027605077 = weight(abstract_txt:many in 3976) [ClassicSimilarity], result of:
            0.027605077 = score(doc=3976,freq=1.0), product of:
              0.07217398 = queryWeight, product of:
                1.0475156 = boost
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.016888192 = queryNorm
              0.3824796 = fieldWeight in 3976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.09375 = fieldNorm(doc=3976)
          0.012811874 = weight(abstract_txt:from in 3976) [ClassicSimilarity], result of:
            0.012811874 = score(doc=3976,freq=1.0), product of:
              0.049525276 = queryWeight, product of:
                1.0627455 = boost
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.016888192 = queryNorm
              0.25869364 = fieldWeight in 3976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.09375 = fieldNorm(doc=3976)
          0.0984588 = weight(abstract_txt:institutional in 3976) [ClassicSimilarity], result of:
            0.0984588 = score(doc=3976,freq=1.0), product of:
              0.16848364 = queryWeight, product of:
                1.6004755 = boost
                6.2334075 = idf(docFreq=236, maxDocs=44421)
                0.016888192 = queryNorm
              0.58438194 = fieldWeight in 3976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2334075 = idf(docFreq=236, maxDocs=44421)
                0.09375 = fieldNorm(doc=3976)
          0.24342032 = weight(abstract_txt:repository in 3976) [ClassicSimilarity], result of:
            0.24342032 = score(doc=3976,freq=2.0), product of:
              0.27988505 = queryWeight, product of:
                2.5264206 = boost
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.016888192 = queryNorm
              0.86971533 = fieldWeight in 3976, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.09375 = fieldNorm(doc=3976)
          0.14166538 = weight(abstract_txt:metadata in 3976) [ClassicSimilarity], result of:
            0.14166538 = score(doc=3976,freq=1.0), product of:
              0.30969745 = queryWeight, product of:
                3.7583706 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.016888192 = queryNorm
              0.45743155 = fieldWeight in 3976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.09375 = fieldNorm(doc=3976)
        0.24 = coord(6/25)
    
  4. Stein, A.; Applegate, K.J.; Robbins, S.: Achieving and maintaining metadata quality : toward a sustainable workflow for the IDEALS Institutional Repository (2017) 0.13
    0.13159965 = sum of:
      0.13159965 = product of:
        1.0966638 = sum of:
          0.22949888 = weight(abstract_txt:repository in 159) [ClassicSimilarity], result of:
            0.22949888 = score(doc=159,freq=1.0), product of:
              0.27988505 = queryWeight, product of:
                2.5264206 = boost
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.016888192 = queryNorm
              0.8199755 = fieldWeight in 159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.125 = fieldNorm(doc=159)
          0.48939052 = weight(abstract_txt:batch in 159) [ClassicSimilarity], result of:
            0.48939052 = score(doc=159,freq=1.0), product of:
              0.46369207 = queryWeight, product of:
                3.251851 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.016888192 = queryNorm
              1.0554214 = fieldWeight in 159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.125 = fieldNorm(doc=159)
          0.3777744 = weight(abstract_txt:metadata in 159) [ClassicSimilarity], result of:
            0.3777744 = score(doc=159,freq=4.0), product of:
              0.30969745 = queryWeight, product of:
                3.7583706 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.016888192 = queryNorm
              1.2198175 = fieldWeight in 159, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.125 = fieldNorm(doc=159)
        0.12 = coord(3/25)
    
  5. Agnew, G.: Developing a metadata strategy (2003) 0.11
    0.11470944 = sum of:
      0.11470944 = product of:
        0.955912 = sum of:
          0.2637359 = weight(abstract_txt:populating in 504) [ClassicSimilarity], result of:
            0.2637359 = score(doc=504,freq=1.0), product of:
              0.21291113 = queryWeight, product of:
                1.272197 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.016888192 = queryNorm
              1.2387135 = fieldWeight in 504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.125 = fieldNorm(doc=504)
          0.22949888 = weight(abstract_txt:repository in 504) [ClassicSimilarity], result of:
            0.22949888 = score(doc=504,freq=1.0), product of:
              0.27988505 = queryWeight, product of:
                2.5264206 = boost
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.016888192 = queryNorm
              0.8199755 = fieldWeight in 504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.559804 = idf(docFreq=170, maxDocs=44421)
                0.125 = fieldNorm(doc=504)
          0.46267724 = weight(abstract_txt:metadata in 504) [ClassicSimilarity], result of:
            0.46267724 = score(doc=504,freq=6.0), product of:
              0.30969745 = queryWeight, product of:
                3.7583706 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.016888192 = queryNorm
              1.4939653 = fieldWeight in 504, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.125 = fieldNorm(doc=504)
        0.12 = coord(3/25)
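
The content scores combine a per-term product with a coordination factor. As a minimal sketch, assuming each term contributes queryWeight * fieldWeight and the sum is scaled by coord = matched terms / query terms, the following recomputes the score of entry 4 (doc 159) from the values in its explain tree. The function and variable names are illustrative only.

```python
import math


def term_score(boost, idf, query_norm, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, query_term_count):
    """Sum the term scores, then scale by coord = matched / total query terms."""
    coord = len(terms) / query_term_count
    return coord * sum(term_score(*t) for t in terms)


QUERY_NORM = 0.016888192
# (boost, idf, queryNorm, freq, fieldNorm) copied from the listing for doc 159
stein_terms = [
    (2.5264206, 6.559804, QUERY_NORM, 1.0, 0.125),  # repository
    (3.251851, 8.443371, QUERY_NORM, 1.0, 0.125),   # batch
    (3.7583706, 4.87927, QUERY_NORM, 4.0, 0.125),   # metadata
]
score = document_score(stein_terms, 25)
```

The coord factor explains the ranking: entry 4 has the largest term sum of the five, but it matches only 3 of the 25 query terms, so it ends up below entries that match 6 or 7.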