Document (#41551)

Author
Roy, W.
Gray, C.
Title
Preparing existing metadata for repository batch import : a recipe for a fickle food
Source
Code4Lib journal. Issue 42(2018), [http://journal.code4lib.org]
Year
2018
Abstract
In 2016, the University of Waterloo began offering a mediated copyright review and deposit service to support the growth of our institutional repository UWSpace. This resulted in the need to batch import large lists of published works into the institutional repository quickly and accurately. A range of methods have been proposed for harvesting publications metadata en masse, but many technological solutions can easily become detached from a workflow that is both reproducible for support staff and applicable to a range of situations. Many repositories offer the capacity for batch upload via CSV, so our method provides a template Python script that leverages the Habanero library for populating CSV files with existing metadata retrieved from the CrossRef API. In our case, we have combined this with useful metadata contained in a TSV file downloaded from Web of Science in order to enrich our metadata as well. The appeal of this 'low-maintenance' method is that it provides more robust options for gathering metadata semi-automatically, and only requires the user's ability to access Web of Science and the Python program, while still remaining flexible enough for local customizations.
Content
Cf.: https://journal.code4lib.org/articles/13895.
Theme
Metadata
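
Note on the abstract: the workflow it describes (look up each DOI in CrossRef, flatten the result, write one CSV row per work) might be sketched roughly as follows. This is not the authors' script. The column names, the helper names `flatten_work`, `write_rows` and `fetch_with_habanero`, and the choice of fields are assumptions made for illustration. The lookup is passed in as a function so the CSV logic can be tried without network access. The Habanero wiring is shown only as an assumed option, based on that library's `Crossref().works(ids=...)` call.

```python
import csv
import io

# Column names are illustrative; a real repository import template defines its own.
FIELDS = ["doi", "title", "authors", "journal", "year"]


def flatten_work(message):
    """Map one Crossref work record (the "message" dict) to a flat CSV row."""
    authors = "; ".join(
        ", ".join(part for part in (a.get("family"), a.get("given")) if part)
        for a in message.get("author", [])
    )
    date_parts = message.get("issued", {}).get("date-parts", [[None]])
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    return {
        "doi": message.get("DOI", ""),
        "title": (message.get("title") or [""])[0],
        "authors": authors,
        "journal": (message.get("container-title") or [""])[0],
        "year": "" if year is None else str(year),
    }


def write_rows(dois, fetch, out):
    """Look up each DOI with `fetch` and write one CSV row per DOI to `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for doi in dois:
        writer.writerow(flatten_work(fetch(doi)))


def fetch_with_habanero(doi):
    """Assumed wiring to Habanero; requires the package and network access."""
    from habanero import Crossref  # imported lazily so the rest runs without it
    return Crossref().works(ids=doi)["message"]
```

Calling `write_rows(dois, fetch_with_habanero, open("batch.csv", "w", newline=""))` would be one way to use it. Enriching the rows from a Web of Science TSV export, as the abstract mentions, would be a further step not shown here.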

Similar documents (author)

  1. Gray, R.A.: Classification schemes as cognitive maps (1984) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:gray in 1736) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 1736, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=1736)
    
  2. Gray, J.: Accessing electronic resources via the library catalogue at Monash University Library (1998) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:gray in 3719) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 3719, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=3719)
    
  3. Gray, J.: Symbols and suggestions : Communication of mathematics in print (2001) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:gray in 5893) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 5893, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=5893)
    
  4. Gray, B.: Cataloging the special collections of Allegheny college (2005) 5.50
    5.504072 = sum of:
      5.504072 = weight(author_txt:gray in 127) [ClassicSimilarity], result of:
        5.504072 = fieldWeight in 127, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.625 = fieldNorm(doc=127)
    
  5. Gray, W.A.; Harley, A.J.: Computer assisted indexing (1971) 4.40
    4.403258 = sum of:
      4.403258 = weight(author_txt:gray in 4346) [ClassicSimilarity], result of:
        4.403258 = fieldWeight in 4346, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.806516 = idf(docFreq=17, maxDocs=44218)
          0.5 = fieldNorm(doc=4346)
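
Note on the author scores above: each one is a single fieldWeight, the product of tf, idf and fieldNorm. A minimal sketch that reproduces the figures, assuming the usual Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and taking fieldNorm as given from the listing. The function names are illustrative.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as ClassicSimilarity is assumed to define it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """tf * idf * fieldNorm, with tf = sqrt(term frequency)."""
    return math.sqrt(freq) * idf(doc_freq, max_docs) * field_norm


# Entries 1 to 4 carry fieldNorm 0.625; entry 5 has two authors and fieldNorm 0.5.
print(round(idf(17, 44218), 4))
print(round(field_weight(1.0, 17, 44218, 0.625), 4))
print(round(field_weight(1.0, 17, 44218, 0.5), 4))
```

The lower score of entry 5 comes entirely from its smaller fieldNorm; the term "gray" matches once in every entry.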
    

Similar documents (content)

  1. Chapman, J.W.; Reynolds, D.; Shreeves, S.A.: Repository metadata : approaches and challenges (2009) 0.18
    0.18045156 = sum of:
      0.18045156 = product of:
        0.7518815 = sum of:
          0.02334037 = weight(abstract_txt:many in 2980) [ClassicSimilarity], result of:
            0.02334037 = score(doc=2980,freq=1.0), product of:
              0.073205024 = queryWeight, product of:
                1.0570289 = boost
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.016969819 = queryNorm
              0.31883565 = fieldWeight in 2980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.078125 = fieldNorm(doc=2980)
          0.015379503 = weight(abstract_txt:from in 2980) [ClassicSimilarity], result of:
            0.015379503 = score(doc=2980,freq=2.0), product of:
              0.050363705 = queryWeight, product of:
                1.0737938 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016969819 = queryNorm
              0.30536878 = fieldWeight in 2980, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=2980)
          0.04454603 = weight(abstract_txt:range in 2980) [ClassicSimilarity], result of:
            0.04454603 = score(doc=2980,freq=1.0), product of:
              0.11263544 = queryWeight, product of:
                1.3111547 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.016969819 = queryNorm
              0.3954886 = fieldWeight in 2980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.078125 = fieldNorm(doc=2980)
          0.08418839 = weight(abstract_txt:institutional in 2980) [ClassicSimilarity], result of:
            0.08418839 = score(doc=2980,freq=1.0), product of:
              0.17217517 = queryWeight, product of:
                1.6210697 = boost
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.016969819 = queryNorm
              0.4889694 = fieldWeight in 2980, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.078125 = fieldNorm(doc=2980)
          0.29095528 = weight(abstract_txt:repository in 2980) [ClassicSimilarity], result of:
            0.29095528 = score(doc=2980,freq=4.0), product of:
              0.28381172 = queryWeight, product of:
                2.5490432 = boost
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.016969819 = queryNorm
              1.0251701 = fieldWeight in 2980, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.078125 = fieldNorm(doc=2980)
          0.2934719 = weight(abstract_txt:metadata in 2980) [ClassicSimilarity], result of:
            0.2934719 = score(doc=2980,freq=6.0), product of:
              0.31417388 = queryWeight, product of:
                3.7928188 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016969819 = queryNorm
              0.93410665 = fieldWeight in 2980, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.078125 = fieldNorm(doc=2980)
        0.24 = coord(6/25)
    
  2. Nichols, D.M.; Paynter, G.W.; Chan, C.-H.; Bainbridge, D.; McKay, D.; Twidale, M.B.; Blandford, A.: Experiences in deploying metadata analysis tools for institutional repositories (2009) 0.18
    0.17876273 = sum of:
      0.17876273 = product of:
        0.63843834 = sum of:
          0.018672297 = weight(abstract_txt:many in 2986) [ClassicSimilarity], result of:
            0.018672297 = score(doc=2986,freq=1.0), product of:
              0.073205024 = queryWeight, product of:
                1.0570289 = boost
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.016969819 = queryNorm
              0.2550685 = fieldWeight in 2986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.0625 = fieldNorm(doc=2986)
          0.015068774 = weight(abstract_txt:from in 2986) [ClassicSimilarity], result of:
            0.015068774 = score(doc=2986,freq=3.0), product of:
              0.050363705 = queryWeight, product of:
                1.0737938 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016969819 = queryNorm
              0.29919907 = fieldWeight in 2986, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=2986)
          0.020584326 = weight(abstract_txt:provides in 2986) [ClassicSimilarity], result of:
            0.020584326 = score(doc=2986,freq=1.0), product of:
              0.07812083 = queryWeight, product of:
                1.0919427 = boost
                4.215895 = idf(docFreq=1773, maxDocs=44218)
                0.016969819 = queryNorm
              0.26349345 = fieldWeight in 2986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.215895 = idf(docFreq=1773, maxDocs=44218)
                0.0625 = fieldNorm(doc=2986)
          0.02763995 = weight(abstract_txt:existing in 2986) [ClassicSimilarity], result of:
            0.02763995 = score(doc=2986,freq=1.0), product of:
              0.09508249 = queryWeight, product of:
                1.2046661 = boost
                4.6511106 = idf(docFreq=1147, maxDocs=44218)
                0.016969819 = queryNorm
              0.29069442 = fieldWeight in 2986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6511106 = idf(docFreq=1147, maxDocs=44218)
                0.0625 = fieldNorm(doc=2986)
          0.06735071 = weight(abstract_txt:institutional in 2986) [ClassicSimilarity], result of:
            0.06735071 = score(doc=2986,freq=1.0), product of:
              0.17217517 = queryWeight, product of:
                1.6210697 = boost
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.016969819 = queryNorm
              0.3911755 = fieldWeight in 2986, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.0625 = fieldNorm(doc=2986)
          0.20157973 = weight(abstract_txt:repository in 2986) [ClassicSimilarity], result of:
            0.20157973 = score(doc=2986,freq=3.0), product of:
              0.28381172 = queryWeight, product of:
                2.5490432 = boost
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.016969819 = queryNorm
              0.71025866 = fieldWeight in 2986, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.0625 = fieldNorm(doc=2986)
          0.28754258 = weight(abstract_txt:metadata in 2986) [ClassicSimilarity], result of:
            0.28754258 = score(doc=2986,freq=9.0), product of:
              0.31417388 = queryWeight, product of:
                3.7928188 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016969819 = queryNorm
              0.91523385 = fieldWeight in 2986, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=2986)
        0.28 = coord(7/25)
    
  3. Salo, D.: Name authority control in institutional repositories (2009) 0.15
    0.15155523 = sum of:
      0.15155523 = product of:
        0.6314801 = sum of:
          0.0987406 = weight(abstract_txt:deposit in 2976) [ClassicSimilarity], result of:
            0.0987406 = score(doc=2976,freq=1.0), product of:
              0.13458668 = queryWeight, product of:
                1.0134503 = boost
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.016969819 = queryNorm
              0.7336581 = fieldWeight in 2976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.09375 = fieldNorm(doc=2976)
          0.028008444 = weight(abstract_txt:many in 2976) [ClassicSimilarity], result of:
            0.028008444 = score(doc=2976,freq=1.0), product of:
              0.073205024 = queryWeight, product of:
                1.0570289 = boost
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.016969819 = queryNorm
              0.38260275 = fieldWeight in 2976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.09375 = fieldNorm(doc=2976)
          0.013049941 = weight(abstract_txt:from in 2976) [ClassicSimilarity], result of:
            0.013049941 = score(doc=2976,freq=1.0), product of:
              0.050363705 = queryWeight, product of:
                1.0737938 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016969819 = queryNorm
              0.259114 = fieldWeight in 2976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.09375 = fieldNorm(doc=2976)
          0.101026066 = weight(abstract_txt:institutional in 2976) [ClassicSimilarity], result of:
            0.101026066 = score(doc=2976,freq=1.0), product of:
              0.17217517 = queryWeight, product of:
                1.6210697 = boost
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.016969819 = queryNorm
              0.58676326 = fieldWeight in 2976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.09375 = fieldNorm(doc=2976)
          0.24688374 = weight(abstract_txt:repository in 2976) [ClassicSimilarity], result of:
            0.24688374 = score(doc=2976,freq=2.0), product of:
              0.28381172 = queryWeight, product of:
                2.5490432 = boost
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.016969819 = queryNorm
              0.8698856 = fieldWeight in 2976, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.09375 = fieldNorm(doc=2976)
          0.14377129 = weight(abstract_txt:metadata in 2976) [ClassicSimilarity], result of:
            0.14377129 = score(doc=2976,freq=1.0), product of:
              0.31417388 = queryWeight, product of:
                3.7928188 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016969819 = queryNorm
              0.45761693 = fieldWeight in 2976, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=2976)
        0.24 = coord(6/25)
    
  4. Stein, A.; Applegate, K.J.; Robbins, S.: Achieving and maintaining metadata quality : toward a sustainable workflow for the IDEALS Institutional Repository (2017) 0.13
    0.13336916 = sum of:
      0.13336916 = product of:
        1.1114097 = sum of:
          0.23276423 = weight(abstract_txt:repository in 5159) [ClassicSimilarity], result of:
            0.23276423 = score(doc=5159,freq=1.0), product of:
              0.28381172 = queryWeight, product of:
                2.5490432 = boost
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.016969819 = queryNorm
              0.8201361 = fieldWeight in 5159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.125 = fieldNorm(doc=5159)
          0.4952553 = weight(abstract_txt:batch in 5159) [ClassicSimilarity], result of:
            0.4952553 = score(doc=5159,freq=1.0), product of:
              0.46950358 = queryWeight, product of:
                3.2785478 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.016969819 = queryNorm
              1.0548488 = fieldWeight in 5159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.125 = fieldNorm(doc=5159)
          0.38339007 = weight(abstract_txt:metadata in 5159) [ClassicSimilarity], result of:
            0.38339007 = score(doc=5159,freq=4.0), product of:
              0.31417388 = queryWeight, product of:
                3.7928188 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016969819 = queryNorm
              1.2203118 = fieldWeight in 5159, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.125 = fieldNorm(doc=5159)
        0.12 = coord(3/25)
    
  5. Rice, R.: Applying DC to institutional data repositories (2008) 0.13
    0.12784857 = sum of:
      0.12784857 = product of:
        0.456602 = sum of:
          0.01676924 = weight(abstract_txt:science in 2664) [ClassicSimilarity], result of:
            0.01676924 = score(doc=2664,freq=2.0), product of:
              0.06551898 = queryWeight, product of:
                3.8609126 = idf(docFreq=2529, maxDocs=44218)
                0.016969819 = queryNorm
              0.25594476 = fieldWeight in 2664, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8609126 = idf(docFreq=2529, maxDocs=44218)
                0.046875 = fieldNorm(doc=2664)
          0.015982848 = weight(abstract_txt:from in 2664) [ClassicSimilarity], result of:
            0.015982848 = score(doc=2664,freq=6.0), product of:
              0.050363705 = queryWeight, product of:
                1.0737938 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.016969819 = queryNorm
              0.31734854 = fieldWeight in 2664, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.046875 = fieldNorm(doc=2664)
          0.017292561 = weight(abstract_txt:support in 2664) [ClassicSimilarity], result of:
            0.017292561 = score(doc=2664,freq=1.0), product of:
              0.08425734 = queryWeight, product of:
                1.1340189 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.016969819 = queryNorm
              0.20523506 = fieldWeight in 2664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.046875 = fieldNorm(doc=2664)
          0.020729963 = weight(abstract_txt:existing in 2664) [ClassicSimilarity], result of:
            0.020729963 = score(doc=2664,freq=1.0), product of:
              0.09508249 = queryWeight, product of:
                1.2046661 = boost
                4.6511106 = idf(docFreq=1147, maxDocs=44218)
                0.016969819 = queryNorm
              0.21802081 = fieldWeight in 2664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6511106 = idf(docFreq=1147, maxDocs=44218)
                0.046875 = fieldNorm(doc=2664)
          0.050513033 = weight(abstract_txt:institutional in 2664) [ClassicSimilarity], result of:
            0.050513033 = score(doc=2664,freq=1.0), product of:
              0.17217517 = queryWeight, product of:
                1.6210697 = boost
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.016969819 = queryNorm
              0.29338163 = fieldWeight in 2664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.258808 = idf(docFreq=229, maxDocs=44218)
                0.046875 = fieldNorm(doc=2664)
          0.17457317 = weight(abstract_txt:repository in 2664) [ClassicSimilarity], result of:
            0.17457317 = score(doc=2664,freq=4.0), product of:
              0.28381172 = queryWeight, product of:
                2.5490432 = boost
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.016969819 = queryNorm
              0.61510205 = fieldWeight in 2664, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.5610886 = idf(docFreq=169, maxDocs=44218)
                0.046875 = fieldNorm(doc=2664)
          0.16074118 = weight(abstract_txt:metadata in 2664) [ClassicSimilarity], result of:
            0.16074118 = score(doc=2664,freq=5.0), product of:
              0.31417388 = queryWeight, product of:
                3.7928188 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016969819 = queryNorm
              0.51163125 = fieldWeight in 2664, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.046875 = fieldNorm(doc=2664)
        0.28 = coord(7/25)
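
Note on the content scores above: each matching term contributes queryWeight * fieldWeight, and the sum is scaled by coord, the fraction of the 25 query terms that matched. A minimal sketch recomputing entry 4 (Stein et al.) from the boost, docFreq, freq and fieldNorm values in the listing, again assuming the standard ClassicSimilarity formulas. The names `term_score` and `MATCHES` are illustrative.

```python
import math

QUERY_NORM = 0.016969819  # queryNorm, copied from the listing
MAX_DOCS = 44218
QUERY_TERMS = 25          # denominator of coord(3/25)


def idf(doc_freq):
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """queryWeight * fieldWeight for one matching term."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


# (boost, docFreq, freq, fieldNorm) for the three terms matched in doc 5159
MATCHES = [
    (2.5490432, 169, 1.0, 0.125),  # repository
    (3.2785478, 25, 1.0, 0.125),   # batch
    (3.7928188, 911, 4.0, 0.125),  # metadata
]

total = sum(term_score(*m) for m in MATCHES)
score = total * len(MATCHES) / QUERY_TERMS  # apply coord(3/25)
print(round(score, 4))
```

Small differences in the last digits against the listing are expected, since the listed values appear to be single-precision floats.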