Document (#42445)

Author
Hodges, D.W.
Schlottmann, K.
Title
better archival migration outcomes with Python and the Google Sheets API : Reporting from the archives
Source
Code4Lib journal. Issue 46(2019), [http://journal.code4lib.org]
Year
2019
Abstract
Columbia University Libraries recently embarked on a multi-phase project to migrate nearly 4,000 records describing over 70,000 linear feet of archival material from disparate sources and formats into ArchivesSpace. This paper discusses tools and methods brought to bear in Phase 2 of this project, which required us to look closely at how to integrate a large number of legacy finding aids into the new system and merge descriptive data that had diverged in myriad ways. Using Python, XSLT, and a widely available if underappreciated resource (the Google Sheets API), archival and technical library staff devised ways to efficiently report data from different sources and present it in an accessible, user-friendly way. Responses were then fed back into automated data remediation processes to keep the migration project on track and minimize manual intervention. The scripts and processes developed proved very effective and, moreover, show promise well beyond the ArchivesSpace migration. This paper describes the Python/XSLT/Sheets API processes developed and how they opened a path to move beyond CSV-based reporting with flexible, ad-hoc data interfaces easily adaptable to meet a variety of purposes.
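The abstract does not reproduce the authors' scripts, so the following is only a minimal sketch of the general idea of moving a CSV report into a Google Sheet. It assumes the `spreadsheets.values.update` method of the Sheets API v4 as exposed by the google-api-python-client library. The function names, column names, and sample row are hypothetical, and `service` stands for an already authorized Sheets client that is not built here.

```python
import csv
import io


def csv_to_values_body(csv_text):
    """Turn CSV text into the {"values": [[...], ...]} body used by values.update."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return {"values": rows}


def push_report(service, spreadsheet_id, cell_range, body):
    """Write rows to a sheet. `service` is assumed to be a Sheets API v4 client,
    e.g. one created with googleapiclient.discovery.build("sheets", "v4", ...)."""
    return (
        service.spreadsheets()
        .values()
        .update(
            spreadsheetId=spreadsheet_id,
            range=cell_range,
            valueInputOption="RAW",
            body=body,
        )
        .execute()
    )


# Hypothetical report rows; the columns are illustrative, not taken from the paper.
report_csv = "bibid,title,status\n4078547,Example papers,needs review\n"
body = csv_to_values_body(report_csv)
print(body)
```

Only the conversion step runs here; `push_report` would need real credentials and a spreadsheet ID before it could be called.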
Content
Cf.: https://journal.code4lib.org/articles/14871.
Theme
Metadata
Object
Google Sheets API
Python

Similar documents (author)

  1. Hodges, K.L.: Chronological order (1975) 5.71
    5.7103243 = sum of:
      5.7103243 = weight(author_txt:hodges in 7344) [ClassicSimilarity], result of:
        5.7103243 = fieldWeight in 7344, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.1365185 = idf(docFreq=12, maxDocs=44421)
          0.625 = fieldNorm(doc=7344)
    
  2. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 5.71
    5.7103243 = sum of:
      5.7103243 = weight(author_txt:hodges in 5069) [ClassicSimilarity], result of:
        5.7103243 = fieldWeight in 5069, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.1365185 = idf(docFreq=12, maxDocs=44421)
          0.625 = fieldNorm(doc=5069)
    
  3. Hodges, J.E.: Automated systems for the generation of document indexes (2000) 5.71
    5.7103243 = sum of:
      5.7103243 = weight(author_txt:hodges in 5668) [ClassicSimilarity], result of:
        5.7103243 = fieldWeight in 5668, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.1365185 = idf(docFreq=12, maxDocs=44421)
          0.625 = fieldNorm(doc=5668)
    
  4. Hodges, A.: ¬Der Mann hinter der Maschine (2012) 5.71
    5.7103243 = sum of:
      5.7103243 = weight(author_txt:hodges in 1157) [ClassicSimilarity], result of:
        5.7103243 = fieldWeight in 1157, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.1365185 = idf(docFreq=12, maxDocs=44421)
          0.625 = fieldNorm(doc=1157)
    
  5. Hodges, J.A.: Forensically reconstructing biomedical maintenance labor : PDF metadata under the epistemic conditions of COVID-19 (2021) 5.71
    5.7103243 = sum of:
      5.7103243 = weight(author_txt:hodges in 1389) [ClassicSimilarity], result of:
        5.7103243 = fieldWeight in 1389, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.1365185 = idf(docFreq=12, maxDocs=44421)
          0.625 = fieldNorm(doc=1389)
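
The author scores above all follow the same arithmetic. As a sketch, assuming the default TF-IDF formulas of Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), the field weight can be recomputed from the values in the listing. The function names are illustrative, and fieldNorm is taken as given from the output rather than derived.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: freq=1.0, docFreq=12, maxDocs=44421, fieldNorm=0.625
score = field_weight(freq=1.0, doc_freq=12, max_docs=44421, field_norm=0.625)
print(round(score, 4))
```

Because every author match has the same term frequency, document frequency, and field norm, all five entries receive the same score.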
    

Similar documents (content)

  1. Sinn, D.; Soares, N.: Historians' use of digital archival collections : the web, historical scholarship, and archival research (2014) 0.10
    0.10052009 = sum of:
      0.10052009 = product of:
        0.35900033 = sum of:
          0.010484343 = weight(abstract_txt:from in 2349) [ClassicSimilarity], result of:
            0.010484343 = score(doc=2349,freq=2.0), product of:
              0.042986467 = queryWeight, product of:
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.015578199 = queryNorm
              0.2438987 = fieldWeight in 2349, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=2349)
          0.01737762 = weight(abstract_txt:developed in 2349) [ClassicSimilarity], result of:
            0.01737762 = score(doc=2349,freq=1.0), product of:
              0.06626396 = queryWeight, product of:
                1.0137414 = boost
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.015578199 = queryNorm
              0.26224846 = fieldWeight in 2349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.0625 = fieldNorm(doc=2349)
          0.035495326 = weight(abstract_txt:sources in 2349) [ClassicSimilarity], result of:
            0.035495326 = score(doc=2349,freq=2.0), product of:
              0.08466839 = queryWeight, product of:
                1.1459064 = boost
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.015578199 = queryNorm
              0.4192276 = fieldWeight in 2349, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.0625 = fieldNorm(doc=2349)
          0.01783065 = weight(abstract_txt:into in 2349) [ClassicSimilarity], result of:
            0.01783065 = score(doc=2349,freq=1.0), product of:
              0.07716595 = queryWeight, product of:
                1.3398217 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.015578199 = queryNorm
              0.23106888 = fieldWeight in 2349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0625 = fieldNorm(doc=2349)
          0.0296409 = weight(abstract_txt:project in 2349) [ClassicSimilarity], result of:
            0.0296409 = score(doc=2349,freq=1.0), product of:
              0.10828671 = queryWeight, product of:
                1.5871637 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.015578199 = queryNorm
              0.2737261 = fieldWeight in 2349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.0625 = fieldNorm(doc=2349)
          0.047535513 = weight(abstract_txt:processes in 2349) [ClassicSimilarity], result of:
            0.047535513 = score(doc=2349,freq=1.0), product of:
              0.14836326 = queryWeight, product of:
                1.857793 = boost
                5.126392 = idf(docFreq=716, maxDocs=44421)
                0.015578199 = queryNorm
              0.3203995 = fieldWeight in 2349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.126392 = idf(docFreq=716, maxDocs=44421)
                0.0625 = fieldNorm(doc=2349)
          0.20063598 = weight(abstract_txt:archival in 2349) [ClassicSimilarity], result of:
            0.20063598 = score(doc=2349,freq=5.0), product of:
              0.22660185 = queryWeight, product of:
                2.2959683 = boost
                6.3354917 = idf(docFreq=213, maxDocs=44421)
                0.015578199 = queryNorm
              0.8854119 = fieldWeight in 2349, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.3354917 = idf(docFreq=213, maxDocs=44421)
                0.0625 = fieldNorm(doc=2349)
        0.28 = coord(7/25)
    
  2. Suranofsky, M.; McColl, L.: a Google sheets add-on that uses the WorldCat search API : MatchMarc (2019) 0.08
    0.084814325 = sum of:
      0.084814325 = product of:
        0.53008956 = sum of:
          0.009266939 = weight(abstract_txt:from in 442) [ClassicSimilarity], result of:
            0.009266939 = score(doc=442,freq=1.0), product of:
              0.042986467 = queryWeight, product of:
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.015578199 = queryNorm
              0.21557805 = fieldWeight in 442, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.078125 = fieldNorm(doc=442)
          0.030719586 = weight(abstract_txt:developed in 442) [ClassicSimilarity], result of:
            0.030719586 = score(doc=442,freq=2.0), product of:
              0.06626396 = queryWeight, product of:
                1.0137414 = boost
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.015578199 = queryNorm
              0.46359417 = fieldWeight in 442, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.078125 = fieldNorm(doc=442)
          0.07855156 = weight(abstract_txt:google in 442) [ClassicSimilarity], result of:
            0.07855156 = score(doc=442,freq=3.0), product of:
              0.108244695 = queryWeight, product of:
                1.2956623 = boost
                5.3628736 = idf(docFreq=565, maxDocs=44421)
                0.015578199 = queryNorm
              0.7256851 = fieldWeight in 442, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.3628736 = idf(docFreq=565, maxDocs=44421)
                0.078125 = fieldNorm(doc=442)
          0.41155148 = weight(abstract_txt:sheets in 442) [ClassicSimilarity], result of:
            0.41155148 = score(doc=442,freq=2.0), product of:
              0.42787182 = queryWeight, product of:
                3.1549392 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.015578199 = queryNorm
              0.9618569 = fieldWeight in 442, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=442)
        0.16 = coord(4/25)
    
  3. Chen, S.-J.: Semantic enrichment of linked archival materials (2019) 0.08
    0.08288984 = sum of:
      0.08288984 = product of:
        0.34537435 = sum of:
          0.010484343 = weight(abstract_txt:from in 488) [ClassicSimilarity], result of:
            0.010484343 = score(doc=488,freq=2.0), product of:
              0.042986467 = queryWeight, product of:
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.015578199 = queryNorm
              0.2438987 = fieldWeight in 488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.035495326 = weight(abstract_txt:sources in 488) [ClassicSimilarity], result of:
            0.035495326 = score(doc=488,freq=2.0), product of:
              0.08466839 = queryWeight, product of:
                1.1459064 = boost
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.015578199 = queryNorm
              0.4192276 = fieldWeight in 488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.743019 = idf(docFreq=1051, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.01783065 = weight(abstract_txt:into in 488) [ClassicSimilarity], result of:
            0.01783065 = score(doc=488,freq=1.0), product of:
              0.07716595 = queryWeight, product of:
                1.3398217 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.015578199 = queryNorm
              0.23106888 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.04191856 = weight(abstract_txt:project in 488) [ClassicSimilarity], result of:
            0.04191856 = score(doc=488,freq=2.0), product of:
              0.10828671 = queryWeight, product of:
                1.5871637 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.015578199 = queryNorm
              0.38710716 = fieldWeight in 488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.060191207 = weight(abstract_txt:data in 488) [ClassicSimilarity], result of:
            0.060191207 = score(doc=488,freq=12.0), product of:
              0.083481215 = queryWeight, product of:
                1.609155 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.015578199 = queryNorm
              0.721015 = fieldWeight in 488, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.17945427 = weight(abstract_txt:archival in 488) [ClassicSimilarity], result of:
            0.17945427 = score(doc=488,freq=4.0), product of:
              0.22660185 = queryWeight, product of:
                2.2959683 = boost
                6.3354917 = idf(docFreq=213, maxDocs=44421)
                0.015578199 = queryNorm
              0.79193646 = fieldWeight in 488, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3354917 = idf(docFreq=213, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
        0.24 = coord(6/25)
    
  4. Godfrey, B.; Johnson, J.: ¬The geospatial metadata manager's toolbox : three techniques for maintaining records (2015) 0.08
    0.081619486 = sum of:
      0.081619486 = product of:
        0.68016243 = sum of:
          0.026066432 = weight(abstract_txt:developed in 3275) [ClassicSimilarity], result of:
            0.026066432 = score(doc=3275,freq=1.0), product of:
              0.06626396 = queryWeight, product of:
                1.0137414 = boost
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.015578199 = queryNorm
              0.39337268 = fieldWeight in 3275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.09375 = fieldNorm(doc=3275)
          0.269107 = weight(abstract_txt:xslt in 3275) [ClassicSimilarity], result of:
            0.269107 = score(doc=3275,freq=1.0), product of:
              0.314176 = queryWeight, product of:
                2.2073693 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.015578199 = queryNorm
              0.8565486 = fieldWeight in 3275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.09375 = fieldNorm(doc=3275)
          0.38498902 = weight(abstract_txt:python in 3275) [ClassicSimilarity], result of:
            0.38498902 = score(doc=3275,freq=1.0), product of:
              0.45661724 = queryWeight, product of:
                3.2591946 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.015578199 = queryNorm
              0.8431329 = fieldWeight in 3275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.09375 = fieldNorm(doc=3275)
        0.12 = coord(3/25)
    
  5. Alexander, F.; Heather, A.: Transformation of a legacy UDC-based classification system : exploiting and remodelling semantic relationships (2011) 0.08
    0.081060335 = sum of:
      0.081060335 = product of:
        0.2895012 = sum of:
          0.0074135507 = weight(abstract_txt:from in 829) [ClassicSimilarity], result of:
            0.0074135507 = score(doc=829,freq=1.0), product of:
              0.042986467 = queryWeight, product of:
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.015578199 = queryNorm
              0.17246243 = fieldWeight in 829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.759399 = idf(docFreq=7646, maxDocs=44421)
                0.0625 = fieldNorm(doc=829)
          0.025281558 = weight(abstract_txt:ways in 829) [ClassicSimilarity], result of:
            0.025281558 = score(doc=829,freq=1.0), product of:
              0.08507848 = queryWeight, product of:
                1.1486782 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.015578199 = queryNorm
              0.29715574 = fieldWeight in 829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=829)
          0.01783065 = weight(abstract_txt:into in 829) [ClassicSimilarity], result of:
            0.01783065 = score(doc=829,freq=1.0), product of:
              0.07716595 = queryWeight, product of:
                1.3398217 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.015578199 = queryNorm
              0.23106888 = fieldWeight in 829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0625 = fieldNorm(doc=829)
          0.0296409 = weight(abstract_txt:project in 829) [ClassicSimilarity], result of:
            0.0296409 = score(doc=829,freq=1.0), product of:
              0.10828671 = queryWeight, product of:
                1.5871637 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.015578199 = queryNorm
              0.2737261 = fieldWeight in 829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.0625 = fieldNorm(doc=829)
          0.017375704 = weight(abstract_txt:data in 829) [ClassicSimilarity], result of:
            0.017375704 = score(doc=829,freq=1.0), product of:
              0.083481215 = queryWeight, product of:
                1.609155 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.015578199 = queryNorm
              0.20813909 = fieldWeight in 829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=829)
          0.047535513 = weight(abstract_txt:processes in 829) [ClassicSimilarity], result of:
            0.047535513 = score(doc=829,freq=1.0), product of:
              0.14836326 = queryWeight, product of:
                1.857793 = boost
                5.126392 = idf(docFreq=716, maxDocs=44421)
                0.015578199 = queryNorm
              0.3203995 = fieldWeight in 829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.126392 = idf(docFreq=716, maxDocs=44421)
                0.0625 = fieldNorm(doc=829)
          0.14442332 = weight(abstract_txt:migration in 829) [ClassicSimilarity], result of:
            0.14442332 = score(doc=829,freq=1.0), product of:
              0.31122357 = queryWeight, product of:
                2.6907315 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.015578199 = queryNorm
              0.46405008 = fieldWeight in 829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=829)
        0.28 = coord(7/25)
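
The content scores combine several terms. As a sketch under the same ClassicSimilarity assumptions, each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm, and the sum is scaled by coord (matched terms divided by query terms). The example recomputes entry 4 (Godfrey; Johnson) from the values in the listing. The function and variable names are illustrative.

```python
import math

QUERY_NORM = 0.015578199  # queryNorm as reported in the listing
MAX_DOCS = 44421          # maxDocs as reported in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one query term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (freq, docFreq, boost, fieldNorm) for the three terms matched in doc 3275
terms = [
    (1.0, 1817, 1.0137414, 0.09375),  # developed
    (1.0, 12, 2.2073693, 0.09375),    # xslt
    (1.0, 14, 3.2591946, 0.09375),    # python
]

total = sum(term_score(*t) for t in terms)
score = total * (3 / 25)  # coord(3/25): 3 of 25 query terms matched
print(round(score, 3))
```

The coord factor explains why entry 4 ranks below entry 1 despite matching the rare terms "xslt" and "python": only 3 of the 25 query terms match, against 7 of 25 for entry 1.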