Document (#39278)

Author
Harlow, C.
Title
Data munging tools in preparation for RDF : Catmandu and LODRefine
Source
Code4Lib journal. Issue 30(2015), [http://journal.code4lib.org]
Year
2015
Abstract
Data munging, or the work of remediating, enhancing and transforming library datasets for new or improved uses, has become more important and staff-inclusive in many library technology discussions and projects. Many times we know how we want our data to look, as well as how we want our data to act in discovery interfaces or when exposed, but we are uncertain how to make the data we have into the data we want. This article introduces and compares two library data munging tools that can help: LODRefine (OpenRefine with the DERI RDF Extension) and Catmandu. The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution's metadata migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how metadataists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that data to an RDF model. Integration of these tools with other systems and projects, the use of domain-specific transformation languages, and the expansion of vocabulary reconciliation services are mentioned.
Content
Cf.: http://journal.code4lib.org/articles/11013.
Theme
Descriptive cataloging
Semantic Web
Object
Catmandu
LODRefine
RDF

Similar documents (content)

  1. Hooland, S. van; Verborgh, R.; Wilde, M. De; Hercher, J.; Mannens, E.; Wa, R.Van de: Evaluating the success of vocabulary reconciliation for cultural heritage collections (2013) 0.31
    0.30535874 = sum of:
      0.30535874 = product of:
        0.9542461 = sum of:
          0.010947667 = weight(abstract_txt:with in 1662) [ClassicSimilarity], result of:
            0.010947667 = score(doc=1662,freq=1.0), product of:
              0.05613874 = queryWeight, product of:
                1.1147029 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020175979 = queryNorm
              0.19501092 = fieldWeight in 1662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
          0.24136977 = weight(abstract_txt:reconciliation in 1662) [ClassicSimilarity], result of:
            0.24136977 = score(doc=1662,freq=2.0), product of:
              0.24291432 = queryWeight, product of:
                1.3387322 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.020175979 = queryNorm
              0.9936416 = fieldWeight in 1662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
          0.20144023 = weight(abstract_txt:openrefine in 1662) [ClassicSimilarity], result of:
            0.20144023 = score(doc=1662,freq=1.0), product of:
              0.27129304 = queryWeight, product of:
                1.4147722 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.020175979 = queryNorm
              0.74251896 = fieldWeight in 1662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
          0.022826087 = weight(abstract_txt:library in 1662) [ClassicSimilarity], result of:
            0.022826087 = score(doc=1662,freq=1.0), product of:
              0.09162259 = queryWeight, product of:
                1.4240626 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.020175979 = queryNorm
              0.24913163 = fieldWeight in 1662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
          0.05688895 = weight(abstract_txt:tool in 1662) [ClassicSimilarity], result of:
            0.05688895 = score(doc=1662,freq=1.0), product of:
              0.14713064 = queryWeight, product of:
                1.4734443 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.020175979 = queryNorm
              0.38665605 = fieldWeight in 1662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
          0.1290503 = weight(abstract_txt:transformation in 1662) [ClassicSimilarity], result of:
            0.1290503 = score(doc=1662,freq=1.0), product of:
              0.25401372 = queryWeight, product of:
                1.9360242 = boost
                6.5029707 = idf(docFreq=180, maxDocs=44421)
                0.020175979 = queryNorm
              0.5080446 = fieldWeight in 1662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5029707 = idf(docFreq=180, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
          0.14162505 = weight(abstract_txt:metadata in 1662) [ClassicSimilarity], result of:
            0.14162505 = score(doc=1662,freq=3.0), product of:
              0.2145036 = queryWeight, product of:
                2.178939 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.020175979 = queryNorm
              0.66024554 = fieldWeight in 1662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
          0.15009809 = weight(abstract_txt:data in 1662) [ClassicSimilarity], result of:
            0.15009809 = score(doc=1662,freq=3.0), product of:
              0.33308175 = queryWeight, product of:
                4.95727 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020175979 = queryNorm
              0.45063436 = fieldWeight in 1662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=1662)
        0.32 = coord(8/25)
    
  2. Stephens, O.: Introduction to OpenRefine (2014) 0.22
    0.21621652 = sum of:
      0.21621652 = product of:
        0.9009022 = sum of:
          0.017516267 = weight(abstract_txt:with in 3884) [ClassicSimilarity], result of:
            0.017516267 = score(doc=3884,freq=4.0), product of:
              0.05613874 = queryWeight, product of:
                1.1147029 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020175979 = queryNorm
              0.31201747 = fieldWeight in 3884, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=3884)
          0.025493216 = weight(abstract_txt:many in 3884) [ClassicSimilarity], result of:
            0.025493216 = score(doc=3884,freq=1.0), product of:
              0.09997872 = queryWeight, product of:
                1.2146076 = boost
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.020175979 = queryNorm
              0.2549864 = fieldWeight in 3884, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.0625 = fieldNorm(doc=3884)
          0.27912378 = weight(abstract_txt:openrefine in 3884) [ClassicSimilarity], result of:
            0.27912378 = score(doc=3884,freq=3.0), product of:
              0.27129304 = queryWeight, product of:
                1.4147722 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.020175979 = queryNorm
              1.0288645 = fieldWeight in 3884, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.0625 = fieldNorm(doc=3884)
          0.04551116 = weight(abstract_txt:tool in 3884) [ClassicSimilarity], result of:
            0.04551116 = score(doc=3884,freq=1.0), product of:
              0.14713064 = queryWeight, product of:
                1.4734443 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.020175979 = queryNorm
              0.30932483 = fieldWeight in 3884, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.0625 = fieldNorm(doc=3884)
          0.27385864 = weight(abstract_txt:want in 3884) [ClassicSimilarity], result of:
            0.27385864 = score(doc=3884,freq=3.0), product of:
              0.38633627 = queryWeight, product of:
                2.9242234 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.020175979 = queryNorm
              0.7088608 = fieldWeight in 3884, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0625 = fieldNorm(doc=3884)
          0.25939912 = weight(abstract_txt:data in 3884) [ClassicSimilarity], result of:
            0.25939912 = score(doc=3884,freq=14.0), product of:
              0.33308175 = queryWeight, product of:
                4.95727 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020175979 = queryNorm
              0.77878517 = fieldWeight in 3884, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=3884)
        0.24 = coord(6/25)
    
  3. Takhirov, N.; Aalberg, T.; Duchateau, F.; Zumer, M.: FRBR-ML: a FRBR-based framework for semantic interoperability (2012) 0.17
    0.17190811 = sum of:
      0.17190811 = product of:
        0.53721285 = sum of:
          0.049794886 = weight(abstract_txt:enhancing in 1134) [ClassicSimilarity], result of:
            0.049794886 = score(doc=1134,freq=1.0), product of:
              0.13553943 = queryWeight, product of:
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.020175979 = queryNorm
              0.36738303 = fieldWeight in 1134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
          0.007663367 = weight(abstract_txt:with in 1134) [ClassicSimilarity], result of:
            0.007663367 = score(doc=1134,freq=1.0), product of:
              0.05613874 = queryWeight, product of:
                1.1147029 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020175979 = queryNorm
              0.13650765 = fieldWeight in 1134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
          0.14556545 = weight(abstract_txt:transforming in 1134) [ClassicSimilarity], result of:
            0.14556545 = score(doc=1134,freq=4.0), product of:
              0.17456669 = queryWeight, product of:
                1.1348746 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.020175979 = queryNorm
              0.8338673 = fieldWeight in 1134, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
          0.022306563 = weight(abstract_txt:many in 1134) [ClassicSimilarity], result of:
            0.022306563 = score(doc=1134,freq=1.0), product of:
              0.09997872 = queryWeight, product of:
                1.2146076 = boost
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.020175979 = queryNorm
              0.2231131 = fieldWeight in 1134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0797825 = idf(docFreq=2041, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
          0.01597826 = weight(abstract_txt:library in 1134) [ClassicSimilarity], result of:
            0.01597826 = score(doc=1134,freq=1.0), product of:
              0.09162259 = queryWeight, product of:
                1.4240626 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.020175979 = queryNorm
              0.17439215 = fieldWeight in 1134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
          0.04338242 = weight(abstract_txt:tools in 1134) [ClassicSimilarity], result of:
            0.04338242 = score(doc=1134,freq=1.0), product of:
              0.17831673 = queryWeight, product of:
                1.9866613 = boost
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.020175979 = queryNorm
              0.24328856 = fieldWeight in 1134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
          0.08094546 = weight(abstract_txt:metadata in 1134) [ClassicSimilarity], result of:
            0.08094546 = score(doc=1134,freq=2.0), product of:
              0.2145036 = queryWeight, product of:
                2.178939 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.020175979 = queryNorm
              0.37736177 = fieldWeight in 1134, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
          0.17157641 = weight(abstract_txt:data in 1134) [ClassicSimilarity], result of:
            0.17157641 = score(doc=1134,freq=8.0), product of:
              0.33308175 = queryWeight, product of:
                4.95727 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020175979 = queryNorm
              0.515118 = fieldWeight in 1134, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1134)
        0.32 = coord(8/25)
    
  4. Lynch, J.D.; Gibson, J.; Han, M.-J.: Analyzing and normalizing type metadata for a large aggregated digital library (2020) 0.17
    0.17086947 = sum of:
      0.17086947 = product of:
        0.71195614 = sum of:
          0.08536266 = weight(abstract_txt:enhancing in 720) [ClassicSimilarity], result of:
            0.08536266 = score(doc=720,freq=1.0), product of:
              0.13553943 = queryWeight, product of:
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.020175979 = queryNorm
              0.6297995 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.717861 = idf(docFreq=145, maxDocs=44421)
                0.09375 = fieldNorm(doc=720)
          0.013137201 = weight(abstract_txt:with in 720) [ClassicSimilarity], result of:
            0.013137201 = score(doc=720,freq=1.0), product of:
              0.05613874 = queryWeight, product of:
                1.1147029 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020175979 = queryNorm
              0.23401311 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.09375 = fieldNorm(doc=720)
          0.24172829 = weight(abstract_txt:openrefine in 720) [ClassicSimilarity], result of:
            0.24172829 = score(doc=720,freq=1.0), product of:
              0.27129304 = queryWeight, product of:
                1.4147722 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.020175979 = queryNorm
              0.8910228 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.09375 = fieldNorm(doc=720)
          0.027391303 = weight(abstract_txt:library in 720) [ClassicSimilarity], result of:
            0.027391303 = score(doc=720,freq=1.0), product of:
              0.09162259 = queryWeight, product of:
                1.4240626 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.020175979 = queryNorm
              0.29895797 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.09375 = fieldNorm(doc=720)
          0.2403457 = weight(abstract_txt:metadata in 720) [ClassicSimilarity], result of:
            0.2403457 = score(doc=720,freq=6.0), product of:
              0.2145036 = queryWeight, product of:
                2.178939 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.020175979 = queryNorm
              1.120474 = fieldWeight in 720, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.09375 = fieldNorm(doc=720)
          0.103991 = weight(abstract_txt:data in 720) [ClassicSimilarity], result of:
            0.103991 = score(doc=720,freq=1.0), product of:
              0.33308175 = queryWeight, product of:
                4.95727 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020175979 = queryNorm
              0.31220865 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=720)
        0.24 = coord(6/25)
    
  5. Hooland, S. van; Verborgh, R.: Linked data for libraries, archives and museums : how to clean, link, and publish your metadata (2014) 0.16
    0.16208431 = sum of:
      0.16208431 = product of:
        0.57887256 = sum of:
          0.0092894025 = weight(abstract_txt:with in 153) [ClassicSimilarity], result of:
            0.0092894025 = score(doc=153,freq=2.0), product of:
              0.05613874 = queryWeight, product of:
                1.1147029 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020175979 = queryNorm
              0.16547224 = fieldWeight in 153, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.046875 = fieldNorm(doc=153)
          0.019658148 = weight(abstract_txt:each in 153) [ClassicSimilarity], result of:
            0.019658148 = score(doc=153,freq=1.0), product of:
              0.10184634 = queryWeight, product of:
                1.2258996 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.020175979 = queryNorm
              0.19301772 = fieldWeight in 153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.046875 = fieldNorm(doc=153)
          0.17736983 = weight(abstract_txt:reconciliation in 153) [ClassicSimilarity], result of:
            0.17736983 = score(doc=153,freq=3.0), product of:
              0.24291432 = queryWeight, product of:
                1.3387322 = boost
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.020175979 = queryNorm
              0.7301745 = fieldWeight in 153, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.993418 = idf(docFreq=14, maxDocs=44421)
                0.046875 = fieldNorm(doc=153)
          0.013695652 = weight(abstract_txt:library in 153) [ClassicSimilarity], result of:
            0.013695652 = score(doc=153,freq=1.0), product of:
              0.09162259 = queryWeight, product of:
                1.4240626 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.020175979 = queryNorm
              0.14947899 = fieldWeight in 153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.046875 = fieldNorm(doc=153)
          0.05258743 = weight(abstract_txt:tools in 153) [ClassicSimilarity], result of:
            0.05258743 = score(doc=153,freq=2.0), product of:
              0.17831673 = queryWeight, product of:
                1.9866613 = boost
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.020175979 = queryNorm
              0.29491025 = fieldWeight in 153, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.448705 = idf(docFreq=1411, maxDocs=44421)
                0.046875 = fieldNorm(doc=153)
          0.20228106 = weight(abstract_txt:metadata in 153) [ClassicSimilarity], result of:
            0.20228106 = score(doc=153,freq=17.0), product of:
              0.2145036 = queryWeight, product of:
                2.178939 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.020175979 = queryNorm
              0.9430194 = fieldWeight in 153, product of:
                4.1231055 = tf(freq=17.0), with freq of:
                  17.0 = termFreq=17.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.046875 = fieldNorm(doc=153)
          0.103991 = weight(abstract_txt:data in 153) [ClassicSimilarity], result of:
            0.103991 = score(doc=153,freq=4.0), product of:
              0.33308175 = queryWeight, product of:
                4.95727 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020175979 = queryNorm
              0.31220865 = fieldWeight in 153, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.046875 = fieldNorm(doc=153)
        0.28 = coord(7/25)
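
The score trees above follow the classic Lucene TF-IDF scheme: each matching term contributes queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), with tf taken as the square root of the term frequency, and the sum is scaled by the coord factor (matched clauses / total clauses). The following is a minimal Python sketch that recomputes the score of the first similar document (doc=1662) from the numbers printed in its tree. The function and variable names are illustrative and not part of Lucene; the idf formula 1 + ln(maxDocs / (docFreq + 1)) is the ClassicSimilarity default and is assumed to be unmodified in this index; fieldNorm is taken as printed rather than derived.

```python
import math

# Values copied from the explain tree for doc=1662 above.
MAX_DOCS = 44421
QUERY_NORM = 0.020175979
FIELD_NORM_1662 = 0.078125

# (term, boost, docFreq, termFreq) for each matching abstract_txt clause.
TERMS_1662 = [
    ("with", 1.1147029, 9949, 1.0),
    ("reconciliation", 1.3387322, 14, 2.0),
    ("openrefine", 1.4147722, 8, 1.0),
    ("library", 1.4240626, 4976, 1.0),
    ("tool", 1.4734443, 855, 1.0),
    ("transformation", 1.9360242, 180, 1.0),
    ("metadata", 2.178939, 917, 3.0),
    ("data", 4.95727, 4320, 3.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed ClassicSimilarity default)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    """Sum of term scores, scaled by coord = matched / total_clauses."""
    raw = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms)
    return raw * (matched / total_clauses)


score_1662 = document_score(TERMS_1662, FIELD_NORM_1662, matched=8, total_clauses=25)
print(f"doc 1662: {score_1662:.5f}")
```

The result should land close to the 0.30535874 listed for the first entry; small differences in the last digits are expected because Lucene computes in single precision while this sketch uses Python floats. The same functions can be reused for the other entries by substituting their boosts, frequencies, fieldNorm and coord values.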