Document (#39885)

Author
Stephens, O.
Title
Introduction to OpenRefine
Source
http://www.meanboyfriend.com/overdue_ideas/wp-content/uploads/2014/11/Introduction-to-OpenRefine-handout-CC-BY.pdf
Year
2014
Abstract
OpenRefine is described as a tool for working with 'messy' data, but what does this mean? It is probably easiest to describe the kinds of data OpenRefine is good at working with and the sorts of problems it can help you solve. OpenRefine is most useful where you have data in a simple tabular format but with internal inconsistencies, whether in data formats, in where data appears, or in the terminology used. It can help you:
- Get an overview of a data set
- Resolve inconsistencies in a data set
- Split data up into more granular parts
- Match local data up to other data sets
- Enhance a data set with data from other sources
Some common scenarios might be:
1. Where you want to know how many times a particular value appears in a column in your data.
2. Where you want to know how values are distributed across your whole data set.
3. Where you have a list of dates which are formatted in different ways, and want to change all the dates in the list to a single common date format.
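As a rough illustration outside OpenRefine, scenarios 1 and 3 can be sketched in Python: counting how often each value occurs in a column (similar in spirit to an OpenRefine text facet) and rewriting dates held in several formats into one common format. This is a minimal sketch, not how OpenRefine implements these features. The sample values, the function names `facet_counts`, `normalise_key` and `to_iso_date`, and the list of accepted date formats are all hypothetical, and "03/11/2014" is assumed to be day/month/year.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample column; not data from the handout.
publishers = ["Penguin", "penguin ", "Penguin", "Faber & Faber", "Faber and Faber"]

def facet_counts(values):
    """Count how often each distinct value occurs, like a simple text facet."""
    return Counter(values)

def normalise_key(value):
    """Crude normalisation: trim whitespace and lower-case so near-duplicates group."""
    return value.strip().lower()

# Date formats assumed to occur in the column (day/month order assumed for "/").
DATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y", "%B %d, %Y"]

def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None

if __name__ == "__main__":
    # Raw counts keep "Penguin" and "penguin " apart.
    print(facet_counts(publishers).most_common())
    # Normalised counts merge them; "&" versus "and" still needs a human decision.
    print(Counter(normalise_key(p) for p in publishers).most_common())
    dates = ["03/11/2014", "2014-11-03", "3 November 2014", "November 3, 2014"]
    print([to_iso_date(d) for d in dates])
```

Counting raw values first shows where inconsistencies exist; the normalisation step then shows how far a mechanical rule gets before judgement is needed, which mirrors the overview-then-resolve workflow described above.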
Theme
Descriptive cataloguing
Data formats

Similar documents (author)

  1. Stephens, A.: The history of the British National Bibliography 1950-1973 : a catalogue of achievement (1994) 5.71
    
  2. Stephens, A.: Recent developments in the British national bibliographic services (1987) 5.71
    
  3. Stephens, D.O.: ISO 9000 and international records management (1996) 5.71
    
  4. Stephens, I.E.: Getting more out of call numbers : displaying holdings, locations and circulation status (1991) 5.71
    
  5. Stephens, D.: Managing the Web-enhanced geographic information service (1997) 5.71
    

Similar documents (content)

  1. Harlow, C.: Data munging tools in Preparation for RDF : Catmandu and LODRefine (2015) 0.20
    
  2. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.13
    
  3. Durand, J.J.: Making your MARC (1997) 0.11
    
  4. Hooland, S. van; Verborgh, R.; Wilde, M. De; Hercher, J.; Mannens, E.; Wa, R.Van de: Evaluating the success of vocabulary reconciliation for cultural heritage collections (2013) 0.11
    
  5. Dawson, H.: Know it all, find it fast for academic libraries (2012) 0.10