Document (#39885)

Author
Stephens, O.
Title
Introduction to OpenRefine
Source
http://www.meanboyfriend.com/overdue_ideas/wp-content/uploads/2014/11/Introduction-to-OpenRefine-handout-CC-BY.pdf
Year
2014
Abstract
OpenRefine is described as a tool for working with 'messy' data - but what does this mean? It is probably easiest to describe the kinds of data OpenRefine is good at working with and the sorts of problems it can help you solve. OpenRefine is most useful where you have data in a simple tabular format but with internal inconsistencies in data formats, in where data appears, or in the terminology used.

It can help you:
  - Get an overview of a data set
  - Resolve inconsistencies in a data set
  - Split data up into more granular parts
  - Match local data up to other data sets
  - Enhance a data set with data from other sources

Some common scenarios might be:
  1. Where you want to know how many times a particular value appears in a column in your data.
  2. Where you want to know how values are distributed across your whole data set.
  3. Where you have a list of dates which are formatted in different ways, and want to change all the dates in the list to a single common date format.
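
To make the first and third scenarios concrete, here is a minimal Python sketch. It is not OpenRefine itself (OpenRefine offers these operations through its own interface) and is not taken from the handout. The function names, the sample values, and the list of accepted date formats are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical list of input formats; a real data set may need others.
KNOWN_DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %B %Y"]


def count_values(column):
    """Scenario 1: count how often each value appears in a column."""
    return dict(Counter(column))


def normalize_date(text):
    """Scenario 3: rewrite a date in ISO 8601 form (YYYY-MM-DD).

    Returns None when the text matches none of the known formats,
    so unparseable values can be reviewed by hand rather than guessed.
    """
    cleaned = text.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


# Illustrative sample data (not from the handout).
publishers = ["Facet", "Facet", "Ergon", "Facet"]
dates = ["2014-11-03", "03/11/2014", "3 November 2014", "Nov. 2014"]

print(count_values(publishers))
print([normalize_date(d) for d in dates])
```

Returning None for unrecognised dates is a deliberate choice in this sketch: it leaves inconsistent values visible instead of silently coercing them. Note that "%d/%m/%Y" assumes day-first dates; a data set using month-first dates would need a different format list.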
Theme
Descriptive cataloguing (Formalerschließung)
Data formats (Datenformate)

Similar documents (author)

  1. Stephens, A.: ¬The history of the British National Bibliography 1950-1973 : a catalogue of achievement (1994) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:stephens in 2674) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 2674, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=2674)
    
  2. Stephens, A.: Recent developments in the British national bibliographic services (1987) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:stephens in 5708) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 5708, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=5708)
    
  3. Stephens, D.O.: ISO 9000 and international records management (1996) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:stephens in 6781) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 6781, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=6781)
    
  4. Stephens, I.E.: Getting more out of call numbers : displaying holdings, locations and circulation status (1991) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:stephens in 374) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 374, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=374)
    
  5. Stephens, D.: Managing the Web-enhanced geographic information service (1997) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:stephens in 2719) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 2719, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=2719)
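
The author scores above can be reproduced by hand. The following Python sketch assumes the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). These appear to match the numbers in the listing, though the exact Lucene version behind this catalogue is not stated here, so treat the formulas as an assumption. The function names are illustrative.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency (assumed form, see lead-in)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:stephens entries above.
author_score = field_weight(freq=1.0, doc_freq=12, max_docs=44218, field_norm=0.625)
print(author_score)
```

Because all five author matches share the same freq, docFreq and fieldNorm, they necessarily receive the same score, which is why every entry in this list shows 5.71.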
    

Similar documents (content)

  1. Harlow, C.: Data munging tools in Preparation for RDF : Catmandu and LODRefine (2015) 0.20
    0.19875494 = sum of:
      0.19875494 = product of:
        0.82814556 = sum of:
          0.012191317 = weight(abstract_txt:with in 2277) [ClassicSimilarity], result of:
            0.012191317 = score(doc=2277,freq=3.0), product of:
              0.04505223 = queryWeight, product of:
                1.3549794 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013301171 = queryNorm
              0.27060407 = fieldWeight in 2277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.05176549 = weight(abstract_txt:know in 2277) [ClassicSimilarity], result of:
            0.05176549 = score(doc=2277,freq=1.0), product of:
              0.13523003 = queryWeight, product of:
                1.6599543 = boost
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.013301171 = queryNorm
              0.3827958 = fieldWeight in 2277, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.037714653 = weight(abstract_txt:help in 2277) [ClassicSimilarity], result of:
            0.037714653 = score(doc=2277,freq=1.0), product of:
              0.1253382 = queryWeight, product of:
                1.9572527 = boost
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.013301171 = queryNorm
              0.3009031 = fieldWeight in 2277, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.16444999 = weight(abstract_txt:want in 2277) [ClassicSimilarity], result of:
            0.16444999 = score(doc=2277,freq=3.0), product of:
              0.23194884 = queryWeight, product of:
                2.6625714 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.013301171 = queryNorm
              0.70899254 = fieldWeight in 2277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.38630775 = weight(abstract_txt:openrefine in 2277) [ClassicSimilarity], result of:
            0.38630775 = score(doc=2277,freq=1.0), product of:
              0.6506467 = queryWeight, product of:
                5.1492877 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.013301171 = queryNorm
              0.5937289 = fieldWeight in 2277, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
          0.17571636 = weight(abstract_txt:data in 2277) [ClassicSimilarity], result of:
            0.17571636 = score(doc=2277,freq=9.0), product of:
              0.28089213 = queryWeight, product of:
                6.3296304 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.013301171 = queryNorm
              0.62556523 = fieldWeight in 2277, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=2277)
        0.24 = coord(6/25)
    
  2. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.13
    0.12980878 = sum of:
      0.12980878 = product of:
        0.8113049 = sum of:
          0.017241128 = weight(abstract_txt:with in 977) [ClassicSimilarity], result of:
            0.017241128 = score(doc=977,freq=6.0), product of:
              0.04505223 = queryWeight, product of:
                1.3549794 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013301171 = queryNorm
              0.38269198 = fieldWeight in 977, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=977)
          0.042125713 = weight(abstract_txt:format in 977) [ClassicSimilarity], result of:
            0.042125713 = score(doc=977,freq=2.0), product of:
              0.09355517 = queryWeight, product of:
                1.3806813 = boost
                5.0942993 = idf(docFreq=736, maxDocs=44218)
                0.013301171 = queryNorm
              0.4502767 = fieldWeight in 977, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0942993 = idf(docFreq=736, maxDocs=44218)
                0.0625 = fieldNorm(doc=977)
          0.66910464 = weight(abstract_txt:openrefine in 977) [ClassicSimilarity], result of:
            0.66910464 = score(doc=977,freq=3.0), product of:
              0.6506467 = queryWeight, product of:
                5.1492877 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.013301171 = queryNorm
              1.0283686 = fieldWeight in 977, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=977)
          0.082833484 = weight(abstract_txt:data in 977) [ClassicSimilarity], result of:
            0.082833484 = score(doc=977,freq=2.0), product of:
              0.28089213 = queryWeight, product of:
                6.3296304 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.013301171 = queryNorm
              0.29489428 = fieldWeight in 977, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=977)
        0.16 = coord(4/25)
    
  3. Durand, J.J.: Making your MARC (1997) 0.11
    0.11478664 = sum of:
      0.11478664 = product of:
        0.7174165 = sum of:
          0.0105579905 = weight(abstract_txt:with in 871) [ClassicSimilarity], result of:
            0.0105579905 = score(doc=871,freq=1.0), product of:
              0.04505223 = queryWeight, product of:
                1.3549794 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013301171 = queryNorm
              0.23435001 = fieldWeight in 871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.09375 = fieldNorm(doc=871)
          0.044681065 = weight(abstract_txt:format in 871) [ClassicSimilarity], result of:
            0.044681065 = score(doc=871,freq=1.0), product of:
              0.09355517 = queryWeight, product of:
                1.3806813 = boost
                5.0942993 = idf(docFreq=736, maxDocs=44218)
                0.013301171 = queryNorm
              0.47759056 = fieldWeight in 871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0942993 = idf(docFreq=736, maxDocs=44218)
                0.09375 = fieldNorm(doc=871)
          0.6056055 = weight(title_txt:your in 871) [ClassicSimilarity], result of:
            0.6056055 = score(doc=871,freq=1.0), product of:
              0.17422594 = queryWeight, product of:
                1.884152 = boost
                6.9519553 = idf(docFreq=114, maxDocs=44218)
                0.013301171 = queryNorm
              3.4759777 = fieldWeight in 871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9519553 = idf(docFreq=114, maxDocs=44218)
                0.5 = fieldNorm(doc=871)
          0.056571983 = weight(abstract_txt:help in 871) [ClassicSimilarity], result of:
            0.056571983 = score(doc=871,freq=1.0), product of:
              0.1253382 = queryWeight, product of:
                1.9572527 = boost
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.013301171 = queryNorm
              0.45135468 = fieldWeight in 871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.09375 = fieldNorm(doc=871)
        0.16 = coord(4/25)
    
  4. Hooland, S. van; Verborgh, R.; Wilde, M. De; Hercher, J.; Mannens, E.; Wa, R.Van de: Evaluating the success of vocabulary reconciliation for cultural heritage collections (2013) 0.11
    0.10650219 = sum of:
      0.10650219 = product of:
        0.6656387 = sum of:
          0.008798325 = weight(abstract_txt:with in 662) [ClassicSimilarity], result of:
            0.008798325 = score(doc=662,freq=1.0), product of:
              0.04505223 = queryWeight, product of:
                1.3549794 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013301171 = queryNorm
              0.19529167 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.047143314 = weight(abstract_txt:help in 662) [ClassicSimilarity], result of:
            0.047143314 = score(doc=662,freq=1.0), product of:
              0.1253382 = queryWeight, product of:
                1.9572527 = boost
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.013301171 = queryNorm
              0.37612888 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.4828847 = weight(abstract_txt:openrefine in 662) [ClassicSimilarity], result of:
            0.4828847 = score(doc=662,freq=1.0), product of:
              0.6506467 = queryWeight, product of:
                5.1492877 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.013301171 = queryNorm
              0.74216115 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.12681235 = weight(abstract_txt:data in 662) [ClassicSimilarity], result of:
            0.12681235 = score(doc=662,freq=3.0), product of:
              0.28089213 = queryWeight, product of:
                6.3296304 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.013301171 = queryNorm
              0.4514628 = fieldWeight in 662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
        0.16 = coord(4/25)
    
  5. Dawson, H.: Know it all, find it fast for academic libraries (2012) 0.10
    0.10473356 = sum of:
      0.10473356 = product of:
        0.3740484 = sum of:
          0.07853092 = weight(abstract_txt:split in 3728) [ClassicSimilarity], result of:
            0.07853092 = score(doc=3728,freq=1.0), product of:
              0.12212092 = queryWeight, product of:
                1.115423 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.013301171 = queryNorm
              0.6430587 = fieldWeight in 3728, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.078125 = fieldNorm(doc=3728)
          0.044221207 = weight(abstract_txt:common in 3728) [ClassicSimilarity], result of:
            0.044221207 = score(doc=3728,freq=2.0), product of:
              0.083275385 = queryWeight, product of:
                1.3026204 = boost
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.013301171 = queryNorm
              0.53102374 = fieldWeight in 3728, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.078125 = fieldNorm(doc=3728)
          0.008798325 = weight(abstract_txt:with in 3728) [ClassicSimilarity], result of:
            0.008798325 = score(doc=3728,freq=1.0), product of:
              0.04505223 = queryWeight, product of:
                1.3549794 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013301171 = queryNorm
              0.19529167 = fieldWeight in 3728, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=3728)
          0.05825518 = weight(abstract_txt:working in 3728) [ClassicSimilarity], result of:
            0.05825518 = score(doc=3728,freq=2.0), product of:
              0.10007355 = queryWeight, product of:
                1.4279704 = boost
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.013301171 = queryNorm
              0.58212364 = fieldWeight in 3728, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.078125 = fieldNorm(doc=3728)
          0.06470686 = weight(abstract_txt:know in 3728) [ClassicSimilarity], result of:
            0.06470686 = score(doc=3728,freq=1.0), product of:
              0.13523003 = queryWeight, product of:
                1.6599543 = boost
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.013301171 = queryNorm
              0.47849476 = fieldWeight in 3728, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.078125 = fieldNorm(doc=3728)
          0.047143314 = weight(abstract_txt:help in 3728) [ClassicSimilarity], result of:
            0.047143314 = score(doc=3728,freq=1.0), product of:
              0.1253382 = queryWeight, product of:
                1.9572527 = boost
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.013301171 = queryNorm
              0.37612888 = fieldWeight in 3728, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.81445 = idf(docFreq=974, maxDocs=44218)
                0.078125 = fieldNorm(doc=3728)
          0.07239262 = weight(abstract_txt:where in 3728) [ClassicSimilarity], result of:
            0.07239262 = score(doc=3728,freq=1.0), product of:
              0.19779523 = queryWeight, product of:
                3.1742232 = boost
                4.684772 = idf(docFreq=1109, maxDocs=44218)
                0.013301171 = queryNorm
              0.36599782 = fieldWeight in 3728, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.684772 = idf(docFreq=1109, maxDocs=44218)
                0.078125 = fieldNorm(doc=3728)
        0.28 = coord(7/25)
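
The content scores combine several matching terms. As a worked check, this Python sketch recomputes the score of the first entry in the content list (doc 2277) from the boost, idf, freq, fieldNorm and queryNorm values printed in its explain tree. The structure (queryWeight times fieldWeight per term, summed, then multiplied by the coord factor) is read off the listing. The function and variable names are illustrative, and results may differ from the listing in the last digits because Lucene computes in single precision.

```python
import math

QUERY_NORM = 0.013301171  # queryNorm as printed in the listing

# (term, boost, idf, freq, fieldNorm) copied from the tree for doc 2277
TERMS_DOC_2277 = [
    ("with", 1.3549794, 2.4997334, 3.0, 0.0625),
    ("know", 1.6599543, 6.124733, 1.0, 0.0625),
    ("help", 1.9572527, 4.81445, 1.0, 0.0625),
    ("want", 2.6625714, 6.5493927, 3.0, 0.0625),
    ("openrefine", 5.1492877, 9.499662, 1.0, 0.0625),
    ("data", 6.3296304, 3.3363478, 9.0, 0.0625),
]


def term_score(boost, idf, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched, total):
    """Sum the term scores, then scale by coord = matched / total."""
    raw = sum(term_score(b, i, f, n) for _, b, i, f, n in terms)
    return raw * (matched / total)


# coord(6/25): six of the 25 query terms matched this document.
harlow_score = document_score(TERMS_DOC_2277, matched=6, total=25)
print(harlow_score)
```

The coord factor explains why a document with one very strong term match can still rank below a document that matches more of the query terms weakly.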