Document (#38575)

Author
Wisser, K.
Title
¬The errors of our ways : using metadata quality research to understand common error patterns in the application of name headings
Source
Metadata and semantics research: 8th Research Conference, MTSR 2014, Karlsruhe, Germany, November 27-29, 2014, Proceedings. Eds.: S. Closs et al.
Imprint
Cham : Springer
Year
2014
Pages
pp. 83-94
Series
Communications in computer and information science; 478
Abstract
Using data culled during a metadata quality research project for the Social Network and Archival Context (SNAC) project, this article discusses common errors and problems in the use of standardized languages, specifically unambiguous names for persons and corporate bodies. Errors such as misspelling, qualifiers, format, and mis-encoding point to several areas where quality control measures can improve aggregation of data. Results from a large data set indicate that there are predictable problems that can be retrospectively corrected before aggregation. This research looked specifically at name formation and expression in metadata records, but the errors detected could be extended to other controlled vocabularies as well.
Theme
Metadata
Descriptive cataloging

Similar documents (content)

  1. Beall, J.; Kafadar, K.: ¬The effectiveness of copy cataloging at eliminating typographical errors in shared bibliographic records (2004) 0.28
    0.2752464 = sum of:
      0.2752464 = product of:
        1.1468601 = sum of:
          0.08422537 = weight(abstract_txt:error in 5849) [ClassicSimilarity], result of:
            0.08422537 = score(doc=5849,freq=1.0), product of:
              0.13101462 = queryWeight, product of:
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.019105915 = queryNorm
              0.64287007 = fieldWeight in 5849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.09375 = fieldNorm(doc=5849)
          0.041560344 = weight(abstract_txt:problems in 5849) [ClassicSimilarity], result of:
            0.041560344 = score(doc=5849,freq=1.0), product of:
              0.103075124 = queryWeight, product of:
                1.2543885 = boost
                4.300847 = idf(docFreq=1636, maxDocs=44421)
                0.019105915 = queryNorm
              0.4032044 = fieldWeight in 5849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.300847 = idf(docFreq=1636, maxDocs=44421)
                0.09375 = fieldNorm(doc=5849)
          0.257642 = weight(abstract_txt:corrected in 5849) [ClassicSimilarity], result of:
            0.257642 = score(doc=5849,freq=2.0), product of:
              0.21912515 = queryWeight, product of:
                1.2932612 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.019105915 = queryNorm
              1.1757755 = fieldWeight in 5849, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.09375 = fieldNorm(doc=5849)
          0.028942058 = weight(abstract_txt:data in 5849) [ClassicSimilarity], result of:
            0.028942058 = score(doc=5849,freq=1.0), product of:
              0.09270101 = queryWeight, product of:
                1.4569445 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019105915 = queryNorm
              0.31220865 = fieldWeight in 5849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=5849)
          0.07850655 = weight(abstract_txt:quality in 5849) [ClassicSimilarity], result of:
            0.07850655 = score(doc=5849,freq=1.0), product of:
              0.18030265 = queryWeight, product of:
                2.0318975 = boost
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.019105915 = queryNorm
              0.4354154 = fieldWeight in 5849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.09375 = fieldNorm(doc=5849)
          0.65598387 = weight(abstract_txt:errors in 5849) [ClassicSimilarity], result of:
            0.65598387 = score(doc=5849,freq=5.0), product of:
              0.47787747 = queryWeight, product of:
                3.8196924 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019105915 = queryNorm
              1.3727031 = fieldWeight in 5849, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.09375 = fieldNorm(doc=5849)
        0.24 = coord(6/25)
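
The score tree for result 1 can be re-derived by hand. The sketch below is a minimal reconstruction, assuming the usual Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm). The function and variable names are hypothetical; every number is copied from the listing above, and a term shown without a boost line is assumed to have boost 1.0.

```python
import math

# Constants copied from the explain output for doc 5849.
MAX_DOCS = 44421
QUERY_NORM = 0.019105915
FIELD_NORM = 0.09375  # fieldNorm(doc=5849)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed ClassicSimilarity formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost=1.0):
    """Per-term score: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq, docFreq, boost) for the six query terms matched in doc 5849.
# "error" has no boost line in the listing, so boost 1.0 is assumed.
MATCHES = [
    ("error", 1.0, 126, 1.0),
    ("problems", 1.0, 1636, 1.2543885),
    ("corrected", 2.0, 16, 1.2932612),
    ("data", 1.0, 4320, 1.4569445),
    ("quality", 1.0, 1160, 2.0318975),
    ("errors", 5.0, 172, 3.8196924),
]

raw_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in MATCHES)

# coord(6/25): 6 of the 25 query terms matched this document.
score = raw_sum * 6 / 25

print(f"sum of term scores: {raw_sum:.4f}")
print(f"final score:        {score:.4f}")
```

Up to floating-point rounding (Lucene computes these values in 32-bit floats), the result should agree with the 0.2752464 reported for this document. The same arithmetic applies to the other results once their own fieldNorm, term frequencies, and coord factor are substituted.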
    
  2. Pope, J.T.; Holley, R.P.: Google Book Search and metadata (2011) 0.16
    0.16340764 = sum of:
      0.16340764 = product of:
        0.68086517 = sum of:
          0.070187815 = weight(abstract_txt:error in 2887) [ClassicSimilarity], result of:
            0.070187815 = score(doc=2887,freq=1.0), product of:
              0.13101462 = queryWeight, product of:
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.019105915 = queryNorm
              0.53572506 = fieldWeight in 2887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8572807 = idf(docFreq=126, maxDocs=44421)
                0.078125 = fieldNorm(doc=2887)
          0.034633618 = weight(abstract_txt:problems in 2887) [ClassicSimilarity], result of:
            0.034633618 = score(doc=2887,freq=1.0), product of:
              0.103075124 = queryWeight, product of:
                1.2543885 = boost
                4.300847 = idf(docFreq=1636, maxDocs=44421)
                0.019105915 = queryNorm
              0.33600366 = fieldWeight in 2887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.300847 = idf(docFreq=1636, maxDocs=44421)
                0.078125 = fieldNorm(doc=2887)
          0.036571648 = weight(abstract_txt:project in 2887) [ClassicSimilarity], result of:
            0.036571648 = score(doc=2887,freq=1.0), product of:
              0.10688538 = queryWeight, product of:
                1.2773628 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.019105915 = queryNorm
              0.34215763 = fieldWeight in 2887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.078125 = fieldNorm(doc=2887)
          0.024118379 = weight(abstract_txt:data in 2887) [ClassicSimilarity], result of:
            0.024118379 = score(doc=2887,freq=1.0), product of:
              0.09270101 = queryWeight, product of:
                1.4569445 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019105915 = queryNorm
              0.26017386 = fieldWeight in 2887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=2887)
          0.16961986 = weight(abstract_txt:metadata in 2887) [ClassicSimilarity], result of:
            0.16961986 = score(doc=2887,freq=5.0), product of:
              0.19899714 = queryWeight, product of:
                2.1346376 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.019105915 = queryNorm
              0.85237336 = fieldWeight in 2887, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.078125 = fieldNorm(doc=2887)
          0.34573388 = weight(abstract_txt:errors in 2887) [ClassicSimilarity], result of:
            0.34573388 = score(doc=2887,freq=2.0), product of:
              0.47787747 = queryWeight, product of:
                3.8196924 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019105915 = queryNorm
              0.7234781 = fieldWeight in 2887, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.078125 = fieldNorm(doc=2887)
        0.24 = coord(6/25)
    
  3. Lardy, J.P.; Herzhaft, L.: Bibliometric treatments according to bibliographic errors and data heterogeneity : the end-user point of view (1992) 0.16
    0.16172004 = sum of:
      0.16172004 = product of:
        0.8086002 = sum of:
          0.08195036 = weight(abstract_txt:common in 5132) [ClassicSimilarity], result of:
            0.08195036 = score(doc=5132,freq=2.0), product of:
              0.12864465 = queryWeight, product of:
                1.4013641 = boost
                4.8047733 = idf(docFreq=988, maxDocs=44421)
                0.019105915 = queryNorm
              0.63702893 = fieldWeight in 5132, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8047733 = idf(docFreq=988, maxDocs=44421)
                0.09375 = fieldNorm(doc=5132)
          0.04093025 = weight(abstract_txt:data in 5132) [ClassicSimilarity], result of:
            0.04093025 = score(doc=5132,freq=2.0), product of:
              0.09270101 = queryWeight, product of:
                1.4569445 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019105915 = queryNorm
              0.4415297 = fieldWeight in 5132, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=5132)
          0.09909009 = weight(abstract_txt:name in 5132) [ClassicSimilarity], result of:
            0.09909009 = score(doc=5132,freq=1.0), product of:
              0.1839591 = queryWeight, product of:
                1.6757752 = boost
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.019105915 = queryNorm
              0.53865284 = fieldWeight in 5132, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7456303 = idf(docFreq=385, maxDocs=44421)
                0.09375 = fieldNorm(doc=5132)
          0.07850655 = weight(abstract_txt:quality in 5132) [ClassicSimilarity], result of:
            0.07850655 = score(doc=5132,freq=1.0), product of:
              0.18030265 = queryWeight, product of:
                2.0318975 = boost
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.019105915 = queryNorm
              0.4354154 = fieldWeight in 5132, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.09375 = fieldNorm(doc=5132)
          0.5081229 = weight(abstract_txt:errors in 5132) [ClassicSimilarity], result of:
            0.5081229 = score(doc=5132,freq=3.0), product of:
              0.47787747 = queryWeight, product of:
                3.8196924 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.019105915 = queryNorm
              1.0632912 = fieldWeight in 5132, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.09375 = fieldNorm(doc=5132)
        0.2 = coord(5/25)
    
  4. Tani, A.; Candela, L.; Castelli, D.: Dealing with metadata quality : the legacy of digital library efforts (2013) 0.15
    0.15284896 = sum of:
      0.15284896 = product of:
        0.6368707 = sum of:
          0.041560344 = weight(abstract_txt:problems in 3662) [ClassicSimilarity], result of:
            0.041560344 = score(doc=3662,freq=1.0), product of:
              0.103075124 = queryWeight, product of:
                1.2543885 = boost
                4.300847 = idf(docFreq=1636, maxDocs=44421)
                0.019105915 = queryNorm
              0.4032044 = fieldWeight in 3662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.300847 = idf(docFreq=1636, maxDocs=44421)
                0.09375 = fieldNorm(doc=3662)
          0.057947658 = weight(abstract_txt:common in 3662) [ClassicSimilarity], result of:
            0.057947658 = score(doc=3662,freq=1.0), product of:
              0.12864465 = queryWeight, product of:
                1.4013641 = boost
                4.8047733 = idf(docFreq=988, maxDocs=44421)
                0.019105915 = queryNorm
              0.4504475 = fieldWeight in 3662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8047733 = idf(docFreq=988, maxDocs=44421)
                0.09375 = fieldNorm(doc=3662)
          0.04093025 = weight(abstract_txt:data in 3662) [ClassicSimilarity], result of:
            0.04093025 = score(doc=3662,freq=2.0), product of:
              0.09270101 = queryWeight, product of:
                1.4569445 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019105915 = queryNorm
              0.4415297 = fieldWeight in 3662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=3662)
          0.13597733 = weight(abstract_txt:quality in 3662) [ClassicSimilarity], result of:
            0.13597733 = score(doc=3662,freq=3.0), product of:
              0.18030265 = queryWeight, product of:
                2.0318975 = boost
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.019105915 = queryNorm
              0.75416154 = fieldWeight in 3662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.09375 = fieldNorm(doc=3662)
          0.20279074 = weight(abstract_txt:aggregation in 3662) [ClassicSimilarity], result of:
            0.20279074 = score(doc=3662,freq=1.0), product of:
              0.29652855 = queryWeight, product of:
                2.127592 = boost
                7.2947483 = idf(docFreq=81, maxDocs=44421)
                0.019105915 = queryNorm
              0.68388265 = fieldWeight in 3662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2947483 = idf(docFreq=81, maxDocs=44421)
                0.09375 = fieldNorm(doc=3662)
          0.15766437 = weight(abstract_txt:metadata in 3662) [ClassicSimilarity], result of:
            0.15766437 = score(doc=3662,freq=3.0), product of:
              0.19899714 = queryWeight, product of:
                2.1346376 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.019105915 = queryNorm
              0.7922947 = fieldWeight in 3662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.09375 = fieldNorm(doc=3662)
        0.24 = coord(6/25)
    
  5. Jarke, M.; Lenzerini, M.; Vassiliou, Y.: Fundamentals of data warehousing (1999) 0.14
    0.13581575 = sum of:
      0.13581575 = product of:
        0.565899 = sum of:
          0.021580754 = weight(abstract_txt:using in 2302) [ClassicSimilarity], result of:
            0.021580754 = score(doc=2302,freq=1.0), product of:
              0.06659049 = queryWeight, product of:
                1.0082337 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.019105915 = queryNorm
              0.32408163 = fieldWeight in 2302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.09375 = fieldNorm(doc=2302)
          0.04388598 = weight(abstract_txt:project in 2302) [ClassicSimilarity], result of:
            0.04388598 = score(doc=2302,freq=1.0), product of:
              0.10688538 = queryWeight, product of:
                1.2773628 = boost
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.019105915 = queryNorm
              0.41058916 = fieldWeight in 2302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3796177 = idf(docFreq=1512, maxDocs=44421)
                0.09375 = fieldNorm(doc=2302)
          0.057884116 = weight(abstract_txt:data in 2302) [ClassicSimilarity], result of:
            0.057884116 = score(doc=2302,freq=4.0), product of:
              0.09270101 = queryWeight, product of:
                1.4569445 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019105915 = queryNorm
              0.6244173 = fieldWeight in 2302, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=2302)
          0.11102502 = weight(abstract_txt:quality in 2302) [ClassicSimilarity], result of:
            0.11102502 = score(doc=2302,freq=2.0), product of:
              0.18030265 = queryWeight, product of:
                2.0318975 = boost
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.019105915 = queryNorm
              0.61577034 = fieldWeight in 2302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6444306 = idf(docFreq=1160, maxDocs=44421)
                0.09375 = fieldNorm(doc=2302)
          0.20279074 = weight(abstract_txt:aggregation in 2302) [ClassicSimilarity], result of:
            0.20279074 = score(doc=2302,freq=1.0), product of:
              0.29652855 = queryWeight, product of:
                2.127592 = boost
                7.2947483 = idf(docFreq=81, maxDocs=44421)
                0.019105915 = queryNorm
              0.68388265 = fieldWeight in 2302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2947483 = idf(docFreq=81, maxDocs=44421)
                0.09375 = fieldNorm(doc=2302)
          0.12873243 = weight(abstract_txt:metadata in 2302) [ClassicSimilarity], result of:
            0.12873243 = score(doc=2302,freq=2.0), product of:
              0.19899714 = queryWeight, product of:
                2.1346376 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.019105915 = queryNorm
              0.6469059 = fieldWeight in 2302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.09375 = fieldNorm(doc=2302)
        0.24 = coord(6/25)