Document (#266)

Author
Hustand, S.
Title
Problems of duplicate records
Source
Future of online catalogs. Essen Symposium, 30.9.-3.10.1985. Ed. by A.H. Helal, J.W. Weiss
Imprint
Essen : Gesamthochschulbibliothek
Year
1986
Pages
pp. 169-202
Series
Veröffentlichungen der Gesamthochschulbibliothek Essen; 8
Abstract
Duplicate records are a familiar problem in bibliographic databases. The problem is obvious when a union catalogue is established by automatically merging two or more separate and independent sources of catalogue information. However, even in systems with on-line cataloguing and access to previous records, duplication is a problem. An author/title search prior to cataloguing does not cut duplication to zero. A great deal of effort has been put into developing methods of duplicate detection. A major problem in this work has been efficiency, which is particularly important in the on-line setting. Most studies have dealt with book and article material. The Research Libraries Group Inc. has also described matching algorithms for films, maps, recordings, scores and serials. Various methods of detecting duplicates will be discussed.
Theme
Descriptive cataloguing (Formalerschließung)

Similar documents (content)

  1. Cousins, S.A.: Duplicate detection and record consolidation in large bibliographic databases : the COPAC database experience (1998) 0.42
    0.42017803 = sum of:
      0.42017803 = product of:
        1.3130563 = sum of:
          0.07227797 = weight(abstract_txt:union in 3833) [ClassicSimilarity], result of:
            0.07227797 = score(doc=3833,freq=2.0), product of:
              0.10738443 = queryWeight, product of:
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.017627135 = queryNorm
              0.6730768 = fieldWeight in 3833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
          0.09938111 = weight(abstract_txt:detection in 3833) [ClassicSimilarity], result of:
            0.09938111 = score(doc=3833,freq=2.0), product of:
              0.13278222 = queryWeight, product of:
                1.1119859 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.017627135 = queryNorm
              0.74845195 = fieldWeight in 3833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
          0.021348432 = weight(abstract_txt:been in 3833) [ClassicSimilarity], result of:
            0.021348432 = score(doc=3833,freq=1.0), product of:
              0.07560224 = queryWeight, product of:
                1.18662 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.017627135 = queryNorm
              0.2823783 = fieldWeight in 3833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
          0.083757326 = weight(abstract_txt:catalogue in 3833) [ClassicSimilarity], result of:
            0.083757326 = score(doc=3833,freq=2.0), product of:
              0.14926657 = queryWeight, product of:
                1.6673455 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.017627135 = queryNorm
              0.5611258 = fieldWeight in 3833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
          0.1017195 = weight(abstract_txt:records in 3833) [ClassicSimilarity], result of:
            0.1017195 = score(doc=3833,freq=3.0), product of:
              0.16990918 = queryWeight, product of:
                2.1787047 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.017627135 = queryNorm
              0.5986698 = fieldWeight in 3833, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
          0.24439617 = weight(abstract_txt:duplication in 3833) [ClassicSimilarity], result of:
            0.24439617 = score(doc=3833,freq=1.0), product of:
              0.3840198 = queryWeight, product of:
                2.674368 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.017627135 = queryNorm
              0.63641554 = fieldWeight in 3833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
          0.08018575 = weight(abstract_txt:problem in 3833) [ClassicSimilarity], result of:
            0.08018575 = score(doc=3833,freq=1.0), product of:
              0.23016122 = queryWeight, product of:
                2.9280293 = boost
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.017627135 = queryNorm
              0.34838948 = fieldWeight in 3833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4593854 = idf(docFreq=1396, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
          0.60999006 = weight(abstract_txt:duplicate in 3833) [ClassicSimilarity], result of:
            0.60999006 = score(doc=3833,freq=3.0), product of:
              0.5608274 = queryWeight, product of:
                3.9582622 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.017627135 = queryNorm
              1.087661 = fieldWeight in 3833, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=3833)
        0.32 = coord(8/25)
    
  2. Süle, G.: Problems of duplicate records, standards and quality control (1986) 0.28
    0.28371438 = sum of:
      0.28371438 = product of:
        1.418572 = sum of:
          0.07193995 = weight(abstract_txt:cataloguing in 3060) [ClassicSimilarity], result of:
            0.07193995 = score(doc=3060,freq=1.0), product of:
              0.13578509 = queryWeight, product of:
                1.5902683 = boost
                4.8439536 = idf(docFreq=950, maxDocs=44421)
                0.017627135 = queryNorm
              0.52980745 = fieldWeight in 3060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8439536 = idf(docFreq=950, maxDocs=44421)
                0.109375 = fieldNorm(doc=3060)
          0.08291552 = weight(abstract_txt:catalogue in 3060) [ClassicSimilarity], result of:
            0.08291552 = score(doc=3060,freq=1.0), product of:
              0.14926657 = queryWeight, product of:
                1.6673455 = boost
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.017627135 = queryNorm
              0.5554862 = fieldWeight in 3060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.078731 = idf(docFreq=751, maxDocs=44421)
                0.109375 = fieldNorm(doc=3060)
          0.13521108 = weight(abstract_txt:line in 3060) [ClassicSimilarity], result of:
            0.13521108 = score(doc=3060,freq=1.0), product of:
              0.20679826 = queryWeight, product of:
                1.9625367 = boost
                5.9778824 = idf(docFreq=305, maxDocs=44421)
                0.017627135 = queryNorm
              0.6538309 = fieldWeight in 3060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9778824 = idf(docFreq=305, maxDocs=44421)
                0.109375 = fieldNorm(doc=3060)
          0.1424073 = weight(abstract_txt:records in 3060) [ClassicSimilarity], result of:
            0.1424073 = score(doc=3060,freq=3.0), product of:
              0.16990918 = queryWeight, product of:
                2.1787047 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.017627135 = queryNorm
              0.83813775 = fieldWeight in 3060, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.109375 = fieldNorm(doc=3060)
          0.9860982 = weight(abstract_txt:duplicate in 3060) [ClassicSimilarity], result of:
            0.9860982 = score(doc=3060,freq=4.0), product of:
              0.5608274 = queryWeight, product of:
                3.9582622 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.017627135 = queryNorm
              1.7582918 = fieldWeight in 3060, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.109375 = fieldNorm(doc=3060)
        0.2 = coord(5/25)
    
  3. Sitas, A.; Kapidakis, S.: Duplicate detection algorithms of bibliographic descriptions (2008) 0.23
    0.23013206 = sum of:
      0.23013206 = product of:
        1.1506603 = sum of:
          0.14054613 = weight(abstract_txt:detection in 3543) [ClassicSimilarity], result of:
            0.14054613 = score(doc=3543,freq=4.0), product of:
              0.13278222 = queryWeight, product of:
                1.1119859 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.017627135 = queryNorm
              1.058471 = fieldWeight in 3543, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.09700022 = weight(abstract_txt:merging in 3543) [ClassicSimilarity], result of:
            0.09700022 = score(doc=3543,freq=1.0), product of:
              0.16461238 = queryWeight, product of:
                1.2381139 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.017627135 = queryNorm
              0.5892644 = fieldWeight in 3543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.058727782 = weight(abstract_txt:records in 3543) [ClassicSimilarity], result of:
            0.058727782 = score(doc=3543,freq=1.0), product of:
              0.16990918 = queryWeight, product of:
                2.1787047 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.017627135 = queryNorm
              0.3456422 = fieldWeight in 3543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.24439617 = weight(abstract_txt:duplication in 3543) [ClassicSimilarity], result of:
            0.24439617 = score(doc=3543,freq=1.0), product of:
              0.3840198 = queryWeight, product of:
                2.674368 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.017627135 = queryNorm
              0.63641554 = fieldWeight in 3543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.60999006 = weight(abstract_txt:duplicate in 3543) [ClassicSimilarity], result of:
            0.60999006 = score(doc=3543,freq=3.0), product of:
              0.5608274 = queryWeight, product of:
                3.9582622 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.017627135 = queryNorm
              1.087661 = fieldWeight in 3543, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
        0.2 = coord(5/25)
    
  4. Weiss, P.J.: Getting the expert into the system : expert systems and cataloging (1995) 0.20
    0.2041432 = sum of:
      0.2041432 = product of:
        1.020716 = sum of:
          0.107376866 = weight(abstract_txt:serials in 2465) [ClassicSimilarity], result of:
            0.107376866 = score(doc=2465,freq=1.0), product of:
              0.12876797 = queryWeight, product of:
                1.0950483 = boost
                6.6710296 = idf(docFreq=152, maxDocs=44421)
                0.017627135 = queryNorm
              0.8338787 = fieldWeight in 2465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6710296 = idf(docFreq=152, maxDocs=44421)
                0.125 = fieldNorm(doc=2465)
          0.11243689 = weight(abstract_txt:detection in 2465) [ClassicSimilarity], result of:
            0.11243689 = score(doc=2465,freq=1.0), product of:
              0.13278222 = queryWeight, product of:
                1.1119859 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.017627135 = queryNorm
              0.8467767 = fieldWeight in 2465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.125 = fieldNorm(doc=2465)
          0.15520035 = weight(abstract_txt:merging in 2465) [ClassicSimilarity], result of:
            0.15520035 = score(doc=2465,freq=1.0), product of:
              0.16461238 = queryWeight, product of:
                1.2381139 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.017627135 = queryNorm
              0.94282305 = fieldWeight in 2465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.125 = fieldNorm(doc=2465)
          0.08221708 = weight(abstract_txt:cataloguing in 2465) [ClassicSimilarity], result of:
            0.08221708 = score(doc=2465,freq=1.0), product of:
              0.13578509 = queryWeight, product of:
                1.5902683 = boost
                4.8439536 = idf(docFreq=950, maxDocs=44421)
                0.017627135 = queryNorm
              0.6054942 = fieldWeight in 2465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8439536 = idf(docFreq=950, maxDocs=44421)
                0.125 = fieldNorm(doc=2465)
          0.56348467 = weight(abstract_txt:duplicate in 2465) [ClassicSimilarity], result of:
            0.56348467 = score(doc=2465,freq=1.0), product of:
              0.5608274 = queryWeight, product of:
                3.9582622 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.017627135 = queryNorm
              1.0047382 = fieldWeight in 2465, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.125 = fieldNorm(doc=2465)
        0.2 = coord(5/25)
    
  5. Conrad, J.G.; Schriber, C.P.: Managing déjà vu : collection building for the identification of nonidentical duplicate documents (2006) 0.20
    0.20039435 = sum of:
      0.20039435 = product of:
        1.0019717 = sum of:
          0.070273064 = weight(abstract_txt:detection in 59) [ClassicSimilarity], result of:
            0.070273064 = score(doc=59,freq=1.0), product of:
              0.13278222 = queryWeight, product of:
                1.1119859 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.017627135 = queryNorm
              0.5292355 = fieldWeight in 59, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.078125 = fieldNorm(doc=59)
          0.031208726 = weight(abstract_txt:search in 59) [ClassicSimilarity], result of:
            0.031208726 = score(doc=59,freq=2.0), product of:
              0.07729144 = queryWeight, product of:
                1.1998032 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.017627135 = queryNorm
              0.40377986 = fieldWeight in 59, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=59)
          0.03216145 = weight(abstract_txt:methods in 59) [ClassicSimilarity], result of:
            0.03216145 = score(doc=59,freq=1.0), product of:
              0.09935302 = queryWeight, product of:
                1.3603005 = boost
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.017627135 = queryNorm
              0.3237088 = fieldWeight in 59, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1434727 = idf(docFreq=1915, maxDocs=44421)
                0.078125 = fieldNorm(doc=59)
          0.25833848 = weight(abstract_txt:duplicates in 59) [ClassicSimilarity], result of:
            0.25833848 = score(doc=59,freq=3.0), product of:
              0.21929717 = queryWeight, product of:
                1.4290448 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.017627135 = queryNorm
              1.1780293 = fieldWeight in 59, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=59)
          0.60999006 = weight(abstract_txt:duplicate in 59) [ClassicSimilarity], result of:
            0.60999006 = score(doc=59,freq=3.0), product of:
              0.5608274 = queryWeight, product of:
                3.9582622 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.017627135 = queryNorm
              1.087661 = fieldWeight in 59, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=59)
        0.2 = coord(5/25)
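
The score breakdowns above follow the TF-IDF pattern that Lucene's ClassicSimilarity reports: each term score is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and the sum over matching terms is scaled by the coord factor (matched terms / query terms). The following is a minimal Python sketch of that arithmetic, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the values listed. The function and variable names are hypothetical and chosen only for illustration; the constants are copied from the listing.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 44421
QUERY_NORM = 0.017627135


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assumed to be 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0, query_norm=QUERY_NORM):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


# "union" in doc 3833: freq=2, docFreq=272, fieldNorm=0.078125, no boost listed.
union_score = term_score(freq=2.0, doc_freq=272, field_norm=0.078125)

# "duplicate" in doc 2465: freq=1, docFreq=38, fieldNorm=0.125, boost=3.9582622.
duplicate_score = term_score(freq=1.0, doc_freq=38, field_norm=0.125,
                             boost=3.9582622)

# Final score of entry 1: sum of its term scores times coord(8/25).
entry1_total = 1.3130563 * (8 / 25)

print(round(union_score, 6), round(duplicate_score, 6), round(entry1_total, 6))
```

Comparing the printed values with the listing (0.07227797, 0.56348467 and 0.42017803) is a quick way to check the assumed formulas; small differences in the last digits are expected because the listing appears to use single-precision arithmetic.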