Document (#21383)

Author
Meir, D.D.
Lazinger, S.S.
Title
Measuring the performance of a merging algorithm : mismatches, missed-matches, and overlap in Israel's union list
Source
Information technology and libraries. 17(1998) no.3, pp. 116-123
Year
1998
Abstract
Reports results of a survey, undertaken in 1996, to measure the performance of the merging algorithm used to generate the now defunct ALEPH ULM (Union List of Monographs) file. Results showed that although the algorithm created almost no mismatches that would have led to the loss of information, it had a greater proportion of missed matches than was anticipated, especially when matching Hebrew bibliographic records. Discusses the central issues inherent in automatic detection and merging of duplicate records, as well as the main methodologies for measuring the performance of merging algorithms. Recommendations include integrating testing procedures into the initial specifications for any future algorithms and deciding on a performance threshold that the algorithm must exceed in order to be put to use.
Theme
Descriptive cataloguing (Formalerschließung)
Object
ALEPH ULM
Location
Israel

Similar documents (author)

  1. Lazinger, S.S.: To merge or not to merge : Israel's Union List of Monographs in the context of merging algorithms (1994) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:lazinger in 3099) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 3099, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=3099)
    
  2. Lazinger, S.S.: Digital preservation and metadata : history, theory, practice (2002) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:lazinger in 2262) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 2262, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=2262)
    
  3. Lazinger, S.S.: LC Classification of a library and information science library for maximum shelf retrieval (1984) 5.87
    5.874302 = sum of:
      5.874302 = weight(author_txt:lazinger in 464) [ClassicSimilarity], result of:
        5.874302 = fieldWeight in 464, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.625 = fieldNorm(doc=464)
    
  4. Lazinger, S.S.; Peritz, B.C.: Reader use of a nationwide research library network : local OPAC vs. remote files (1991) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:lazinger in 4013) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 4013, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=4013)
    
  5. Shoham, S.; Lazinger, S.S.: The no-main-entry principle and the automated catalog (1991) 4.70
    4.6994414 = sum of:
      4.6994414 = weight(author_txt:lazinger in 632) [ClassicSimilarity], result of:
        4.6994414 = fieldWeight in 632, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.5 = fieldNorm(doc=632)
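
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of any library, and the fieldNorm values are simply copied from the explanation output rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the explanations: docFreq=9, maxDocs=44421.
single_author = field_weight(1.0, 9, 44421, 0.625)  # listing reports 5.874302
two_authors = field_weight(1.0, 9, 44421, 0.5)      # listing reports 4.6994414

print(single_author, two_authors)
```

The only difference between the 5.87 and 4.70 entries is the fieldNorm: it is lower (0.5 instead of 0.625) for the two records with more than one author, where the author field is longer.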
    

Similar documents (content)

  1. Lazinger, S.S.: To merge or not to merge : Israel's Union List of Monographs in the context of merging algorithms (1994) 0.32
    0.32393214 = sum of:
      0.32393214 = product of:
        1.1569005 = sum of:
          0.1546626 = weight(abstract_txt:monographs in 3099) [ClassicSimilarity], result of:
            0.1546626 = score(doc=3099,freq=4.0), product of:
              0.11471453 = queryWeight, product of:
                1.0412605 = boost
                7.190608 = idf(docFreq=90, maxDocs=44421)
                0.015321223 = queryNorm
              1.348239 = fieldWeight in 3099, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.190608 = idf(docFreq=90, maxDocs=44421)
                0.09375 = fieldNorm(doc=3099)
          0.0940769 = weight(abstract_txt:aleph in 3099) [ClassicSimilarity], result of:
            0.0940769 = score(doc=3099,freq=1.0), product of:
              0.13072848 = queryWeight, product of:
                1.1115661 = boost
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.015321223 = queryNorm
              0.71963584 = fieldWeight in 3099, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.676116 = idf(docFreq=55, maxDocs=44421)
                0.09375 = fieldNorm(doc=3099)
          0.036024556 = weight(abstract_txt:records in 3099) [ClassicSimilarity], result of:
            0.036024556 = score(doc=3099,freq=1.0), product of:
              0.08685416 = queryWeight, product of:
                1.2813284 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.015321223 = queryNorm
              0.41477063 = fieldWeight in 3099, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.09375 = fieldNorm(doc=3099)
          0.13026273 = weight(abstract_txt:list in 3099) [ClassicSimilarity], result of:
            0.13026273 = score(doc=3099,freq=4.0), product of:
              0.12889963 = queryWeight, product of:
                1.5609572 = boost
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.015321223 = queryNorm
              1.0105749 = fieldWeight in 3099, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.389733 = idf(docFreq=550, maxDocs=44421)
                0.09375 = fieldNorm(doc=3099)
          0.23037899 = weight(abstract_txt:union in 3099) [ClassicSimilarity], result of:
            0.23037899 = score(doc=3099,freq=6.0), product of:
              0.16467829 = queryWeight, product of:
                1.7643443 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.015321223 = queryNorm
              1.3989639 = fieldWeight in 3099, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.09375 = fieldNorm(doc=3099)
          0.15448596 = weight(abstract_txt:algorithm in 3099) [ClassicSimilarity], result of:
            0.15448596 = score(doc=3099,freq=1.0), product of:
              0.28884235 = queryWeight, product of:
                3.3045368 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.015321223 = queryNorm
              0.53484523 = fieldWeight in 3099, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.09375 = fieldNorm(doc=3099)
          0.35700884 = weight(abstract_txt:merging in 3099) [ClassicSimilarity], result of:
            0.35700884 = score(doc=3099,freq=1.0), product of:
              0.50487924 = queryWeight, product of:
                4.3689184 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.015321223 = queryNorm
              0.7071173 = fieldWeight in 3099, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.09375 = fieldNorm(doc=3099)
        0.28 = coord(7/25)
    
  2. Paltoglou, G.; Salampasis, M.; Satratzemi, M.: A results merging algorithm for distributed information retrieval environments that combines regression methodologies with a selective download phase (2008) 0.22
    0.21757407 = sum of:
      0.21757407 = product of:
        0.77705026 = sum of:
          0.046507567 = weight(abstract_txt:overlap in 3111) [ClassicSimilarity], result of:
            0.046507567 = score(doc=3111,freq=1.0), product of:
              0.107100494 = queryWeight, product of:
                1.006111 = boost
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.015321223 = queryNorm
              0.43424234 = fieldWeight in 3111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9478774 = idf(docFreq=115, maxDocs=44421)
                0.0625 = fieldNorm(doc=3111)
          0.011674216 = weight(abstract_txt:results in 3111) [ClassicSimilarity], result of:
            0.011674216 = score(doc=3111,freq=1.0), product of:
              0.053695455 = queryWeight, product of:
                1.0074742 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015321223 = queryNorm
              0.21741535 = fieldWeight in 3111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=3111)
          0.01100465 = weight(abstract_txt:that in 3111) [ClassicSimilarity], result of:
            0.01100465 = score(doc=3111,freq=4.0), product of:
              0.03722605 = queryWeight, product of:
                1.0273875 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015321223 = queryNorm
              0.2956169 = fieldWeight in 3111, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3111)
          0.07263547 = weight(abstract_txt:algorithms in 3111) [ClassicSimilarity], result of:
            0.07263547 = score(doc=3111,freq=2.0), product of:
              0.14417 = queryWeight, product of:
                1.650831 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.015321223 = queryNorm
              0.5038182 = fieldWeight in 3111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.0625 = fieldNorm(doc=3111)
          0.077339284 = weight(abstract_txt:performance in 3111) [ClassicSimilarity], result of:
            0.077339284 = score(doc=3111,freq=2.0), product of:
              0.18940255 = queryWeight, product of:
                2.6759195 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.015321223 = queryNorm
              0.40833285 = fieldWeight in 3111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=3111)
          0.14565074 = weight(abstract_txt:algorithm in 3111) [ClassicSimilarity], result of:
            0.14565074 = score(doc=3111,freq=2.0), product of:
              0.28884235 = queryWeight, product of:
                3.3045368 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.015321223 = queryNorm
              0.5042569 = fieldWeight in 3111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=3111)
          0.4122383 = weight(abstract_txt:merging in 3111) [ClassicSimilarity], result of:
            0.4122383 = score(doc=3111,freq=3.0), product of:
              0.50487924 = queryWeight, product of:
                4.3689184 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.015321223 = queryNorm
              0.8165087 = fieldWeight in 3111, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.0625 = fieldNorm(doc=3111)
        0.28 = coord(7/25)
    
  3. Tsai, M.-F.; Chen, H.-H.; Wang, Y.-T.: Learning a merge model for multilingual information retrieval (2011) 0.19
    0.19483662 = sum of:
      0.19483662 = product of:
        0.9741831 = sum of:
          0.011674216 = weight(abstract_txt:results in 3750) [ClassicSimilarity], result of:
            0.011674216 = score(doc=3750,freq=1.0), product of:
              0.053695455 = queryWeight, product of:
                1.0074742 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015321223 = queryNorm
              0.21741535 = fieldWeight in 3750, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=3750)
          0.009530306 = weight(abstract_txt:that in 3750) [ClassicSimilarity], result of:
            0.009530306 = score(doc=3750,freq=3.0), product of:
              0.03722605 = queryWeight, product of:
                1.0273875 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015321223 = queryNorm
              0.25601172 = fieldWeight in 3750, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3750)
          0.054687135 = weight(abstract_txt:performance in 3750) [ClassicSimilarity], result of:
            0.054687135 = score(doc=3750,freq=1.0), product of:
              0.18940255 = queryWeight, product of:
                2.6759195 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.015321223 = queryNorm
              0.28873494 = fieldWeight in 3750, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=3750)
          0.14565074 = weight(abstract_txt:algorithm in 3750) [ClassicSimilarity], result of:
            0.14565074 = score(doc=3750,freq=2.0), product of:
              0.28884235 = queryWeight, product of:
                3.3045368 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.015321223 = queryNorm
              0.5042569 = fieldWeight in 3750, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=3750)
          0.75264066 = weight(abstract_txt:merging in 3750) [ClassicSimilarity], result of:
            0.75264066 = score(doc=3750,freq=10.0), product of:
              0.50487924 = queryWeight, product of:
                4.3689184 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.015321223 = queryNorm
              1.4907341 = fieldWeight in 3750, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.0625 = fieldNorm(doc=3750)
        0.2 = coord(5/25)
    
  4. Sitas, A.; Kapidakis, S.: Duplicate detection algorithms of bibliographic descriptions (2008) 0.13
    0.126495 = sum of:
      0.126495 = product of:
        0.632475 = sum of:
          0.020637292 = weight(abstract_txt:results in 3543) [ClassicSimilarity], result of:
            0.020637292 = score(doc=3543,freq=2.0), product of:
              0.053695455 = queryWeight, product of:
                1.0074742 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.015321223 = queryNorm
              0.38433966 = fieldWeight in 3543, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.15590735 = weight(abstract_txt:duplicate in 3543) [ClassicSimilarity], result of:
            0.15590735 = score(doc=3543,freq=3.0), product of:
              0.14334185 = queryWeight, product of:
                1.1639563 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.015321223 = queryNorm
              1.087661 = fieldWeight in 3543, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.030020464 = weight(abstract_txt:records in 3543) [ClassicSimilarity], result of:
            0.030020464 = score(doc=3543,freq=1.0), product of:
              0.08685416 = queryWeight, product of:
                1.2813284 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.015321223 = queryNorm
              0.3456422 = fieldWeight in 3543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.12840259 = weight(abstract_txt:algorithms in 3543) [ClassicSimilarity], result of:
            0.12840259 = score(doc=3543,freq=4.0), product of:
              0.14417 = queryWeight, product of:
                1.650831 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.015321223 = queryNorm
              0.8906332 = fieldWeight in 3543, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
          0.29750735 = weight(abstract_txt:merging in 3543) [ClassicSimilarity], result of:
            0.29750735 = score(doc=3543,freq=1.0), product of:
              0.50487924 = queryWeight, product of:
                4.3689184 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.015321223 = queryNorm
              0.5892644 = fieldWeight in 3543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.078125 = fieldNorm(doc=3543)
        0.2 = coord(5/25)
    
  5. Hustand, S.: Problems of duplicate records (1986) 0.12
    0.121967666 = sum of:
      0.121967666 = product of:
        0.6098383 = sum of:
          0.12729782 = weight(abstract_txt:duplicate in 265) [ClassicSimilarity], result of:
            0.12729782 = score(doc=265,freq=2.0), product of:
              0.14334185 = queryWeight, product of:
                1.1639563 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.015321223 = queryNorm
              0.88807154 = fieldWeight in 265, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=265)
          0.042455345 = weight(abstract_txt:records in 265) [ClassicSimilarity], result of:
            0.042455345 = score(doc=265,freq=2.0), product of:
              0.08685416 = queryWeight, product of:
                1.2813284 = boost
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.015321223 = queryNorm
              0.48881188 = fieldWeight in 265, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.42422 = idf(docFreq=1446, maxDocs=44421)
                0.078125 = fieldNorm(doc=265)
          0.064201295 = weight(abstract_txt:algorithms in 265) [ClassicSimilarity], result of:
            0.064201295 = score(doc=265,freq=1.0), product of:
              0.14417 = queryWeight, product of:
                1.650831 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.015321223 = queryNorm
              0.4453166 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.078125 = fieldNorm(doc=265)
          0.07837652 = weight(abstract_txt:union in 265) [ClassicSimilarity], result of:
            0.07837652 = score(doc=265,freq=1.0), product of:
              0.16467829 = queryWeight, product of:
                1.7643443 = boost
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.015321223 = queryNorm
              0.47593716 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0919957 = idf(docFreq=272, maxDocs=44421)
                0.078125 = fieldNorm(doc=265)
          0.29750735 = weight(abstract_txt:merging in 265) [ClassicSimilarity], result of:
            0.29750735 = score(doc=265,freq=1.0), product of:
              0.50487924 = queryWeight, product of:
                4.3689184 = boost
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.015321223 = queryNorm
              0.5892644 = fieldWeight in 265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5425844 = idf(docFreq=63, maxDocs=44421)
                0.078125 = fieldNorm(doc=265)
        0.2 = coord(5/25)
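
The content scores combine per-term scores with a coordination factor. The sketch below recomputes the score of the last entry (doc 265) under the same ClassicSimilarity assumptions: each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is multiplied by coord (here read as 5 matched query terms out of 25). The boosts, queryNorm, and fieldNorm are copied from the listing; the function and variable names are hypothetical.

```python
import math

QUERY_NORM = 0.015321223  # copied from the explanation output
MAX_DOCS = 44421          # copied from the explanation output


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (boost, freq, docFreq) for the five abstract terms matched in doc 265.
terms_265 = {
    "duplicate": (1.1639563, 2.0, 38),
    "records": (1.2813284, 2.0, 1446),
    "algorithms": (1.650831, 1.0, 403),
    "union": (1.7643443, 1.0, 272),
    "merging": (4.3689184, 1.0, 63),
}
FIELD_NORM_265 = 0.078125

raw_sum = sum(term_score(b, f, df, FIELD_NORM_265) for b, f, df in terms_265.values())
coord = 5 / 25                 # matched terms / total query terms
doc_score = raw_sum * coord    # listing reports 0.121967666

print(raw_sum, doc_score)
```

Because "merging" carries by far the largest boost, it accounts for roughly half of the raw sum for this record, which is consistent with its weight in the other entries of the list.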