Document (#34563)

Author
Jele, H.
Title
Erkennung bibliographischer Dubletten mittels Trigrammen : Messungen zur Performanz [Detecting bibliographic duplicates using trigrams: performance measurements]
Source
B.I.T.online. 12(2009) no. 3, pp. xxx-xxx
Year
2009
Abstract
Trigram formation is frequently used in automated duplicate detection in situations where "very similar" but not identical records are to be identified as duplicates. In this work, three trigram-based detection methods (the Jaccard measure, the Euclidean distance, and the KOBV similarity value) are applied in practice, all the steps they require are implemented, and finally the time and resources consumed (i.e., the "performance") are measured. The data set used comprises 392,616 bibliographic title records created in the Austrian library network (Österreichischer Bibliothekenverbund).
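The abstract names the Jaccard measure and the Euclidean distance over trigrams but does not spell out the details. The following is a minimal sketch, not the author's implementation. It assumes lowercase normalization, collapsed whitespace, unpadded character trigrams, and a Euclidean distance taken over trigram frequency vectors. The function names (`trigrams`, `jaccard`, `euclidean`) are hypothetical. The KOBV similarity value is not sketched because its formula is not given here.

```python
from collections import Counter
import math


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams.

    Assumed normalization (not from the source): lowercase, whitespace
    collapsed to single spaces, no padding characters at the ends.
    """
    norm = " ".join(text.lower().split())
    return Counter(norm[i:i + 3] for i in range(len(norm) - 2))


def jaccard(a: str, b: str) -> float:
    """Jaccard measure on the *sets* of trigrams: |A ∩ B| / |A ∪ B|."""
    ta, tb = set(trigrams(a)), set(trigrams(b))
    if not ta and not tb:
        # Assumption: two strings too short to form any trigram count as identical.
        return 1.0
    return len(ta & tb) / len(ta | tb)


def euclidean(a: str, b: str) -> float:
    """Euclidean distance between trigram frequency vectors (0.0 = identical profiles)."""
    ca, cb = trigrams(a), trigrams(b)
    keys = ca.keys() | cb.keys()  # union of all trigrams occurring in either string
    # Counter returns 0 for missing keys, so absent trigrams count as frequency 0.
    return math.sqrt(sum((ca[k] - cb[k]) ** 2 for k in keys))


if __name__ == "__main__":
    t1 = "Erkennung bibliographischer Dubletten"
    t2 = "Erkennung bibliografischer Dubletten"
    print(f"Jaccard:   {jaccard(t1, t2):.3f}")
    print(f"Euclidean: {euclidean(t1, t2):.3f}")
```

The Jaccard measure yields a similarity in [0, 1], where higher means more alike. The Euclidean distance is 0 for identical trigram profiles and grows with the difference. A duplicate threshold therefore has to be chosen separately for each measure. A naive all-pairs comparison scales quadratically: for 392,616 records it would involve 392,616 × 392,615 / 2 ≈ 7.7 × 10^10 comparisons.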
Theme
Descriptive cataloguing (Formalerschließung)

Similar documents (content)

  1. Schneider, W.: Ein verteiltes Bibliotheks-Informationssystem auf Basis des Z39.50 Protokolls [A distributed library information system based on the Z39.50 protocol] (1999) 0.05
  2. Fürste, F.M.: Linked Open Library Data : Bibliographische Daten und ihre Zugänglichkeit im Web der Daten [Linked Open Library Data: bibliographic data and its accessibility in the Web of Data] (2009) 0.04
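  3. Schaffner, V.: FRBR in MAB2 und Primo - ein kafkaesker Prozess? : Möglichkeiten der FRBRisierung von MAB2-Datensätzen in Primo exemplarisch dargestellt an Datensätzen zu Franz Kafkas "Der Process" [FRBR in MAB2 and Primo: a Kafkaesque process? Possibilities for FRBRizing MAB2 records in Primo, illustrated with records for Franz Kafka's "Der Process"] (2011) 0.04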
  4. Bürger, T.: Die Digitalisierung der kulturellen und wissenschaftlichen Überlieferung : Versuch einer Zwischenbilanz [The digitization of the cultural and scholarly heritage: an interim assessment] (2011) 0.03
  5. Figge, U.L.: Technische Anleitungen und der Erwerb kohärenten Wissens [Technical instructions and the acquisition of coherent knowledge] (2004) 0.03