Document (#44003)

Author
Gabler, S.
Title
Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus [Assigning DDC subject groups by means of a subject-heading thesaurus]
Imprint
Wien / Library and Information Studies : Universität
Year
2021
Pages
109 p.
Abstract
The thesis presents the construction of a thematically ordered thesaurus based on the subject headings (Sachschlagwörter) of the Integrated Authority File (Gemeinsame Normdatei, GND), making use of the DDC notations they contain. The top level of this thesaurus is formed by the DDC subject groups (Sachgruppen) of the German National Library (DNB). The thesaurus is built rule-based, applying Linked Data principles in a SPARQL processor. It serves the automated extraction of metadata from scholarly publications by a computational-linguistic extractor that processes digital full texts. The extractor identifies subject headings by comparing character strings against the terms (Benennungen) in the thesaurus, ranks the hits by their relevance in the text, and returns the assigned subject groups in ranked order. The underlying assumption is that the correct subject group appears among the top ranks. The performance of the approach is validated in a three-step procedure. First, a gold standard is compiled from documents available in the DNB online catalog, based on their metadata and on findings from a brief inspection of the documents themselves (Kurzautopsie). The documents are spread over 14 of the subject groups, with a batch of 50 documents each (700 documents in total). Second, all documents are processed with the extractor and the categorization results are recorded. Finally, the resulting retrieval performance is assessed both for hard (binary) categorization and for ranked return of the subject groups.
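The record does not specify how the extractor matches terms or weights relevance. The following is a minimal sketch, assuming a toy thesaurus that maps single-word labels directly to a subject-group notation and using raw whole-word hit counts as the relevance measure. The names `THESAURUS`, `rank_subject_groups` and `evaluate`, as well as the labels and group codes, are hypothetical and illustrative only. The `evaluate` function mirrors the two views named in the abstract: hard (binary) categorization, where only the top-ranked group counts, and ranked return, where the gold group only has to appear within the first k ranks.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: lower-cased single-word labels (Benennungen)
# mapped to a subject-group notation. Labels, codes, and assignments are
# illustrative only and are not taken from the GND or the thesis.
THESAURUS: dict[str, str] = {
    "klassifikation": "020",
    "katalogisierung": "020",
    "bibliothek": "020",
    "algorithmus": "004",
    "datenbank": "004",
    "enzym": "540",
}


def rank_subject_groups(text: str, thesaurus: dict[str, str]) -> list[tuple[str, int]]:
    """Match thesaurus labels in a full text and rank subject groups.

    Relevance is simply the number of exact whole-word hits per group, an
    assumption standing in for the extractor's own relevance measure.
    Multi-word labels and inflected forms are not handled.
    """
    word_counts = Counter(re.findall(r"\w+", text.lower()))
    group_scores: Counter = Counter()
    for label, group in thesaurus.items():
        hits = word_counts[label]  # Counter returns 0 for unseen words
        if hits:
            group_scores[group] += hits
    # Highest-scoring group first; ties keep first-encountered order
    return group_scores.most_common()


def evaluate(rankings: list[list[str]], gold: list[str], k: int = 3) -> tuple[float, float]:
    """Return (hard accuracy, top-k hit rate) against gold-standard groups.

    hard:  the top-ranked group must equal the gold group (binary decision)
    top-k: the gold group only needs to appear among the first k ranks
    """
    if not gold:
        return 0.0, 0.0
    pairs = list(zip(rankings, gold))
    hard = sum(1 for r, g in pairs if r and r[0] == g) / len(gold)
    top_k = sum(1 for r, g in pairs if g in r[:k]) / len(gold)
    return hard, top_k


if __name__ == "__main__":
    doc = "Die Bibliothek nutzt Klassifikation und eine Datenbank zur Klassifikation."
    ranking = rank_subject_groups(doc, THESAURUS)
    print(ranking)  # [('020', 3), ('004', 1)]
    groups = [g for g, _ in ranking]
    print(evaluate([groups], ["020"], k=1))  # (1.0, 1.0)
```

Counting raw hits keeps the sketch transparent; a real extractor would at least need lemmatization (so that "Bibliotheken" matches "bibliothek") and support for multi-word headings.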
Content
Master's thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. See: https://www.researchgate.net/publication/371680244_Vergabe_von_DDC-Sachgruppen_mittels_eines_Schlagwort-Thesaurus. DOI: 10.25365/thesis.70030. See also the accompanying presentation: https://wiki.dnb.de/download/attachments/252121510/DA3%20Workshop-Gabler.pdf?version=1&modificationDate=1671093170000&api=v2.
Theme
Relations between verbal and systematic subject indexing
Semantic interoperability
Object
DDC
RSWK
SWD

Similar documents (content)

  1. Darstellung der CrissCross-Mappingrelationen im Rahmen des Semantic Web (2010) 0.19
    0.19021411 = sum of:
      0.19021411 = product of:
        0.5283725 = sum of:
          0.015580906 = weight(abstract_txt:einem in 285) [ClassicSimilarity], result of:
            0.015580906 = score(doc=285,freq=2.0), product of:
              0.065052345 = queryWeight, product of:
                4.3356547 = idf(docFreq=1580, maxDocs=44421)
                0.015004043 = queryNorm
              0.23951335 = fieldWeight in 285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3356547 = idf(docFreq=1580, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.08874724 = weight(abstract_txt:sachgruppe in 285) [ClassicSimilarity], result of:
            0.08874724 = score(doc=285,freq=2.0), product of:
              0.16467503 = queryWeight, product of:
                1.1250385 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.015004043 = queryNorm
              0.5389235 = fieldWeight in 285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.030317726 = weight(abstract_txt:werden in 285) [ClassicSimilarity], result of:
            0.030317726 = score(doc=285,freq=12.0), product of:
              0.063872255 = queryWeight, product of:
                1.2135851 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015004043 = queryNorm
              0.4746619 = fieldWeight in 285, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.051155172 = weight(abstract_txt:dokumenten in 285) [ClassicSimilarity], result of:
            0.051155172 = score(doc=285,freq=2.0), product of:
              0.14370136 = queryWeight, product of:
                1.4862742 = boost
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.015004043 = queryNorm
              0.3559825 = fieldWeight in 285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.0402506 = weight(abstract_txt:mittels in 285) [ClassicSimilarity], result of:
            0.0402506 = score(doc=285,freq=1.0), product of:
              0.15430953 = queryWeight, product of:
                1.5401566 = boost
                6.677587 = idf(docFreq=151, maxDocs=44421)
                0.015004043 = queryNorm
              0.26084325 = fieldWeight in 285, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.677587 = idf(docFreq=151, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.019452432 = weight(abstract_txt:eines in 285) [ClassicSimilarity], result of:
            0.019452432 = score(doc=285,freq=1.0), product of:
              0.10878213 = queryWeight, product of:
                1.5837729 = boost
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.015004043 = queryNorm
              0.17882012 = fieldWeight in 285, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.03143173 = weight(abstract_txt:unter in 285) [ClassicSimilarity], result of:
            0.03143173 = score(doc=285,freq=2.0), product of:
              0.118889585 = queryWeight, product of:
                1.6557168 = boost
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.015004043 = queryNorm
              0.2643775 = fieldWeight in 285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.04801312 = weight(abstract_txt:wird in 285) [ClassicSimilarity], result of:
            0.04801312 = score(doc=285,freq=7.0), product of:
              0.123139806 = queryWeight, product of:
                2.175393 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.015004043 = queryNorm
              0.3899074 = fieldWeight in 285, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
          0.20342356 = weight(abstract_txt:sachgruppen in 285) [ClassicSimilarity], result of:
            0.20342356 = score(doc=285,freq=1.0), product of:
              0.616773 = queryWeight, product of:
                4.8685675 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.015004043 = queryNorm
              0.32981917 = fieldWeight in 285, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.0390625 = fieldNorm(doc=285)
        0.36 = coord(9/25)
    
  2. Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.17
    0.16840759 = sum of:
      0.16840759 = product of:
        0.46779886 = sum of:
          0.018697085 = weight(abstract_txt:einem in 5197) [ClassicSimilarity], result of:
            0.018697085 = score(doc=5197,freq=2.0), product of:
              0.065052345 = queryWeight, product of:
                4.3356547 = idf(docFreq=1580, maxDocs=44421)
                0.015004043 = queryNorm
              0.287416 = fieldWeight in 5197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3356547 = idf(docFreq=1580, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.027786655 = weight(abstract_txt:werden in 5197) [ClassicSimilarity], result of:
            0.027786655 = score(doc=5197,freq=7.0), product of:
              0.063872255 = queryWeight, product of:
                1.2135851 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015004043 = queryNorm
              0.4350348 = fieldWeight in 5197, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.039693613 = weight(abstract_txt:dokumente in 5197) [ClassicSimilarity], result of:
            0.039693613 = score(doc=5197,freq=1.0), product of:
              0.13538507 = queryWeight, product of:
                1.4426265 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.015004043 = queryNorm
              0.29319048 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.06138621 = weight(abstract_txt:dokumenten in 5197) [ClassicSimilarity], result of:
            0.06138621 = score(doc=5197,freq=2.0), product of:
              0.14370136 = queryWeight, product of:
                1.4862742 = boost
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.015004043 = queryNorm
              0.42717904 = fieldWeight in 5197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.048300717 = weight(abstract_txt:mittels in 5197) [ClassicSimilarity], result of:
            0.048300717 = score(doc=5197,freq=1.0), product of:
              0.15430953 = queryWeight, product of:
                1.5401566 = boost
                6.677587 = idf(docFreq=151, maxDocs=44421)
                0.015004043 = queryNorm
              0.31301188 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.677587 = idf(docFreq=151, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.033011872 = weight(abstract_txt:eines in 5197) [ClassicSimilarity], result of:
            0.033011872 = score(doc=5197,freq=2.0), product of:
              0.10878213 = queryWeight, product of:
                1.5837729 = boost
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.015004043 = queryNorm
              0.30346778 = fieldWeight in 5197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.026670711 = weight(abstract_txt:unter in 5197) [ClassicSimilarity], result of:
            0.026670711 = score(doc=5197,freq=1.0), product of:
              0.118889585 = queryWeight, product of:
                1.6557168 = boost
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.015004043 = queryNorm
              0.22433177 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.19047527 = weight(abstract_txt:kategorisierung in 5197) [ClassicSimilarity], result of:
            0.19047527 = score(doc=5197,freq=2.0), product of:
              0.30570748 = queryWeight, product of:
                2.1678116 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.015004043 = queryNorm
              0.6230638 = fieldWeight in 5197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
          0.021776704 = weight(abstract_txt:wird in 5197) [ClassicSimilarity], result of:
            0.021776704 = score(doc=5197,freq=1.0), product of:
              0.123139806 = queryWeight, product of:
                2.175393 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.015004043 = queryNorm
              0.17684537 = fieldWeight in 5197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.046875 = fieldNorm(doc=5197)
        0.36 = coord(9/25)
    
  3. Heiner-Freiling, M.: Dewey in der Deutschen Nationalbibliographie? (2002) 0.15
    0.14552933 = sum of:
      0.14552933 = product of:
        0.60637224 = sum of:
          0.0132208355 = weight(abstract_txt:einem in 2419) [ClassicSimilarity], result of:
            0.0132208355 = score(doc=2419,freq=1.0), product of:
              0.065052345 = queryWeight, product of:
                4.3356547 = idf(docFreq=1580, maxDocs=44421)
                0.015004043 = queryNorm
              0.20323381 = fieldWeight in 2419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3356547 = idf(docFreq=1580, maxDocs=44421)
                0.046875 = fieldNorm(doc=2419)
          0.053515706 = weight(abstract_txt:vergabe in 2419) [ClassicSimilarity], result of:
            0.053515706 = score(doc=2419,freq=1.0), product of:
              0.1311398 = queryWeight, product of:
                1.00397 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.015004043 = queryNorm
              0.40808135 = fieldWeight in 2419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.046875 = fieldNorm(doc=2419)
          0.07530452 = weight(abstract_txt:sachgruppe in 2419) [ClassicSimilarity], result of:
            0.07530452 = score(doc=2419,freq=1.0), product of:
              0.16467503 = queryWeight, product of:
                1.1250385 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.015004043 = queryNorm
              0.45729172 = fieldWeight in 2419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.046875 = fieldNorm(doc=2419)
          0.014852591 = weight(abstract_txt:werden in 2419) [ClassicSimilarity], result of:
            0.014852591 = score(doc=2419,freq=2.0), product of:
              0.063872255 = queryWeight, product of:
                1.2135851 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015004043 = queryNorm
              0.23253587 = fieldWeight in 2419, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.046875 = fieldNorm(doc=2419)
          0.026670711 = weight(abstract_txt:unter in 2419) [ClassicSimilarity], result of:
            0.026670711 = score(doc=2419,freq=1.0), product of:
              0.118889585 = queryWeight, product of:
                1.6557168 = boost
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.015004043 = queryNorm
              0.22433177 = fieldWeight in 2419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.046875 = fieldNorm(doc=2419)
          0.4228079 = weight(abstract_txt:sachgruppen in 2419) [ClassicSimilarity], result of:
            0.4228079 = score(doc=2419,freq=3.0), product of:
              0.616773 = queryWeight, product of:
                4.8685675 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.015004043 = queryNorm
              0.68551624 = fieldWeight in 2419, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.046875 = fieldNorm(doc=2419)
        0.24 = coord(6/25)
    
  4. Alex, H.; Heiner-Freiling, M.: DDC-Sachgruppen der Deutschen Nationalbibliografie : Leitfaden zu ihrer Vergabe (2003) 0.13
    0.12698622 = sum of:
      0.12698622 = product of:
        1.5873278 = sum of:
          0.28541708 = weight(abstract_txt:vergabe in 3191) [ClassicSimilarity], result of:
            0.28541708 = score(doc=3191,freq=1.0), product of:
              0.1311398 = queryWeight, product of:
                1.00397 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.015004043 = queryNorm
              2.1764338 = fieldWeight in 3191, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.25 = fieldNorm(doc=3191)
          1.3019108 = weight(abstract_txt:sachgruppen in 3191) [ClassicSimilarity], result of:
            1.3019108 = score(doc=3191,freq=1.0), product of:
              0.616773 = queryWeight, product of:
                4.8685675 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.015004043 = queryNorm
              2.1108427 = fieldWeight in 3191, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.25 = fieldNorm(doc=3191)
        0.08 = coord(2/25)
    
  5. Krischker, U.: Formale Analyse von Dokumenten (1997) 0.13
    0.12662476 = sum of:
      0.12662476 = product of:
        0.45223132 = sum of:
          0.030317726 = weight(abstract_txt:werden in 912) [ClassicSimilarity], result of:
            0.030317726 = score(doc=912,freq=3.0), product of:
              0.063872255 = queryWeight, product of:
                1.2135851 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.015004043 = queryNorm
              0.4746619 = fieldWeight in 912, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.078125 = fieldNorm(doc=912)
          0.093558736 = weight(abstract_txt:dokumente in 912) [ClassicSimilarity], result of:
            0.093558736 = score(doc=912,freq=2.0), product of:
              0.13538507 = queryWeight, product of:
                1.4426265 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.015004043 = queryNorm
              0.69105655 = fieldWeight in 912, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.078125 = fieldNorm(doc=912)
          0.102310345 = weight(abstract_txt:dokumenten in 912) [ClassicSimilarity], result of:
            0.102310345 = score(doc=912,freq=2.0), product of:
              0.14370136 = queryWeight, product of:
                1.4862742 = boost
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.015004043 = queryNorm
              0.711965 = fieldWeight in 912, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.443972 = idf(docFreq=191, maxDocs=44421)
                0.078125 = fieldNorm(doc=912)
          0.038904864 = weight(abstract_txt:eines in 912) [ClassicSimilarity], result of:
            0.038904864 = score(doc=912,freq=1.0), product of:
              0.10878213 = queryWeight, product of:
                1.5837729 = boost
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.015004043 = queryNorm
              0.35764024 = fieldWeight in 912, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.078125 = fieldNorm(doc=912)
          0.09136027 = weight(abstract_txt:hierzu in 912) [ClassicSimilarity], result of:
            0.09136027 = score(doc=912,freq=1.0), product of:
              0.16789177 = queryWeight, product of:
                1.6065092 = boost
                6.965269 = idf(docFreq=113, maxDocs=44421)
                0.015004043 = queryNorm
              0.5441617 = fieldWeight in 912, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.965269 = idf(docFreq=113, maxDocs=44421)
                0.078125 = fieldNorm(doc=912)
          0.04445118 = weight(abstract_txt:unter in 912) [ClassicSimilarity], result of:
            0.04445118 = score(doc=912,freq=1.0), product of:
              0.118889585 = queryWeight, product of:
                1.6557168 = boost
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.015004043 = queryNorm
              0.37388626 = fieldWeight in 912, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.785744 = idf(docFreq=1007, maxDocs=44421)
                0.078125 = fieldNorm(doc=912)
          0.051328186 = weight(abstract_txt:wird in 912) [ClassicSimilarity], result of:
            0.051328186 = score(doc=912,freq=2.0), product of:
              0.123139806 = queryWeight, product of:
                2.175393 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.015004043 = queryNorm
              0.41682854 = fieldWeight in 912, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.078125 = fieldNorm(doc=912)
        0.28 = coord(7/25)
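The scores in the similar-documents list are Lucene ClassicSimilarity (TF-IDF) explanations. The following sketch recomputes them, assuming the standard ClassicSimilarity formulas, which agree with the numbers shown: tf = sqrt(freq) (e.g. 1.4142135 for freq=2), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and a document score equal to the sum of the term scores multiplied by coord (matched terms / query terms). The values for queryNorm, boost, and fieldNorm are copied from the explain output rather than derived; the function names are hypothetical.

```python
import math

# Recomputes Lucene ClassicSimilarity explain values under the standard
# TF-IDF formulas; queryNorm, boost, and fieldNorm are taken as given.


def idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               field_norm: float, query_norm: float, boost: float = 1.0) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm             # "queryWeight" line
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf = sqrt(freq)
    return query_weight * field_weight


def doc_score(term_scores: list[float], matched: int, total_query_terms: int) -> float:
    """Sum of the matching term scores, scaled by coord(matched/total)."""
    return sum(term_scores) * matched / total_query_terms


if __name__ == "__main__":
    # Item 4, term "sachgruppen" in doc 3191; the explain output reports 1.3019108
    print(term_score(freq=1, doc_freq=25, max_docs=44421,
                     field_norm=0.25, query_norm=0.015004043, boost=4.8685675))
    # Item 4 total: two matching terms out of 25; the explain output reports 0.12698622
    print(doc_score([0.28541708, 1.3019108], matched=2, total_query_terms=25))
```

Small differences in the last digits are expected, because Lucene computes these values in single precision.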