Document (#44320)

Berendt, B.
Karadeniz, O.Ö.
Kiyak, S.
Mertens, S.
d'Haenens, L.
Diversity and bias in DBpedia and Wikidata as a challenge for text-analysis tools
o-bib: Das offene Bibliotheksjournal. 10(2023) no.2, pp.1-12
Diversity Searcher is a tool originally developed to help analyse diversity in news texts. It relies on automated content analysis and therefore rests on assumptions and depends on design decisions about diversity. In this article we examine the implications of the fact that the results of automated content analysis generally depend on external knowledge sources. We compare two data sources that Diversity Searcher works with, DBpedia and Wikidata, with respect to their ontological coverage and diversity, and describe the implications for the resulting analyses of text corpora. We describe a case study on the relative over- or underrepresentation of Belgian political parties between 1990 and 2020. In particular, we found a surprisingly strong overrepresentation of the political right in the English-language DBpedia.
Diversity Searcher

Similar documents (author)

  1. Mertens, N.: Formale und sachliche Erschließung von AV-Medien in Öffentlichen Bibliotheken der Bundesrepublik Deutschland : Untersuchung von Universalklassifikationen und Regelwerken (1987) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:mertens in 6805) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 6805, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=6805)
  2. Mertens, M.: ¬Ein Manager regelt Sprache und Raum (2003) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:mertens in 3228) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 3228, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=3228)
  3. Mertens, M.: Bewusstlos im Bett : im Schlaf organisiert sich das Gehirn komplett um (2011) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:mertens in 419) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 419, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=419)
  4. Mertens, T.: Vergleich von Archiv- und Dokumentenmanagementsystemen für die betriebliche Anwendung (2000) 6.10
    6.0972233 = sum of:
      6.0972233 = weight(author_txt:mertens in 651) [ClassicSimilarity], result of:
        6.0972233 = fieldWeight in 651, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.625 = fieldNorm(doc=651)
  5. Mertens, H.-J.; Servos, B.; Stadler, F.: ¬Die Sprache der Mathematik : Das Wortmaterial und seine Beziehung zur Allgemeinsprache (1973) 3.66
    3.6583338 = sum of:
      3.6583338 = weight(author_txt:mertens in 1651) [ClassicSimilarity], result of:
        3.6583338 = fieldWeight in 1651, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.375 = fieldNorm(doc=1651)
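
The author scores above follow the classic Lucene TF-IDF arithmetic, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The following is a minimal sketch that recomputes the fieldWeight values from the numbers shown in the listing. The function names are illustrative, and fieldNorm is taken directly from the output rather than derived from field length.

```python
import math

def tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw count
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the listing above
    return tf(freq) * idf(doc_freq, max_docs) * field_norm

# Entries 1 to 4 use fieldNorm 0.625; entry 5 uses fieldNorm 0.375
print(round(field_weight(1.0, 6, 44421, 0.625), 4))
print(round(field_weight(1.0, 6, 44421, 0.375), 4))
```

The only difference between the first four entries and the fifth is fieldNorm: entry 5 lists three authors, so its longer author field carries a lower norm and a correspondingly lower score.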

Similar documents (content)

  1. Usbeck, R.; Yan, X.; Perevalov, A.; Jiang, L.; Schulz, J.; Kraft, A.; Möller, C.; Huang, J.; Reineke, J.; Ngonga Ngomo, A.-C.; Saleem, M.; Both, A.: QALD-10 - The 10th challenge on question answering over linked data : shifting from DBpedia to Wikidata as a KG for KGQA (2023) 0.07
    0.070057884 = sum of:
      0.070057884 = product of:
        0.87572354 = sum of:
          0.43533772 = weight(abstract_txt:wikidata in 2350) [ClassicSimilarity], result of:
            0.43533772 = score(doc=2350,freq=7.0), product of:
              0.29050496 = queryWeight, product of:
                2.3022 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.013924088 = queryNorm
              1.4985552 = fieldWeight in 2350, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0625 = fieldNorm(doc=2350)
          0.44038582 = weight(abstract_txt:dbpedia in 2350) [ClassicSimilarity], result of:
            0.44038582 = score(doc=2350,freq=5.0), product of:
              0.37488493 = queryWeight, product of:
                3.2030292 = boost
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.013924088 = queryNorm
              1.1747227 = fieldWeight in 2350, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.0625 = fieldNorm(doc=2350)
        0.08 = coord(2/25)
  2. Ackermann, A.: Zur Rolle der Inhaltsanalyse bei der Sacherschließung : theoretischer Anspruch und praktische Wirklichkeit in der RSWK (2001) 0.07
    0.0688061 = sum of:
      0.0688061 = product of:
        0.4300381 = sum of:
          0.035877723 = weight(abstract_txt:hängt in 3061) [ClassicSimilarity], result of:
            0.035877723 = score(doc=3061,freq=1.0), product of:
              0.11426729 = queryWeight, product of:
                1.0209683 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.013924088 = queryNorm
              0.3139807 = fieldWeight in 3061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.0390625 = fieldNorm(doc=3061)
          0.044249635 = weight(abstract_txt:beschreiben in 3061) [ClassicSimilarity], result of:
            0.044249635 = score(doc=3061,freq=1.0), product of:
              0.16557261 = queryWeight, product of:
                1.7380431 = boost
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.013924088 = queryNorm
              0.26725215 = fieldWeight in 3061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8416553 = idf(docFreq=128, maxDocs=44421)
                0.0390625 = fieldNorm(doc=3061)
          0.045833785 = weight(abstract_txt:auswirkungen in 3061) [ClassicSimilarity], result of:
            0.045833785 = score(doc=3061,freq=1.0), product of:
              0.1695011 = queryWeight, product of:
                1.7585411 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.013924088 = queryNorm
              0.27040407 = fieldWeight in 3061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.0390625 = fieldNorm(doc=3061)
          0.30407694 = weight(abstract_txt:inhaltsanalyse in 3061) [ClassicSimilarity], result of:
            0.30407694 = score(doc=3061,freq=13.0), product of:
              0.25452077 = queryWeight, product of:
                2.154904 = boost
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.013924088 = queryNorm
              1.1947038 = fieldWeight in 3061, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.0390625 = fieldNorm(doc=3061)
        0.16 = coord(4/25)
  3. Haugg, S.: ¬Die Konstruktion der politischen Wirklichkeit durch Nachrichten : Eine Analyse der aktuellen Forschungsdiskussion (2004) 0.05
    0.053607415 = sum of:
      0.053607415 = product of:
        0.44672847 = sum of:
          0.14586881 = weight(abstract_txt:politischer in 4711) [ClassicSimilarity], result of:
            0.14586881 = score(doc=4711,freq=1.0), product of:
              0.13404387 = queryWeight, product of:
                1.1057954 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.013924088 = queryNorm
              1.0882169 = fieldWeight in 4711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.125 = fieldNorm(doc=4711)
          0.15419155 = weight(abstract_txt:annahmen in 4711) [ClassicSimilarity], result of:
            0.15419155 = score(doc=4711,freq=1.0), product of:
              0.13909529 = queryWeight, product of:
                1.1264385 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.013924088 = queryNorm
              1.1085318 = fieldWeight in 4711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.125 = fieldNorm(doc=4711)
          0.14666812 = weight(abstract_txt:auswirkungen in 4711) [ClassicSimilarity], result of:
            0.14666812 = score(doc=4711,freq=1.0), product of:
              0.1695011 = queryWeight, product of:
                1.7585411 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.013924088 = queryNorm
              0.865293 = fieldWeight in 4711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.125 = fieldNorm(doc=4711)
        0.12 = coord(3/25)
  4. Günther, M.: Vermitteln Suchmaschinen vollständige Bilder aktueller Themen? : Untersuchung der Gewichtung inhaltlicher Aspekte von Suchmaschinenergebnissen in Deutschland und den USA (2016) 0.04
    0.04388659 = sum of:
      0.04388659 = product of:
        0.36572158 = sum of:
          0.10538247 = weight(abstract_txt:abdeckung in 4068) [ClassicSimilarity], result of:
            0.10538247 = score(doc=4068,freq=1.0), product of:
              0.14763781 = queryWeight, product of:
                1.1605132 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.013924088 = queryNorm
              0.71379054 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.078125 = fieldNorm(doc=4068)
          0.09166757 = weight(abstract_txt:auswirkungen in 4068) [ClassicSimilarity], result of:
            0.09166757 = score(doc=4068,freq=1.0), product of:
              0.1695011 = queryWeight, product of:
                1.7585411 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.013924088 = queryNorm
              0.54080814 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.078125 = fieldNorm(doc=4068)
          0.16867153 = weight(abstract_txt:inhaltsanalyse in 4068) [ClassicSimilarity], result of:
            0.16867153 = score(doc=4068,freq=1.0), product of:
              0.25452077 = queryWeight, product of:
                2.154904 = boost
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.013924088 = queryNorm
              0.66270244 = fieldWeight in 4068, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.078125 = fieldNorm(doc=4068)
        0.12 = coord(3/25)
  5. Junger, U.: Quo vadis Inhaltserschließung der Deutschen Nationalbibliothek? : Herausforderungen und Perspektiven (2015) 0.04
    0.035803568 = sum of:
      0.035803568 = product of:
        0.4475446 = sum of:
          0.09166757 = weight(abstract_txt:auswirkungen in 2718) [ClassicSimilarity], result of:
            0.09166757 = score(doc=2718,freq=1.0), product of:
              0.1695011 = queryWeight, product of:
                1.7585411 = boost
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.013924088 = queryNorm
              0.54080814 = fieldWeight in 2718, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.922344 = idf(docFreq=118, maxDocs=44421)
                0.078125 = fieldNorm(doc=2718)
          0.35587704 = weight(abstract_txt:diversität in 2718) [ClassicSimilarity], result of:
            0.35587704 = score(doc=2718,freq=1.0), product of:
              0.47928345 = queryWeight, product of:
                3.6216638 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.013924088 = queryNorm
              0.74251896 = fieldWeight in 2718, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.078125 = fieldNorm(doc=2718)
        0.08 = coord(2/25)
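
The content scores combine a per-term product of queryWeight and fieldWeight with a coord factor for the fraction of query terms matched. As a minimal sketch, the snippet below recomputes the score of the first content match (doc 2350) from the values printed in its tree. The helper names are illustrative, and boost and queryNorm are copied from the output as given rather than derived.

```python
import math

MAX_DOCS = 44421          # maxDocs from the explain output
QUERY_NORM = 0.013924088  # queryNorm, copied from the output (not derived here)

def idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

# The two matching abstract terms for doc 2350
wikidata = term_score(freq=7.0, doc_freq=13, boost=2.3022, field_norm=0.0625)
dbpedia = term_score(freq=5.0, doc_freq=26, boost=3.2030292, field_norm=0.0625)

# coord(2/25): two of the 25 query terms matched
total = (2 / 25) * (wikidata + dbpedia)
print(wikidata, dbpedia, total)
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "wikidata" and "dbpedia" dominate the sum, while the coord factor scales the whole score down when only a few of the query terms match.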