Document (#40953)

Drees, B.
Text und data mining : Herausforderungen und Möglichkeiten für Bibliotheken
Perspektive Bibliothek. 5(2016) H.1, S.49-73
Text and data mining (TDM) is gaining importance as a scientific method, which confronts academic libraries with new challenges but also offers new opportunities. This article gives an overview of TDM from a library perspective. To this end, it discusses the term text and data mining in the context of related terms and explains the goals, tasks, and methods of TDM. These are illustrated with example TDM applications in science and research. The article also sets out technical and legal problems and obstacles in the TDM context. Finally, it shows the relevance of TDM for libraries, both in their role as information intermediaries and providers and as users of TDM methods. In addition, a survey of the operators of document servers at libraries in Germany on their current handling of TDM was conducted as part of this work; it shows that there is still considerable room for expansion. The research data underlying the article are published under the DOI 10.11588/data/10090.
Data Mining
Wissenschaftliche Bibliotheken

Similar documents (content)

  1. Schöning-Walter, C.: Persistent Identifier für Netzpublikationen (2007) 0.12
    0.117870644 = sum of:
      0.117870644 = product of:
        0.4911277 = sum of:
          0.08557231 = weight(abstract_txt:gewinnt in 2409) [ClassicSimilarity], result of:
            0.08557231 = score(doc=2409,freq=1.0), product of:
              0.17401046 = queryWeight, product of:
                1.0426552 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.021210784 = queryNorm
              0.49176535 = fieldWeight in 2409, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0625 = fieldNorm(doc=2409)
          0.091072164 = weight(abstract_txt:publiziert in 2409) [ClassicSimilarity], result of:
            0.091072164 = score(doc=2409,freq=1.0), product of:
              0.18138872 = queryWeight, product of:
                1.0645307 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.021210784 = queryNorm
              0.5020828 = fieldWeight in 2409, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.0625 = fieldNorm(doc=2409)
          0.04002849 = weight(abstract_txt:neue in 2409) [ClassicSimilarity], result of:
            0.04002849 = score(doc=2409,freq=1.0), product of:
              0.13211212 = queryWeight, product of:
                1.2848114 = boost
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.021210784 = queryNorm
              0.3029888 = fieldWeight in 2409, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.0625 = fieldNorm(doc=2409)
          0.10642293 = weight(abstract_txt:wissenschaftliche in 2409) [ClassicSimilarity], result of:
            0.10642293 = score(doc=2409,freq=2.0), product of:
              0.20123799 = queryWeight, product of:
                1.5857073 = boost
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.021210784 = queryNorm
              0.52884114 = fieldWeight in 2409, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.0625 = fieldNorm(doc=2409)
          0.09313725 = weight(abstract_txt:herausforderungen in 2409) [ClassicSimilarity], result of:
            0.09313725 = score(doc=2409,freq=1.0), product of:
              0.23197727 = queryWeight, product of:
                1.7025143 = boost
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.021210784 = queryNorm
              0.40149298 = fieldWeight in 2409, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.0625 = fieldNorm(doc=2409)
          0.07489457 = weight(abstract_txt:bibliotheken in 2409) [ClassicSimilarity], result of:
            0.07489457 = score(doc=2409,freq=1.0), product of:
              0.25273967 = queryWeight, product of:
                2.5131578 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.021210784 = queryNorm
              0.2963309 = fieldWeight in 2409, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.0625 = fieldNorm(doc=2409)
        0.24 = coord(6/25)
  2. Kreutzkam, E.: Neue Wege in alten Online-Katalogen : Catalog Enrichment als Methode der Sacherschließung? ; Stand, Entwicklung und Umsetzung in Bibliotheken Deutschlands (2007) 0.12
    0.115359545 = sum of:
      0.115359545 = product of:
        0.4806648 = sum of:
          0.033363845 = weight(abstract_txt:wird in 4650) [ClassicSimilarity], result of:
            0.033363845 = score(doc=4650,freq=2.0), product of:
              0.08003204 = queryWeight, product of:
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.021210784 = queryNorm
              0.41688108 = fieldWeight in 4650, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=4650)
          0.10696539 = weight(abstract_txt:befragung in 4650) [ClassicSimilarity], result of:
            0.10696539 = score(doc=4650,freq=1.0), product of:
              0.17401046 = queryWeight, product of:
                1.0426552 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.021210784 = queryNorm
              0.6147067 = fieldWeight in 4650, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.078125 = fieldNorm(doc=4650)
          0.11172732 = weight(abstract_txt:rechtliche in 4650) [ClassicSimilarity], result of:
            0.11172732 = score(doc=4650,freq=1.0), product of:
              0.17913732 = queryWeight, product of:
                1.0579036 = boost
                7.983315 = idf(docFreq=40, maxDocs=44218)
                0.021210784 = queryNorm
              0.6236965 = fieldWeight in 4650, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.983315 = idf(docFreq=40, maxDocs=44218)
                0.078125 = fieldNorm(doc=4650)
          0.050035615 = weight(abstract_txt:neue in 4650) [ClassicSimilarity], result of:
            0.050035615 = score(doc=4650,freq=1.0), product of:
              0.13211212 = queryWeight, product of:
                1.2848114 = boost
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.021210784 = queryNorm
              0.378736 = fieldWeight in 4650, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.078125 = fieldNorm(doc=4650)
          0.084954366 = weight(abstract_txt:methoden in 4650) [ClassicSimilarity], result of:
            0.084954366 = score(doc=4650,freq=1.0), product of:
              0.18802415 = queryWeight, product of:
                1.5327625 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.021210784 = queryNorm
              0.4518269 = fieldWeight in 4650, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.078125 = fieldNorm(doc=4650)
          0.09361822 = weight(abstract_txt:bibliotheken in 4650) [ClassicSimilarity], result of:
            0.09361822 = score(doc=4650,freq=1.0), product of:
              0.25273967 = queryWeight, product of:
                2.5131578 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.021210784 = queryNorm
              0.37041363 = fieldWeight in 4650, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.078125 = fieldNorm(doc=4650)
        0.24 = coord(6/25)
  3. Schoepe, B.: ¬Die Digitalisierung und ihre Effizienzgewinne aus pädagogischer Perspektive : Datenschutzrechtliche Probleme der Coronakrisen-induzierten "Digitalisierungsoffensive". Mit einer kritischen Bewertung zur Einführung der Microsoft 365 / Microsoft Teams-Software an der Sekundarstufe I und II der Fritz-Schumacher-Schule (FSS), Hamburg (2021) 0.11
    0.1139329 = sum of:
      0.1139329 = product of:
        0.35604033 = sum of:
          0.031209018 = weight(abstract_txt:wird in 115) [ClassicSimilarity], result of:
            0.031209018 = score(doc=115,freq=7.0), product of:
              0.08003204 = queryWeight, product of:
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.021210784 = queryNorm
              0.38995653 = fieldWeight in 115, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
          0.060545437 = weight(abstract_txt:dargelegt in 115) [ClassicSimilarity], result of:
            0.060545437 = score(doc=115,freq=1.0), product of:
              0.18901116 = queryWeight, product of:
                1.0866678 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.021210784 = queryNorm
              0.3203273 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
          0.025017807 = weight(abstract_txt:neue in 115) [ClassicSimilarity], result of:
            0.025017807 = score(doc=115,freq=1.0), product of:
              0.13211212 = queryWeight, product of:
                1.2848114 = boost
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.021210784 = queryNorm
              0.189368 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
          0.042477183 = weight(abstract_txt:methoden in 115) [ClassicSimilarity], result of:
            0.042477183 = score(doc=115,freq=1.0), product of:
              0.18802415 = queryWeight, product of:
                1.5327625 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.021210784 = queryNorm
              0.22591345 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
          0.048705313 = weight(abstract_txt:text in 115) [ClassicSimilarity], result of:
            0.048705313 = score(doc=115,freq=5.0), product of:
              0.13789053 = queryWeight, product of:
                1.6076107 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021210784 = queryNorm
              0.35321724 = fieldWeight in 115, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
          0.073564805 = weight(abstract_txt:kontext in 115) [ClassicSimilarity], result of:
            0.073564805 = score(doc=115,freq=2.0), product of:
              0.21521863 = queryWeight, product of:
                1.6398646 = boost
                6.187499 = idf(docFreq=246, maxDocs=44218)
                0.021210784 = queryNorm
              0.34181428 = fieldWeight in 115, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.187499 = idf(docFreq=246, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
          0.05821078 = weight(abstract_txt:herausforderungen in 115) [ClassicSimilarity], result of:
            0.05821078 = score(doc=115,freq=1.0), product of:
              0.23197727 = queryWeight, product of:
                1.7025143 = boost
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.021210784 = queryNorm
              0.2509331 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
          0.016309986 = weight(abstract_txt:data in 115) [ClassicSimilarity], result of:
            0.016309986 = score(doc=115,freq=1.0), product of:
              0.1251475 = queryWeight, product of:
                1.7684555 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.021210784 = queryNorm
              0.13032609 = fieldWeight in 115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0390625 = fieldNorm(doc=115)
        0.32 = coord(8/25)
  4. Junger, U.; Schwens, U.: ¬Die inhaltliche Erschließung des schriftlichen kulturellen Erbes auf dem Weg in die Zukunft : Automatische Vergabe von Schlagwörtern in der Deutschen Nationalbibliothek (2017) 0.11
    0.10540645 = sum of:
      0.10540645 = product of:
        0.43919355 = sum of:
          0.04002849 = weight(abstract_txt:neue in 3780) [ClassicSimilarity], result of:
            0.04002849 = score(doc=3780,freq=1.0), product of:
              0.13211212 = queryWeight, product of:
                1.2848114 = boost
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.021210784 = queryNorm
              0.3029888 = fieldWeight in 3780, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8478208 = idf(docFreq=942, maxDocs=44218)
                0.0625 = fieldNorm(doc=3780)
          0.0843829 = weight(abstract_txt:möglichkeiten in 3780) [ClassicSimilarity], result of:
            0.0843829 = score(doc=3780,freq=2.0), product of:
              0.17239426 = queryWeight, product of:
                1.4676735 = boost
                5.5377917 = idf(docFreq=472, maxDocs=44218)
                0.021210784 = queryNorm
              0.48947626 = fieldWeight in 3780, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5377917 = idf(docFreq=472, maxDocs=44218)
                0.0625 = fieldNorm(doc=3780)
          0.034850683 = weight(abstract_txt:text in 3780) [ClassicSimilarity], result of:
            0.034850683 = score(doc=3780,freq=1.0), product of:
              0.13789053 = queryWeight, product of:
                1.6076107 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.021210784 = queryNorm
              0.25274166 = fieldWeight in 3780, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=3780)
          0.026095975 = weight(abstract_txt:data in 3780) [ClassicSimilarity], result of:
            0.026095975 = score(doc=3780,freq=1.0), product of:
              0.1251475 = queryWeight, product of:
                1.7684555 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.021210784 = queryNorm
              0.20852174 = fieldWeight in 3780, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=3780)
          0.12411427 = weight(abstract_txt:mining in 3780) [ClassicSimilarity], result of:
            0.12411427 = score(doc=3780,freq=1.0), product of:
              0.3215694 = queryWeight, product of:
                2.4549975 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.021210784 = queryNorm
              0.38596416 = fieldWeight in 3780, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=3780)
          0.12972121 = weight(abstract_txt:bibliotheken in 3780) [ClassicSimilarity], result of:
            0.12972121 = score(doc=3780,freq=3.0), product of:
              0.25273967 = queryWeight, product of:
                2.5131578 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.021210784 = queryNorm
              0.5132602 = fieldWeight in 3780, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.0625 = fieldNorm(doc=3780)
        0.24 = coord(6/25)
  5. Mache, B.; Klaffki, L.: ¬Das DARIAH-DE Repository : Elementarer Teil einer modularen Infrastruktur für geistes- und kulturwissenschaftliche Forschungsdaten (2018) 0.10
    0.103037104 = sum of:
      0.103037104 = product of:
        0.64398193 = sum of:
          0.23397793 = weight(abstract_txt:forschungsdaten in 4485) [ClassicSimilarity], result of:
            0.23397793 = score(doc=4485,freq=3.0), product of:
              0.16245659 = queryWeight, product of:
                1.0074458 = boost
                7.602543 = idf(docFreq=59, maxDocs=44218)
                0.021210784 = queryNorm
              1.440249 = fieldWeight in 4485, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.602543 = idf(docFreq=59, maxDocs=44218)
                0.109375 = fieldNorm(doc=4485)
          0.15937628 = weight(abstract_txt:publiziert in 4485) [ClassicSimilarity], result of:
            0.15937628 = score(doc=4485,freq=1.0), product of:
              0.18138872 = queryWeight, product of:
                1.0645307 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.021210784 = queryNorm
              0.87864494 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.109375 = fieldNorm(doc=4485)
          0.11893611 = weight(abstract_txt:methoden in 4485) [ClassicSimilarity], result of:
            0.11893611 = score(doc=4485,freq=1.0), product of:
              0.18802415 = queryWeight, product of:
                1.5327625 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.021210784 = queryNorm
              0.63255763 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.109375 = fieldNorm(doc=4485)
          0.13169165 = weight(abstract_txt:wissenschaftliche in 4485) [ClassicSimilarity], result of:
            0.13169165 = score(doc=4485,freq=1.0), product of:
              0.20123799 = queryWeight, product of:
                1.5857073 = boost
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.021210784 = queryNorm
              0.6544075 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.109375 = fieldNorm(doc=4485)
        0.16 = coord(4/25)
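
The score breakdowns above follow the pattern of Lucene's ClassicSimilarity: each matching term contributes queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and the sum over all matching terms is multiplied by coord (matching terms / query terms). The following minimal Python sketch recomputes the score of result 5 (doc 4485) from the figures shown. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values listed; the names TermMatch, term_score and document_score are hypothetical and not part of any library. Because Lucene works with 32-bit floats, the result may differ from the listed figures in the last digits.

```python
import math
from dataclasses import dataclass

# Constants copied from the explain output above.
MAX_DOCS = 44218          # maxDocs
QUERY_NORM = 0.021210784  # queryNorm


@dataclass
class TermMatch:
    """One matching query term (hypothetical helper type)."""
    term: str
    freq: float        # termFreq in the matched field
    doc_freq: int      # docFreq of the term in the index
    field_norm: float  # fieldNorm of the matched document
    boost: float = 1.0  # query-time boost (1.0 when none is listed)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(m: TermMatch) -> float:
    term_idf = idf(m.doc_freq)
    query_weight = m.boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(m.freq) * term_idf * m.field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(matches: list[TermMatch], query_terms: int = 25) -> float:
    # coord = number of matching terms / number of terms in the query
    coord = len(matches) / query_terms
    return coord * sum(term_score(m) for m in matches)


# Result 5 (doc 4485): four of the 25 query terms match.
doc_4485 = [
    TermMatch("forschungsdaten", 3.0, 59, 0.109375, boost=1.0074458),
    TermMatch("publiziert", 1.0, 38, 0.109375, boost=1.0645307),
    TermMatch("methoden", 1.0, 369, 0.109375, boost=1.5327625),
    TermMatch("wissenschaftliche", 1.0, 302, 0.109375, boost=1.5857073),
]

# Should come out close to the 0.103037104 reported for this result.
print(document_score(doc_4485))
```

The same functions can be applied to the other results by copying their freq, docFreq, fieldNorm and boost values; terms listed without a boost line (such as "wird") use the default of 1.0.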