Document (#38018)

Author
Meyer, A.
Title
wiki2rdf: Automatische Extraktion von RDF-Tripeln aus Artikelvolltexten der Wikipedia
Source
Information - Wissenschaft und Praxis. 64(2013) no.2/3, pp.115-126
Year
2013
Abstract
The DBpedia project converts, among other things, information from Wikipedia articles into RDF triples. It does not take the article texts into account, however, but draws primarily on the so-called infoboxes, which contain information that is already structured. As part of a master's thesis at the Institut für Bibliotheks- und Informationswissenschaft of the Humboldt-Universität zu Berlin, wiki2rdf was developed, a software tool for the rule-based extraction of RDF triples from the unstructured full texts of Wikipedia. Extraction takes place after syntactic parsing with a dependency parser. As an example, wiki2rdf was applied to 68820 articles from the category "Wissenschaftler" (scientists) of the German-language Wikipedia. 244563 triples were extracted.
Content
Cf.: http://www.degruyter.com/view/j/iwp.2013.64.issue-2-3/iwp-2013-0015/iwp-2013-0015.xml?format=INT.
Theme
Semantic Web
Object
DBpedia
Wikipedia
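
The abstract describes rule-based extraction of RDF triples from a dependency parse. The record does not give wiki2rdf's actual rules, parser, or output vocabulary, so the following is only a minimal sketch of the general idea under stated assumptions: the Token class, the label names "nsubj", "obj" and "root", the single subject-verb-object rule, and the hand-built sample sentence are all hypothetical and are not taken from wiki2rdf.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # position of the token in the sentence
    text: str   # surface form
    dep: str    # dependency label (hypothetical label set)
    head: int   # index of the governing token; the root points to itself


def extract_triples(tokens):
    """Apply one illustrative rule: root verb + its subject + its object."""
    triples = []
    for verb in tokens:
        if verb.dep != "root":
            continue
        subjects = [t for t in tokens if t.head == verb.idx and t.dep == "nsubj"]
        objects = [t for t in tokens if t.head == verb.idx and t.dep == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((s.text, verb.text, o.text))
    return triples


# Hand-built parse of a made-up sentence, standing in for real parser output.
sentence = [
    Token(0, "Einstein", "nsubj", 1),
    Token(1, "developed", "root", 1),
    Token(2, "relativity", "obj", 1),
]
print(extract_triples(sentence))
```

A real system would additionally have to map subjects, predicates and objects to URIs before emitting RDF; that step is left out here.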

Similar documents (author)

  1. Meyer, A.: ¬Der Realkatalog (1923) 4.68
    4.6762247 = sum of:
      4.6762247 = weight(author_txt:meyer in 100) [ClassicSimilarity], result of:
        4.6762247 = fieldWeight in 100, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.48196 = idf(docFreq=67, maxDocs=44421)
          0.625 = fieldNorm(doc=100)
    
  2. Meyer, T.: ¬Die öffentliche Bibliothek in der Zivilgesellschaft (2001) 4.68
    4.6762247 = sum of:
      4.6762247 = weight(author_txt:meyer in 235) [ClassicSimilarity], result of:
        4.6762247 = fieldWeight in 235, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.48196 = idf(docFreq=67, maxDocs=44421)
          0.625 = fieldNorm(doc=235)
    
  3. Meyer, A.: Probleme des Realkatalogs (1921) 4.68
    4.6762247 = sum of:
      4.6762247 = weight(author_txt:meyer in 1668) [ClassicSimilarity], result of:
        4.6762247 = fieldWeight in 1668, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.48196 = idf(docFreq=67, maxDocs=44421)
          0.625 = fieldNorm(doc=1668)
    
  4. Meyer, R.W.: Selecting electronic alternatives (1993) 4.68
    4.6762247 = sum of:
      4.6762247 = weight(author_txt:meyer in 5914) [ClassicSimilarity], result of:
        4.6762247 = fieldWeight in 5914, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.48196 = idf(docFreq=67, maxDocs=44421)
          0.625 = fieldNorm(doc=5914)
    
  5. Meyer, F.P.: Out with the old, in with the new : why CD-ROM may have a new standard (1992) 4.68
    4.6762247 = sum of:
      4.6762247 = weight(author_txt:meyer in 6376) [ClassicSimilarity], result of:
        4.6762247 = fieldWeight in 6376, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.48196 = idf(docFreq=67, maxDocs=44421)
          0.625 = fieldNorm(doc=6376)
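
The author scores above all follow the same pattern: fieldWeight is the product of tf, idf and fieldNorm. As a minimal sketch, the value can be reproduced with the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative, and fieldNorm is simply taken from the listing rather than recomputed.

```python
import math


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # tf * idf * fieldNorm, with tf = sqrt(term frequency)
    return math.sqrt(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values from the author_txt:meyer entries above
score = field_weight(freq=1.0, doc_freq=67, max_docs=44421, field_norm=0.625)
print(score)  # close to the listed 4.6762247
```

Because every listed author entry has the same freq, docFreq and fieldNorm, all five receive the same score of about 4.68.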
    

Similar documents (content)

  1. Witschel, H.F.: Terminologie-Extraktion : Möglichkeiten der Kombination statistischer und musterbasierter Verfahren (2004) 0.14
    0.14413634 = sum of:
      0.14413634 = product of:
        0.72068167 = sum of:
          0.028062714 = weight(abstract_txt:werden in 1123) [ClassicSimilarity], result of:
            0.028062714 = score(doc=1123,freq=5.0), product of:
              0.057244126 = queryWeight, product of:
                1.0046704 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.016243275 = queryNorm
              0.4902287 = fieldWeight in 1123, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.0625 = fieldNorm(doc=1123)
          0.063999236 = weight(abstract_txt:sogenannten in 1123) [ClassicSimilarity], result of:
            0.063999236 = score(doc=1123,freq=1.0), product of:
              0.13460907 = queryWeight, product of:
                1.0893823 = boost
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.016243275 = queryNorm
              0.47544518 = fieldWeight in 1123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.607123 = idf(docFreq=59, maxDocs=44421)
                0.0625 = fieldNorm(doc=1123)
          0.09432081 = weight(abstract_txt:unstrukturierten in 1123) [ClassicSimilarity], result of:
            0.09432081 = score(doc=1123,freq=1.0), product of:
              0.17432627 = queryWeight, product of:
                1.2397227 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.016243275 = queryNorm
              0.5410591 = fieldWeight in 1123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=1123)
          0.035860203 = weight(abstract_txt:informationen in 1123) [ClassicSimilarity], result of:
            0.035860203 = score(doc=1123,freq=1.0), product of:
              0.11526824 = queryWeight, product of:
                1.4256502 = boost
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.016243275 = queryNorm
              0.3111022 = fieldWeight in 1123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.0625 = fieldNorm(doc=1123)
          0.4984387 = weight(abstract_txt:extraktion in 1123) [ClassicSimilarity], result of:
            0.4984387 = score(doc=1123,freq=3.0), product of:
              0.5288904 = queryWeight, product of:
                3.7401292 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016243275 = queryNorm
              0.94242346 = fieldWeight in 1123, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0625 = fieldNorm(doc=1123)
        0.2 = coord(5/25)
    
  2. Lussner, W.: Technologien des Wissensmanagements : READWARE als Instrument des Knowledge Retrieval (2000) 0.14
    0.13776152 = sum of:
      0.13776152 = product of:
        0.86100954 = sum of:
          0.025100054 = weight(abstract_txt:werden in 6238) [ClassicSimilarity], result of:
            0.025100054 = score(doc=6238,freq=1.0), product of:
              0.057244126 = queryWeight, product of:
                1.0046704 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.016243275 = queryNorm
              0.43847388 = fieldWeight in 6238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.125 = fieldNorm(doc=6238)
          0.18864162 = weight(abstract_txt:unstrukturierten in 6238) [ClassicSimilarity], result of:
            0.18864162 = score(doc=6238,freq=1.0), product of:
              0.17432627 = queryWeight, product of:
                1.2397227 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.016243275 = queryNorm
              1.0821182 = fieldWeight in 6238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.125 = fieldNorm(doc=6238)
          0.07172041 = weight(abstract_txt:informationen in 6238) [ClassicSimilarity], result of:
            0.07172041 = score(doc=6238,freq=1.0), product of:
              0.11526824 = queryWeight, product of:
                1.4256502 = boost
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.016243275 = queryNorm
              0.6222044 = fieldWeight in 6238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.125 = fieldNorm(doc=6238)
          0.57554746 = weight(abstract_txt:extraktion in 6238) [ClassicSimilarity], result of:
            0.57554746 = score(doc=6238,freq=1.0), product of:
              0.5288904 = queryWeight, product of:
                3.7401292 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016243275 = queryNorm
              1.0882169 = fieldWeight in 6238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.125 = fieldNorm(doc=6238)
        0.16 = coord(4/25)
    
  3. Version 8.08 des Standard-Thesaurus Wirtschaft mit Mapping zu anderen Vokabularen veröffentlicht (2012) 0.11
    0.1104549 = sum of:
      0.1104549 = product of:
        0.69034314 = sum of:
          0.1079281 = weight(abstract_txt:dbpedia in 1007) [ClassicSimilarity], result of:
            0.1079281 = score(doc=1007,freq=1.0), product of:
              0.16435169 = queryWeight, product of:
                1.2037332 = boost
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.016243275 = queryNorm
              0.65668994 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.405631 = idf(docFreq=26, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.044825252 = weight(abstract_txt:informationen in 1007) [ClassicSimilarity], result of:
            0.044825252 = score(doc=1007,freq=1.0), product of:
              0.11526824 = queryWeight, product of:
                1.4256502 = boost
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.016243275 = queryNorm
              0.38887775 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.17787269 = weight(abstract_txt:wikipedia in 1007) [ClassicSimilarity], result of:
            0.17787269 = score(doc=1007,freq=1.0), product of:
              0.36400777 = queryWeight, product of:
                3.5828488 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.016243275 = queryNorm
              0.4886508 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
          0.35971713 = weight(abstract_txt:extraktion in 1007) [ClassicSimilarity], result of:
            0.35971713 = score(doc=1007,freq=1.0), product of:
              0.5288904 = queryWeight, product of:
                3.7401292 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016243275 = queryNorm
              0.68013555 = fieldWeight in 1007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.078125 = fieldNorm(doc=1007)
        0.16 = coord(4/25)
    
  4. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.10
    0.10042043 = sum of:
      0.10042043 = product of:
        0.50210214 = sum of:
          0.022185521 = weight(abstract_txt:werden in 2054) [ClassicSimilarity], result of:
            0.022185521 = score(doc=2054,freq=8.0), product of:
              0.057244126 = queryWeight, product of:
                1.0046704 = boost
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.016243275 = queryNorm
              0.3875598 = fieldWeight in 2054, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.507791 = idf(docFreq=3617, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.095640026 = weight(abstract_txt:extrahiert in 2054) [ClassicSimilarity], result of:
            0.095640026 = score(doc=2054,freq=2.0), product of:
              0.19103852 = queryWeight, product of:
                1.2977875 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.016243275 = queryNorm
              0.50063217 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.027927132 = weight(abstract_txt:wurde in 2054) [ClassicSimilarity], result of:
            0.027927132 = score(doc=2054,freq=2.0), product of:
              0.10593888 = queryWeight, product of:
                1.3667397 = boost
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.016243275 = queryNorm
              0.26361552 = fieldWeight in 2054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7719507 = idf(docFreq=1021, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.044825252 = weight(abstract_txt:informationen in 2054) [ClassicSimilarity], result of:
            0.044825252 = score(doc=2054,freq=4.0), product of:
              0.11526824 = queryWeight, product of:
                1.4256502 = boost
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.016243275 = queryNorm
              0.38887775 = fieldWeight in 2054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
          0.31152418 = weight(abstract_txt:extraktion in 2054) [ClassicSimilarity], result of:
            0.31152418 = score(doc=2054,freq=3.0), product of:
              0.5288904 = queryWeight, product of:
                3.7401292 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016243275 = queryNorm
              0.58901465 = fieldWeight in 2054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2054)
        0.2 = coord(5/25)
    
  5. Gross, D.: Maschinelle Bilderkennung mit Big Data und Deep Learning (2017) 0.10
    0.095974974 = sum of:
      0.095974974 = product of:
        0.79979146 = sum of:
          0.2334321 = weight(abstract_txt:unstrukturierten in 4726) [ClassicSimilarity], result of:
            0.2334321 = score(doc=4726,freq=2.0), product of:
              0.17432627 = queryWeight, product of:
                1.2397227 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.016243275 = queryNorm
              1.3390529 = fieldWeight in 4726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.109375 = fieldNorm(doc=4726)
          0.06275536 = weight(abstract_txt:informationen in 4726) [ClassicSimilarity], result of:
            0.06275536 = score(doc=4726,freq=1.0), product of:
              0.11526824 = queryWeight, product of:
                1.4256502 = boost
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.016243275 = queryNorm
              0.5444289 = fieldWeight in 4726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9776354 = idf(docFreq=831, maxDocs=44421)
                0.109375 = fieldNorm(doc=4726)
          0.503604 = weight(abstract_txt:extraktion in 4726) [ClassicSimilarity], result of:
            0.503604 = score(doc=4726,freq=1.0), product of:
              0.5288904 = queryWeight, product of:
                3.7401292 = boost
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.016243275 = queryNorm
              0.9521898 = fieldWeight in 4726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.705735 = idf(docFreq=19, maxDocs=44421)
                0.109375 = fieldNorm(doc=4726)
        0.12 = coord(3/25)
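
The content scores above combine, per matching term, a queryWeight (boost * idf * queryNorm) with a fieldWeight (tf * idf * fieldNorm), sum the term scores, and scale the sum by coord (matched terms / query terms). The following sketch recomputes the first entry (doc=1123) from the numbers in its listing. The function and variable names are illustrative; boost, queryNorm and fieldNorm are copied from the listing, not derived.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.016243275


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM          # query side
    field_weight = math.sqrt(freq) * idf * field_norm  # document side
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc=1123, fieldNorm=0.0625
terms = [
    ("werden", 5.0, 3617, 1.0046704),
    ("sogenannten", 1.0, 59, 1.0893823),
    ("unstrukturierten", 1.0, 20, 1.2397227),
    ("informationen", 1.0, 831, 1.4256502),
    ("extraktion", 3.0, 19, 3.7401292),
]

raw_sum = sum(term_score(f, df, b, 0.0625) for _, f, df, b in terms)
coord = len(terms) / 25  # 5 of 25 query terms matched
total = raw_sum * coord
print(raw_sum, total)  # close to the listed 0.72068167 and 0.14413634
```

The listing was produced with single-precision arithmetic, so the recomputed values agree only to several significant digits. The sketch also shows why extraktion dominates the ranking: it has both the highest boost and one of the highest idf values, and idf enters the term score twice.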