Document (#40924)

Author
Neumaier, S.
Title
Data integration for open data on the Web
Source
Reasoning Web: Semantic Interoperability on the Web, 13th International Summer School 2017, London, UK, July 7-11, 2017, Tutorial Lectures. Eds.: Ianni, G. et al
Imprint
Cham : Springer International Publishing
Year
2017
Pages
pp. 1-28
Series
Lecture Notes in Computer Science; 10370 (Information Systems and Applications, incl. Internet/Web, and HCI)
Abstract
In this lecture we will introduce and discuss the challenges of integrating openly available Web data and how to solve them. First, while we will address this topic from the viewpoint of Semantic Web research, not all data is readily available as RDF or Linked Data, so we will give an introduction to the different data formats prevalent on the Web, namely standard formats for publishing and exchanging tabular, tree-shaped, and graph data. Second, not all Open Data is completely open, so we will discuss and address issues around licences and terms of usage associated with Open Data, as well as documentation of data provenance. Third, we will discuss (meta-)data quality issues associated with Open Data on the Web and how Semantic Web techniques and vocabularies can be used to describe and remedy them. Fourth, we will address issues of searchability and integration of Open Data and discuss to what extent semantic search can help to overcome them. We close by briefly summarizing further issues not covered explicitly herein, such as multi-linguality, temporal aspects (archiving, evolution, temporal querying), and how, and whether, OWL and RDFS reasoning on top of integrated open data could help.
Theme
Semantic Web
Semantische Interoperabilität

Similar documents (content)

  1. Koho, M.; Burrows, T.; Hyvönen, E.; Ikkala, E.; Page, K.; Ransom, L.; Tuominen, J.; Emery, D.; Fraas, M.; Heller, B.; Lewis, D.; Morrison, A.; Porte, G.; Thomson, E.; Velios, A.; Wijsman, H.: Harmonizing and publishing heterogeneous premodern manuscript metadata as Linked Open Data (2022) 0.26
    0.26031122 = sum of:
      0.26031122 = product of:
        0.7230867 = sum of:
          0.02376753 = weight(abstract_txt:them in 1467) [ClassicSimilarity], result of:
            0.02376753 = score(doc=1467,freq=1.0), product of:
              0.08859533 = queryWeight, product of:
                1.0811553 = boost
                4.292331 = idf(docFreq=1650, maxDocs=44421)
                0.019091038 = queryNorm
              0.2682707 = fieldWeight in 1467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.292331 = idf(docFreq=1650, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.02397057 = weight(abstract_txt:available in 1467) [ClassicSimilarity], result of:
            0.02397057 = score(doc=1467,freq=1.0), product of:
              0.089099176 = queryWeight, product of:
                1.0842252 = boost
                4.304519 = idf(docFreq=1630, maxDocs=44421)
                0.019091038 = queryNorm
              0.26903245 = fieldWeight in 1467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.304519 = idf(docFreq=1630, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.009348483 = weight(abstract_txt:with in 1467) [ClassicSimilarity], result of:
            0.009348483 = score(doc=1467,freq=1.0), product of:
              0.059922818 = queryWeight, product of:
                1.2574588 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019091038 = queryNorm
              0.15600874 = fieldWeight in 1467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.05711543 = weight(abstract_txt:semantic in 1467) [ClassicSimilarity], result of:
            0.05711543 = score(doc=1467,freq=2.0), product of:
              0.14441451 = queryWeight, product of:
                1.6905721 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.019091038 = queryNorm
              0.39549646 = fieldWeight in 1467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.072045095 = weight(abstract_txt:address in 1467) [ClassicSimilarity], result of:
            0.072045095 = score(doc=1467,freq=1.0), product of:
              0.21241646 = queryWeight, product of:
                2.0503235 = boost
                5.4267054 = idf(docFreq=530, maxDocs=44421)
                0.019091038 = queryNorm
              0.33916909 = fieldWeight in 1467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4267054 = idf(docFreq=530, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.09136017 = weight(abstract_txt:discuss in 1467) [ClassicSimilarity], result of:
            0.09136017 = score(doc=1467,freq=1.0), product of:
              0.27390674 = queryWeight, product of:
                2.6884317 = boost
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.019091038 = queryNorm
              0.3335448 = fieldWeight in 1467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.0734218 = weight(abstract_txt:will in 1467) [ClassicSimilarity], result of:
            0.0734218 = score(doc=1467,freq=2.0), product of:
              0.21511394 = queryWeight, product of:
                2.917948 = boost
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.019091038 = queryNorm
              0.34131587 = fieldWeight in 1467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.16648166 = weight(abstract_txt:open in 1467) [ClassicSimilarity], result of:
            0.16648166 = score(doc=1467,freq=2.0), product of:
              0.39085147 = queryWeight, product of:
                4.248372 = boost
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.019091038 = queryNorm
              0.42594612 = fieldWeight in 1467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
          0.20557602 = weight(abstract_txt:data in 1467) [ClassicSimilarity], result of:
            0.20557602 = score(doc=1467,freq=7.0), product of:
              0.37331012 = queryWeight, product of:
                5.8717365 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019091038 = queryNorm
              0.5506843 = fieldWeight in 1467, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=1467)
        0.36 = coord(9/25)
    
  2. Wright, H.: Semantic Web and ontologies (2018) 0.20
    0.19913003 = sum of:
      0.19913003 = product of:
        0.71117866 = sum of:
          0.11806321 = weight(abstract_txt:openly in 1081) [ClassicSimilarity], result of:
            0.11806321 = score(doc=1081,freq=1.0), product of:
              0.17642002 = queryWeight, product of:
                1.0788016 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.019091038 = queryNorm
              0.66921663 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.078125 = fieldNorm(doc=1081)
          0.029709416 = weight(abstract_txt:them in 1081) [ClassicSimilarity], result of:
            0.029709416 = score(doc=1081,freq=1.0), product of:
              0.08859533 = queryWeight, product of:
                1.0811553 = boost
                4.292331 = idf(docFreq=1650, maxDocs=44421)
                0.019091038 = queryNorm
              0.33533838 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.292331 = idf(docFreq=1650, maxDocs=44421)
                0.078125 = fieldNorm(doc=1081)
          0.029963212 = weight(abstract_txt:available in 1081) [ClassicSimilarity], result of:
            0.029963212 = score(doc=1081,freq=1.0), product of:
              0.089099176 = queryWeight, product of:
                1.0842252 = boost
                4.304519 = idf(docFreq=1630, maxDocs=44421)
                0.019091038 = queryNorm
              0.33629057 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.304519 = idf(docFreq=1630, maxDocs=44421)
                0.078125 = fieldNorm(doc=1081)
          0.041882608 = weight(abstract_txt:help in 1081) [ClassicSimilarity], result of:
            0.041882608 = score(doc=1081,freq=1.0), product of:
              0.11138771 = queryWeight, product of:
                1.2122754 = boost
                4.8128953 = idf(docFreq=980, maxDocs=44421)
                0.019091038 = queryNorm
              0.37600744 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8128953 = idf(docFreq=980, maxDocs=44421)
                0.078125 = fieldNorm(doc=1081)
          0.08743978 = weight(abstract_txt:semantic in 1081) [ClassicSimilarity], result of:
            0.08743978 = score(doc=1081,freq=3.0), product of:
              0.14441451 = queryWeight, product of:
                1.6905721 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.019091038 = queryNorm
              0.6054778 = fieldWeight in 1081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=1081)
          0.1471504 = weight(abstract_txt:open in 1081) [ClassicSimilarity], result of:
            0.1471504 = score(doc=1081,freq=1.0), product of:
              0.39085147 = queryWeight, product of:
                4.248372 = boost
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.019091038 = queryNorm
              0.37648675 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.078125 = fieldNorm(doc=1081)
          0.25697002 = weight(abstract_txt:data in 1081) [ClassicSimilarity], result of:
            0.25697002 = score(doc=1081,freq=7.0), product of:
              0.37331012 = queryWeight, product of:
                5.8717365 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019091038 = queryNorm
              0.6883553 = fieldWeight in 1081, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=1081)
        0.28 = coord(7/25)
    
  3. Fry, J.; Schroeder, R.; Besten, M. den: Open science in e-science : contingency or policy? (2009) 0.18
    0.17849508 = sum of:
      0.17849508 = product of:
        0.55779713 = sum of:
          0.082644254 = weight(abstract_txt:openly in 3681) [ClassicSimilarity], result of:
            0.082644254 = score(doc=3681,freq=1.0), product of:
              0.17642002 = queryWeight, product of:
                1.0788016 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.019091038 = queryNorm
              0.46845168 = fieldWeight in 3681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
          0.011568157 = weight(abstract_txt:with in 3681) [ClassicSimilarity], result of:
            0.011568157 = score(doc=3681,freq=2.0), product of:
              0.059922818 = queryWeight, product of:
                1.2574588 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019091038 = queryNorm
              0.19305095 = fieldWeight in 3681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
          0.038994476 = weight(abstract_txt:integration in 3681) [ClassicSimilarity], result of:
            0.038994476 = score(doc=3681,freq=1.0), product of:
              0.13471568 = queryWeight, product of:
                1.333189 = boost
                5.2929387 = idf(docFreq=606, maxDocs=44421)
                0.019091038 = queryNorm
              0.2894576 = fieldWeight in 3681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2929387 = idf(docFreq=606, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
          0.06303946 = weight(abstract_txt:address in 3681) [ClassicSimilarity], result of:
            0.06303946 = score(doc=3681,freq=1.0), product of:
              0.21241646 = queryWeight, product of:
                2.0503235 = boost
                5.4267054 = idf(docFreq=530, maxDocs=44421)
                0.019091038 = queryNorm
              0.29677296 = fieldWeight in 3681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4267054 = idf(docFreq=530, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
          0.07994015 = weight(abstract_txt:discuss in 3681) [ClassicSimilarity], result of:
            0.07994015 = score(doc=3681,freq=1.0), product of:
              0.27390674 = queryWeight, product of:
                2.6884317 = boost
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.019091038 = queryNorm
              0.2918517 = fieldWeight in 3681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
          0.09051178 = weight(abstract_txt:issues in 3681) [ClassicSimilarity], result of:
            0.09051178 = score(doc=3681,freq=3.0), product of:
              0.22224179 = queryWeight, product of:
                2.7074816 = boost
                4.299626 = idf(docFreq=1638, maxDocs=44421)
                0.019091038 = queryNorm
              0.40726712 = fieldWeight in 3681, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.299626 = idf(docFreq=1638, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
          0.045427423 = weight(abstract_txt:will in 3681) [ClassicSimilarity], result of:
            0.045427423 = score(doc=3681,freq=1.0), product of:
              0.21511394 = queryWeight, product of:
                2.917948 = boost
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.019091038 = queryNorm
              0.21117842 = fieldWeight in 3681, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8615482 = idf(docFreq=2539, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
          0.14567146 = weight(abstract_txt:open in 3681) [ClassicSimilarity], result of:
            0.14567146 = score(doc=3681,freq=2.0), product of:
              0.39085147 = queryWeight, product of:
                4.248372 = boost
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.019091038 = queryNorm
              0.37270284 = fieldWeight in 3681, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3681)
        0.32 = coord(8/25)
    
  4. Daquino, M.; Peroni, S.; Shotton, D.; Colavizza, G.; Ghavimi, B.; Lauscher, A.; Mayr, P.; Romanello, M.; Zumstein, P.: The OpenCitations Data Model (2020) 0.18
    0.1777346 = sum of:
      0.1777346 = product of:
        0.7405608 = sum of:
          0.014022725 = weight(abstract_txt:with in 1039) [ClassicSimilarity], result of:
            0.014022725 = score(doc=1039,freq=1.0), product of:
              0.059922818 = queryWeight, product of:
                1.2574588 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019091038 = queryNorm
              0.23401311 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.09375 = fieldNorm(doc=1039)
          0.066847675 = weight(abstract_txt:integration in 1039) [ClassicSimilarity], result of:
            0.066847675 = score(doc=1039,freq=1.0), product of:
              0.13471568 = queryWeight, product of:
                1.333189 = boost
                5.2929387 = idf(docFreq=606, maxDocs=44421)
                0.019091038 = queryNorm
              0.49621302 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2929387 = idf(docFreq=606, maxDocs=44421)
                0.09375 = fieldNorm(doc=1039)
          0.060580064 = weight(abstract_txt:semantic in 1039) [ClassicSimilarity], result of:
            0.060580064 = score(doc=1039,freq=1.0), product of:
              0.14441451 = queryWeight, product of:
                1.6905721 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.019091038 = queryNorm
              0.41948736 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.09375 = fieldNorm(doc=1039)
          0.13704026 = weight(abstract_txt:discuss in 1039) [ClassicSimilarity], result of:
            0.13704026 = score(doc=1039,freq=1.0), product of:
              0.27390674 = queryWeight, product of:
                2.6884317 = boost
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.019091038 = queryNorm
              0.5003172 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3367167 = idf(docFreq=580, maxDocs=44421)
                0.09375 = fieldNorm(doc=1039)
          0.17658047 = weight(abstract_txt:open in 1039) [ClassicSimilarity], result of:
            0.17658047 = score(doc=1039,freq=1.0), product of:
              0.39085147 = queryWeight, product of:
                4.248372 = boost
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.019091038 = queryNorm
              0.45178407 = fieldWeight in 1039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.09375 = fieldNorm(doc=1039)
          0.28548962 = weight(abstract_txt:data in 1039) [ClassicSimilarity], result of:
            0.28548962 = score(doc=1039,freq=6.0), product of:
              0.37331012 = queryWeight, product of:
                5.8717365 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019091038 = queryNorm
              0.7647519 = fieldWeight in 1039, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=1039)
        0.24 = coord(6/25)
    
  5. Timmermann, M.: A collective challenge : open science from the perspective of Science Europe (2019) 0.16
    0.16108611 = sum of:
      0.16108611 = product of:
        0.80543053 = sum of:
          0.14167586 = weight(abstract_txt:openly in 698) [ClassicSimilarity], result of:
            0.14167586 = score(doc=698,freq=1.0), product of:
              0.17642002 = queryWeight, product of:
                1.0788016 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.019091038 = queryNorm
              0.80306 = fieldWeight in 698, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.09375 = fieldNorm(doc=698)
          0.035955854 = weight(abstract_txt:available in 698) [ClassicSimilarity], result of:
            0.035955854 = score(doc=698,freq=1.0), product of:
              0.089099176 = queryWeight, product of:
                1.0842252 = boost
                4.304519 = idf(docFreq=1630, maxDocs=44421)
                0.019091038 = queryNorm
              0.40354866 = fieldWeight in 698, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.304519 = idf(docFreq=1630, maxDocs=44421)
                0.09375 = fieldNorm(doc=698)
          0.014022725 = weight(abstract_txt:with in 698) [ClassicSimilarity], result of:
            0.014022725 = score(doc=698,freq=1.0), product of:
              0.059922818 = queryWeight, product of:
                1.2574588 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.019091038 = queryNorm
              0.23401311 = fieldWeight in 698, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.09375 = fieldNorm(doc=698)
          0.35316095 = weight(abstract_txt:open in 698) [ClassicSimilarity], result of:
            0.35316095 = score(doc=698,freq=4.0), product of:
              0.39085147 = queryWeight, product of:
                4.248372 = boost
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.019091038 = queryNorm
              0.90356815 = fieldWeight in 698, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.8190303 = idf(docFreq=974, maxDocs=44421)
                0.09375 = fieldNorm(doc=698)
          0.26061517 = weight(abstract_txt:data in 698) [ClassicSimilarity], result of:
            0.26061517 = score(doc=698,freq=5.0), product of:
              0.37331012 = queryWeight, product of:
                5.8717365 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.019091038 = queryNorm
              0.69811976 = fieldWeight in 698, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.09375 = fieldNorm(doc=698)
        0.2 = coord(5/25)
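
The score breakdowns above follow the pattern of Lucene's ClassicSimilarity (TF-IDF) explain output: each term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor (matched terms / query terms). The following is a minimal Python sketch that recomputes two of the printed values. The function names are illustrative and not part of any Lucene API; queryNorm, boost, and fieldNorm are copied from the listing rather than derived, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is the one ClassicSimilarity is documented to use.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm      # query-side factor
    field_weight = tf(freq) * term_idf * field_norm   # document-side factor
    return query_weight * field_weight


def document_score(term_scores, matched, total):
    """Sum of the term scores, scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# Values copied from the explain tree for abstract_txt:them in doc 1467.
them_1467 = term_score(
    freq=1.0, doc_freq=1650, max_docs=44421,
    boost=1.0811553, query_norm=0.019091038, field_norm=0.0625,
)

# The nine per-term scores printed for doc 1467 (result 1).
scores_1467 = [
    0.02376753, 0.02397057, 0.009348483, 0.05711543, 0.072045095,
    0.09136017, 0.0734218, 0.16648166, 0.20557602,
]
total_1467 = document_score(scores_1467, matched=9, total=25)

print(f"them in 1467: {them_1467:.8f}")
print(f"doc 1467 total: {total_1467:.8f}")
```

Lucene computes these values in single precision, so the recomputed figures can differ from the listing in the last printed digits.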