Document (#32912)

Author
Nottelmann, H.
Straccia, U.
Title
Information retrieval and machine learning for probabilistic schema matching
Source
Information processing and management. 43(2007) no.3, pp.552-576
Year
2007
Abstract
Schema matching is the problem of finding correspondences (mapping rules, e.g. logical formulae) between heterogeneous schemas, e.g. in the data exchange domain or for distributed IR in federated digital libraries. This paper introduces a probabilistic framework, called sPLMap, for automatically learning schema mapping rules, based on given instances of both schemas. Different techniques, mostly from the IR and machine learning fields, are combined for finding suitable mapping candidates. Our approach gives a probabilistic interpretation of the prediction weights of the candidates, selects the rule set with the highest matching probability, and outputs probabilistic rules which are capable of dealing with the intrinsic uncertainty of the mapping process. Our approach, in different variants, has been evaluated on several test sets.
Footnote
Contribution to: Special issue on Heterogeneous and Distributed IR

Similar documents (content)

  1. Mao, M.: Ontology mapping : towards semantic interoperability in distributed and heterogeneous environments (2008) 0.41
    0.41238737 = sum of:
      0.41238737 = product of:
        1.0309684 = sum of:
          0.02612883 = weight(abstract_txt:different in 659) [ClassicSimilarity], result of:
            0.02612883 = score(doc=659,freq=4.0), product of:
              0.06527585 = queryWeight, product of:
                1.0636925 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.016768225 = queryNorm
              0.40028328 = fieldWeight in 659, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.024172174 = weight(abstract_txt:approach in 659) [ClassicSimilarity], result of:
            0.024172174 = score(doc=659,freq=3.0), product of:
              0.068212286 = queryWeight, product of:
                1.0873544 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.016768225 = queryNorm
              0.35436687 = fieldWeight in 659, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.12656021 = weight(abstract_txt:correspondences in 659) [ClassicSimilarity], result of:
            0.12656021 = score(doc=659,freq=2.0), product of:
              0.18686903 = queryWeight, product of:
                1.2726046 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.016768225 = queryNorm
              0.6772669 = fieldWeight in 659, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.0553247 = weight(abstract_txt:machine in 659) [ClassicSimilarity], result of:
            0.0553247 = score(doc=659,freq=2.0), product of:
              0.1356111 = queryWeight, product of:
                1.5331599 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.016768225 = queryNorm
              0.40796587 = fieldWeight in 659, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.061790664 = weight(abstract_txt:finding in 659) [ClassicSimilarity], result of:
            0.061790664 = score(doc=659,freq=2.0), product of:
              0.14598149 = queryWeight, product of:
                1.5907016 = boost
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.016768225 = queryNorm
              0.42327738 = fieldWeight in 659, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.08526421 = weight(abstract_txt:learning in 659) [ClassicSimilarity], result of:
            0.08526421 = score(doc=659,freq=4.0), product of:
              0.16439205 = queryWeight, product of:
                2.0674064 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.016768225 = queryNorm
              0.5186638 = fieldWeight in 659, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.05746965 = weight(abstract_txt:rules in 659) [ClassicSimilarity], result of:
            0.05746965 = score(doc=659,freq=1.0), product of:
              0.2006082 = queryWeight, product of:
                2.283809 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.016768225 = queryNorm
              0.2864771 = fieldWeight in 659, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.08818043 = weight(abstract_txt:matching in 659) [ClassicSimilarity], result of:
            0.08818043 = score(doc=659,freq=1.0), product of:
              0.26687288 = queryWeight, product of:
                2.6341329 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.016768225 = queryNorm
              0.3304211 = fieldWeight in 659, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.34089693 = weight(abstract_txt:mapping in 659) [ClassicSimilarity], result of:
            0.34089693 = score(doc=659,freq=11.0), product of:
              0.3253272 = queryWeight, product of:
                3.3582652 = boost
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.016768225 = queryNorm
              1.0478587 = fieldWeight in 659, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
          0.16518062 = weight(abstract_txt:probabilistic in 659) [ClassicSimilarity], result of:
            0.16518062 = score(doc=659,freq=1.0), product of:
              0.44634974 = queryWeight, product of:
                3.9336205 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.016768225 = queryNorm
              0.37006995 = fieldWeight in 659, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.0546875 = fieldNorm(doc=659)
        0.4 = coord(10/25)
    
  2. Fuhr, N.: Probabilistic datalog : implementing logical information retrieval for advanced applications (2000) 0.18
    0.1848513 = sum of:
      0.1848513 = product of:
        0.7702138 = sum of:
          0.09924844 = weight(abstract_txt:probability in 5380) [ClassicSimilarity], result of:
            0.09924844 = score(doc=5380,freq=1.0), product of:
              0.11538527 = queryWeight, product of:
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.016768225 = queryNorm
              0.86014825 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.125 = fieldNorm(doc=5380)
          0.11593918 = weight(abstract_txt:weights in 5380) [ClassicSimilarity], result of:
            0.11593918 = score(doc=5380,freq=1.0), product of:
              0.12798372 = queryWeight, product of:
                1.053179 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.016768225 = queryNorm
              0.90589005 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.125 = fieldNorm(doc=5380)
          0.031898998 = weight(abstract_txt:approach in 5380) [ClassicSimilarity], result of:
            0.031898998 = score(doc=5380,freq=1.0), product of:
              0.068212286 = queryWeight, product of:
                1.0873544 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.016768225 = queryNorm
              0.467643 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.125 = fieldNorm(doc=5380)
          0.014212214 = weight(abstract_txt:with in 5380) [ClassicSimilarity], result of:
            0.014212214 = score(doc=5380,freq=1.0), product of:
              0.04554942 = queryWeight, product of:
                1.0882455 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.016768225 = queryNorm
              0.31201747 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.125 = fieldNorm(doc=5380)
          0.1313592 = weight(abstract_txt:rules in 5380) [ClassicSimilarity], result of:
            0.1313592 = score(doc=5380,freq=1.0), product of:
              0.2006082 = queryWeight, product of:
                2.283809 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.016768225 = queryNorm
              0.65480477 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.125 = fieldNorm(doc=5380)
          0.37755573 = weight(abstract_txt:probabilistic in 5380) [ClassicSimilarity], result of:
            0.37755573 = score(doc=5380,freq=1.0), product of:
              0.44634974 = queryWeight, product of:
                3.9336205 = boost
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.016768225 = queryNorm
              0.8458742 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7669935 = idf(docFreq=138, maxDocs=44421)
                0.125 = fieldNorm(doc=5380)
        0.24 = coord(6/25)
    
  3. Euzenat, J.; Shvaiko, P.: Ontology matching (2010) 0.17
    0.17457038 = sum of:
      0.17457038 = product of:
        0.7273766 = sum of:
          0.032001153 = weight(abstract_txt:different in 1168) [ClassicSimilarity], result of:
            0.032001153 = score(doc=1168,freq=6.0), product of:
              0.06527585 = queryWeight, product of:
                1.0636925 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.016768225 = queryNorm
              0.49024493 = fieldWeight in 1168, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1168)
          0.006217844 = weight(abstract_txt:with in 1168) [ClassicSimilarity], result of:
            0.006217844 = score(doc=1168,freq=1.0), product of:
              0.04554942 = queryWeight, product of:
                1.0882455 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.016768225 = queryNorm
              0.13650765 = fieldWeight in 1168, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1168)
          0.12656021 = weight(abstract_txt:correspondences in 1168) [ClassicSimilarity], result of:
            0.12656021 = score(doc=1168,freq=2.0), product of:
              0.18686903 = queryWeight, product of:
                1.2726046 = boost
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.016768225 = queryNorm
              0.6772669 = fieldWeight in 1168, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1168)
          0.043692596 = weight(abstract_txt:finding in 1168) [ClassicSimilarity], result of:
            0.043692596 = score(doc=1168,freq=1.0), product of:
              0.14598149 = queryWeight, product of:
                1.5907016 = boost
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.016768225 = queryNorm
              0.2993023 = fieldWeight in 1168, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4729567 = idf(docFreq=506, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1168)
          0.30546597 = weight(abstract_txt:matching in 1168) [ClassicSimilarity], result of:
            0.30546597 = score(doc=1168,freq=12.0), product of:
              0.26687288 = queryWeight, product of:
                2.6341329 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.016768225 = queryNorm
              1.1446122 = fieldWeight in 1168, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1168)
          0.21343878 = weight(abstract_txt:schema in 1168) [ClassicSimilarity], result of:
            0.21343878 = score(doc=1168,freq=4.0), product of:
              0.3030762 = queryWeight, product of:
                2.8071225 = boost
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.016768225 = queryNorm
              0.7042413 = fieldWeight in 1168, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1168)
        0.24 = coord(6/25)
    
  4. Chan, L.M.; Zeng, M.L.: Metadata interoperability and standardization - a study of methodology, part II : achieving interoperability at the record and repository levels (2006) 0.14
    0.14322402 = sum of:
      0.14322402 = product of:
        0.59676677 = sum of:
          0.02262823 = weight(abstract_txt:different in 2177) [ClassicSimilarity], result of:
            0.02262823 = score(doc=2177,freq=3.0), product of:
              0.06527585 = queryWeight, product of:
                1.0636925 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.016768225 = queryNorm
              0.34665546 = fieldWeight in 2177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2177)
          0.0107696215 = weight(abstract_txt:with in 2177) [ClassicSimilarity], result of:
            0.0107696215 = score(doc=2177,freq=3.0), product of:
              0.04554942 = queryWeight, product of:
                1.0882455 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.016768225 = queryNorm
              0.23643818 = fieldWeight in 2177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2177)
          0.080215715 = weight(abstract_txt:federated in 2177) [ClassicSimilarity], result of:
            0.080215715 = score(doc=2177,freq=1.0), product of:
              0.17372228 = queryWeight, product of:
                1.2270226 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.016768225 = queryNorm
              0.46174684 = fieldWeight in 2177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2177)
          0.18687025 = weight(abstract_txt:schemas in 2177) [ClassicSimilarity], result of:
            0.18687025 = score(doc=2177,freq=3.0), product of:
              0.26669338 = queryWeight, product of:
                2.150037 = boost
                7.3974023 = idf(docFreq=73, maxDocs=44421)
                0.016768225 = queryNorm
              0.70069325 = fieldWeight in 2177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3974023 = idf(docFreq=73, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2177)
          0.15092401 = weight(abstract_txt:schema in 2177) [ClassicSimilarity], result of:
            0.15092401 = score(doc=2177,freq=2.0), product of:
              0.3030762 = queryWeight, product of:
                2.8071225 = boost
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.016768225 = queryNorm
              0.4979738 = fieldWeight in 2177, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2177)
          0.14535892 = weight(abstract_txt:mapping in 2177) [ClassicSimilarity], result of:
            0.14535892 = score(doc=2177,freq=2.0), product of:
              0.3253272 = queryWeight, product of:
                3.3582652 = boost
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.016768225 = queryNorm
              0.4468084 = fieldWeight in 2177, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.0546875 = fieldNorm(doc=2177)
        0.24 = coord(6/25)
    
  5. Euzenat, J.; Bach, T.Le; Barrasa, J.; Bouquet, P.; Bo, J.De; Dieng, R.; Ehrig, M.; Hauswirth, M.; Jarrar, M.; Lara, R.; Maynard, D.; Napoli, A.; Stamou, G.; Stuckenschmidt, H.; Shvaiko, P.; Tessaris, S.; Acker, S. Van; Zaihrayeu, I.: State of the art on ontology alignment (2004) 0.12
    0.11842252 = sum of:
      0.11842252 = product of:
        0.42293757 = sum of:
          0.046272766 = weight(abstract_txt:instances in 1172) [ClassicSimilarity], result of:
            0.046272766 = score(doc=1172,freq=1.0), product of:
              0.120383285 = queryWeight, product of:
                1.0214283 = boost
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.016768225 = queryNorm
              0.38437867 = fieldWeight in 1172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.028639 = idf(docFreq=106, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1172)
          0.013064415 = weight(abstract_txt:different in 1172) [ClassicSimilarity], result of:
            0.013064415 = score(doc=1172,freq=1.0), product of:
              0.06527585 = queryWeight, product of:
                1.0636925 = boost
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.016768225 = queryNorm
              0.20014164 = fieldWeight in 1172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6597328 = idf(docFreq=3107, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1172)
          0.006217844 = weight(abstract_txt:with in 1172) [ClassicSimilarity], result of:
            0.006217844 = score(doc=1172,freq=1.0), product of:
              0.04554942 = queryWeight, product of:
                1.0882455 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.016768225 = queryNorm
              0.13650765 = fieldWeight in 1172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1172)
          0.03912047 = weight(abstract_txt:machine in 1172) [ClassicSimilarity], result of:
            0.03912047 = score(doc=1172,freq=1.0), product of:
              0.1356111 = queryWeight, product of:
                1.5331599 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.016768225 = queryNorm
              0.28847542 = fieldWeight in 1172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1172)
          0.042632107 = weight(abstract_txt:learning in 1172) [ClassicSimilarity], result of:
            0.042632107 = score(doc=1172,freq=1.0), product of:
              0.16439205 = queryWeight, product of:
                2.0674064 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.016768225 = queryNorm
              0.2593319 = fieldWeight in 1172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1172)
          0.12470595 = weight(abstract_txt:matching in 1172) [ClassicSimilarity], result of:
            0.12470595 = score(doc=1172,freq=2.0), product of:
              0.26687288 = queryWeight, product of:
                2.6341329 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.016768225 = queryNorm
              0.46728596 = fieldWeight in 1172, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1172)
          0.15092401 = weight(abstract_txt:schema in 1172) [ClassicSimilarity], result of:
            0.15092401 = score(doc=1172,freq=2.0), product of:
              0.3030762 = queryWeight, product of:
                2.8071225 = boost
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.016768225 = queryNorm
              0.4979738 = fieldWeight in 1172, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4387774 = idf(docFreq=192, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1172)
        0.28 = coord(7/25)
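
The score breakdowns above all follow the same arithmetic: each term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), with tf being the square root of the term frequency, and the sum over matching terms is scaled by the coord factor (matched terms / query terms). The following is a minimal Python sketch that recomputes the first entry (doc 659) from the values listed above. The function names `term_score` and `document_score` are illustrative assumptions, not part of any library, and small deviations in the last digits are to be expected because the listing reports single-precision values.

```python
import math

# Shared across all entries in the listing above.
QUERY_NORM = 0.016768225


def term_score(freq, idf, field_norm, query_norm=QUERY_NORM, boost=1.0):
    """Recompute one 'weight(...)' node of the listing (hypothetical helper).

    queryWeight = boost * idf * queryNorm
    fieldWeight = sqrt(freq) * idf * fieldNorm
    A term without a 'boost' line in the listing is treated as boost 1.0.
    """
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total):
    """Sum the per-term scores and apply coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# Term "different" in doc 659, values copied from the listing.
score_different = term_score(
    freq=4.0, idf=3.6597328, field_norm=0.0546875, boost=1.0636925
)

# The ten per-term scores listed for doc 659, combined with coord(10/25).
doc_659_terms = [
    0.02612883, 0.024172174, 0.12656021, 0.0553247, 0.061790664,
    0.08526421, 0.05746965, 0.08818043, 0.34089693, 0.16518062,
]
total_659 = document_score(doc_659_terms, matched=10, total=25)

print(score_different)  # compare with the 'different' term of doc 659
print(total_659)        # compare with the total listed for entry 1
```

The same two functions apply to the other entries; only the per-term inputs and the coord fraction change (for example 6/25 for entries 2 to 4 and 7/25 for entry 5).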