Document (#38548)

Author
Barbu, E.
Title
What kind of knowledge is in Wikipedia? : unsupervised extraction of properties for similar concepts
Source
Journal of the Association for Information Science and Technology. 65(2014) no.12, pp. 2489-2497
Year
2014
Abstract
This article presents a novel method for extracting knowledge from Wikipedia and a classification schema for annotating the extracted knowledge. Unlike the majority of approaches in the literature, we use the raw Wikipedia text for knowledge acquisition. The main assumption is that concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The extracted knowledge is annotated at two levels: ontological and logical. The extracted properties are evaluated in two ways: in the traditional way, by computing the precision of the extraction procedure, and in a clustering task. The second method of evaluation is seldom used in the natural language processing community, but it is regularly employed in cognitive psychology.
Theme
Automatic classification
Object
Wikipedia

Similar documents (content)

  1. Boer, V. de; Porter, A.L.; Someren, M. v.: Extracting historical time periods from the Web (2010) 0.21
    0.21306048 = sum of:
      0.21306048 = product of:
        0.88775206 = sum of:
          0.083781585 = weight(abstract_txt:ontological in 975) [ClassicSimilarity], result of:
            0.083781585 = score(doc=975,freq=1.0), product of:
              0.13574448 = queryWeight, product of:
                1.0582954 = boost
                6.5834737 = idf(docFreq=166, maxDocs=44421)
                0.019483196 = queryNorm
              0.6172007 = fieldWeight in 975, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5834737 = idf(docFreq=166, maxDocs=44421)
                0.09375 = fieldNorm(doc=975)
          0.101548 = weight(abstract_txt:annotation in 975) [ClassicSimilarity], result of:
            0.101548 = score(doc=975,freq=1.0), product of:
              0.15431355 = queryWeight, product of:
                1.1283604 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.019483196 = queryNorm
              0.65806276 = fieldWeight in 975, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.09375 = fieldNorm(doc=975)
          0.092611685 = weight(abstract_txt:method in 975) [ClassicSimilarity], result of:
            0.092611685 = score(doc=975,freq=3.0), product of:
              0.12677586 = queryWeight, product of:
                1.4463689 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.019483196 = queryNorm
              0.7305151 = fieldWeight in 975, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=975)
          0.0782271 = weight(abstract_txt:concepts in 975) [ClassicSimilarity], result of:
            0.0782271 = score(doc=975,freq=2.0), product of:
              0.1296765 = queryWeight, product of:
                1.4628218 = boost
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.019483196 = queryNorm
              0.60324806 = fieldWeight in 975, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.09375 = fieldNorm(doc=975)
          0.2414812 = weight(abstract_txt:extraction in 975) [ClassicSimilarity], result of:
            0.2414812 = score(doc=975,freq=3.0), product of:
              0.24016787 = queryWeight, product of:
                1.990757 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.019483196 = queryNorm
              1.0054684 = fieldWeight in 975, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.09375 = fieldNorm(doc=975)
          0.29010245 = weight(abstract_txt:extracted in 975) [ClassicSimilarity], result of:
            0.29010245 = score(doc=975,freq=2.0), product of:
              0.35564864 = queryWeight, product of:
                2.9669962 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.019483196 = queryNorm
              0.8156996 = fieldWeight in 975, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.09375 = fieldNorm(doc=975)
        0.24 = coord(6/25)
    
  2. Vlachidis, A.; Binding, C.; Tudhope, D.; May, K.: Excavating grey literature : a case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources (2010) 0.14
    0.14144121 = sum of:
      0.14144121 = product of:
        0.50514716 = sum of:
          0.055854388 = weight(abstract_txt:ontological in 935) [ClassicSimilarity], result of:
            0.055854388 = score(doc=935,freq=1.0), product of:
              0.13574448 = queryWeight, product of:
                1.0582954 = boost
                6.5834737 = idf(docFreq=166, maxDocs=44421)
                0.019483196 = queryNorm
              0.4114671 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5834737 = idf(docFreq=166, maxDocs=44421)
                0.0625 = fieldNorm(doc=935)
          0.067698665 = weight(abstract_txt:annotation in 935) [ClassicSimilarity], result of:
            0.067698665 = score(doc=935,freq=1.0), product of:
              0.15431355 = queryWeight, product of:
                1.1283604 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.019483196 = queryNorm
              0.4387085 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.0625 = fieldNorm(doc=935)
          0.035646252 = weight(abstract_txt:method in 935) [ClassicSimilarity], result of:
            0.035646252 = score(doc=935,freq=1.0), product of:
              0.12677586 = queryWeight, product of:
                1.4463689 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.019483196 = queryNorm
              0.2811754 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=935)
          0.03687661 = weight(abstract_txt:concepts in 935) [ClassicSimilarity], result of:
            0.03687661 = score(doc=935,freq=1.0), product of:
              0.1296765 = queryWeight, product of:
                1.4628218 = boost
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.019483196 = queryNorm
              0.28437388 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.0625 = fieldNorm(doc=935)
          0.0795251 = weight(abstract_txt:properties in 935) [ClassicSimilarity], result of:
            0.0795251 = score(doc=935,freq=1.0), product of:
              0.21645293 = queryWeight, product of:
                1.8899161 = boost
                5.878422 = idf(docFreq=337, maxDocs=44421)
                0.019483196 = queryNorm
              0.36740136 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.878422 = idf(docFreq=337, maxDocs=44421)
                0.0625 = fieldNorm(doc=935)
          0.1858923 = weight(abstract_txt:extraction in 935) [ClassicSimilarity], result of:
            0.1858923 = score(doc=935,freq=4.0), product of:
              0.24016787 = queryWeight, product of:
                1.990757 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.019483196 = queryNorm
              0.7740099 = fieldWeight in 935, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=935)
          0.04365384 = weight(abstract_txt:knowledge in 935) [ClassicSimilarity], result of:
            0.04365384 = score(doc=935,freq=1.0), product of:
              0.19694982 = queryWeight, product of:
                2.8504183 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.019483196 = queryNorm
              0.22164954 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.0625 = fieldNorm(doc=935)
        0.28 = coord(7/25)
    
  3. Zarrad, R.; Doggaz, N.; Zagrouba, E.: Wikipedia HTML structure analysis for ontology construction (2018) 0.13
    0.12755321 = sum of:
      0.12755321 = product of:
        0.637766 = sum of:
          0.03687661 = weight(abstract_txt:concepts in 302) [ClassicSimilarity], result of:
            0.03687661 = score(doc=302,freq=1.0), product of:
              0.1296765 = queryWeight, product of:
                1.4628218 = boost
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.019483196 = queryNorm
              0.28437388 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.549982 = idf(docFreq=1275, maxDocs=44421)
                0.0625 = fieldNorm(doc=302)
          0.1314457 = weight(abstract_txt:extraction in 302) [ClassicSimilarity], result of:
            0.1314457 = score(doc=302,freq=2.0), product of:
              0.24016787 = queryWeight, product of:
                1.990757 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.019483196 = queryNorm
              0.5473076 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0625 = fieldNorm(doc=302)
          0.06173585 = weight(abstract_txt:knowledge in 302) [ClassicSimilarity], result of:
            0.06173585 = score(doc=302,freq=2.0), product of:
              0.19694982 = queryWeight, product of:
                2.8504183 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.019483196 = queryNorm
              0.31345978 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.0625 = fieldNorm(doc=302)
          0.1367556 = weight(abstract_txt:extracted in 302) [ClassicSimilarity], result of:
            0.1367556 = score(doc=302,freq=1.0), product of:
              0.35564864 = queryWeight, product of:
                2.9669962 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.019483196 = queryNorm
              0.38452446 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.0625 = fieldNorm(doc=302)
          0.2709522 = weight(abstract_txt:wikipedia in 302) [ClassicSimilarity], result of:
            0.2709522 = score(doc=302,freq=2.0), product of:
              0.49010497 = queryWeight, product of:
                4.0217986 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.019483196 = queryNorm
              0.55284524 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.0625 = fieldNorm(doc=302)
        0.2 = coord(5/25)
    
  4. Auer, S.; Lehmann, J.: What have Innsbruck and Leipzig in common? : extracting semantics from Wiki content (2007) 0.13
    0.12521277 = sum of:
      0.12521277 = product of:
        0.78257984 = sum of:
          0.08146084 = weight(abstract_txt:extracting in 3481) [ClassicSimilarity], result of:
            0.08146084 = score(doc=3481,freq=1.0), product of:
              0.15044458 = queryWeight, product of:
                1.1141254 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.019483196 = queryNorm
              0.5414674 = fieldWeight in 3481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=3481)
          0.044557817 = weight(abstract_txt:method in 3481) [ClassicSimilarity], result of:
            0.044557817 = score(doc=3481,freq=1.0), product of:
              0.12677586 = queryWeight, product of:
                1.4463689 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.019483196 = queryNorm
              0.35146925 = fieldWeight in 3481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=3481)
          0.24175203 = weight(abstract_txt:extracted in 3481) [ClassicSimilarity], result of:
            0.24175203 = score(doc=3481,freq=2.0), product of:
              0.35564864 = queryWeight, product of:
                2.9669962 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.019483196 = queryNorm
              0.6797496 = fieldWeight in 3481, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.078125 = fieldNorm(doc=3481)
          0.41480917 = weight(abstract_txt:wikipedia in 3481) [ClassicSimilarity], result of:
            0.41480917 = score(doc=3481,freq=3.0), product of:
              0.49010497 = queryWeight, product of:
                4.0217986 = boost
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.019483196 = queryNorm
              0.846368 = fieldWeight in 3481, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.25473 = idf(docFreq=231, maxDocs=44421)
                0.078125 = fieldNorm(doc=3481)
        0.16 = coord(4/25)
    
  5. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.12
    0.12419353 = sum of:
      0.12419353 = product of:
        0.6209676 = sum of:
          0.0706851 = weight(abstract_txt:clustering in 3919) [ClassicSimilarity], result of:
            0.0706851 = score(doc=3919,freq=1.0), product of:
              0.12120162 = queryWeight, product of:
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.019483196 = queryNorm
              0.58320266 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.13011277 = weight(abstract_txt:unsupervised in 3919) [ClassicSimilarity], result of:
            0.13011277 = score(doc=3919,freq=1.0), product of:
              0.18204123 = queryWeight, product of:
                1.225549 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.019483196 = queryNorm
              0.71474344 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.07561712 = weight(abstract_txt:method in 3919) [ClassicSimilarity], result of:
            0.07561712 = score(doc=3919,freq=2.0), product of:
              0.12677586 = queryWeight, product of:
                1.4463689 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.019483196 = queryNorm
              0.5964631 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.13941923 = weight(abstract_txt:extraction in 3919) [ClassicSimilarity], result of:
            0.13941923 = score(doc=3919,freq=1.0), product of:
              0.24016787 = queryWeight, product of:
                1.990757 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.019483196 = queryNorm
              0.5805074 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.20513341 = weight(abstract_txt:extracted in 3919) [ClassicSimilarity], result of:
            0.20513341 = score(doc=3919,freq=1.0), product of:
              0.35564864 = queryWeight, product of:
                2.9669962 = boost
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.019483196 = queryNorm
              0.5767867 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1523914 = idf(docFreq=256, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
        0.2 = coord(5/25)
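
The score trees above follow the usual Lucene ClassicSimilarity (TF-IDF) breakdown: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), the contributions are summed, and the sum is scaled by coord (matched clauses / total clauses). The sketch below recomputes the score of entry 1 (doc 975) from the numbers listed in its tree. The function and variable names are hypothetical, and tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed to be the formulas in use; they reproduce the listed idf values. Where a term shows no boost line (as for "clustering" in entry 5), a boost of 1.0 appears to be implied.

```python
import math

# Values copied from the score trees above.
MAX_DOCS = 44421
QUERY_NORM = 0.019483196


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming the ClassicSimilarity form."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term frequency factor, assuming the ClassicSimilarity form sqrt(freq)."""
    return math.sqrt(freq)


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Contribution of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    """Sum of the term contributions, scaled by coord = matched / total_clauses."""
    total = sum(term_score(boost, doc_freq, freq, field_norm)
                for _term, boost, doc_freq, freq in terms)
    return total * matched / total_clauses


# Entry 1 (doc 975): (term, boost, docFreq, freq), as listed in its tree.
TERMS_975 = [
    ("ontological", 1.0582954, 166, 1.0),
    ("annotation", 1.1283604, 107, 1.0),
    ("method", 1.4463689, 1342, 3.0),
    ("concepts", 1.4628218, 1275, 2.0),
    ("extraction", 1.990757, 246, 3.0),
    ("extracted", 2.9669962, 256, 2.0),
]
FIELD_NORM_975 = 0.09375

# Should land close to the 0.21306048 listed for entry 1; small differences
# are expected because Lucene computes in single precision.
score_975 = document_score(TERMS_975, FIELD_NORM_975, matched=6, total_clauses=25)
print(f"{score_975:.6f}")
```

The same functions can be applied to the other entries by substituting their term lists, fieldNorm values, and coord fractions.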