Document (#36221)

Author
Tsui, E.
Wang, W.M.
Cheung, C.F.
Lau, A.S.M.
Title
¬A concept-relationship acquisition and inference approach for hierarchical taxonomy construction from tags
Source
Information processing and management. 46(2010) no.1, pp. 44-57
Year
2010
Abstract
Taxonomy construction is a resource-demanding, top-down, and time-consuming effort. It does not always cater for the prevailing context of the captured information. This paper proposes a novel approach to automatically convert tags into a hierarchical taxonomy. Folksonomy describes the process by which many users add metadata in the form of keywords or tags to shared content. Using folksonomy as a knowledge source for nominating tags, the proposed method first converts the tags into a hierarchy. This serves to harness a core set of taxonomy terms; the generated hierarchical structure facilitates users' information navigation behavior and permits personalization. Newly acquired tags are then progressively integrated into the taxonomy in a largely automated way to complete the taxonomy creation process. Common taxonomy construction techniques are based on three main approaches: clustering, lexico-syntactic pattern matching, and automatic acquisition from machine-readable dictionaries. In contrast to these prevailing approaches, this paper proposes a taxonomy construction analysis based on heuristic rules and deep syntactic analysis. The proposed method requires only a relatively small corpus to create a preliminary taxonomy. The approach has been evaluated against an expert-defined taxonomy in the environmental protection domain and yielded encouraging results.
Theme
Social tagging

Similar documents (author)

  1. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 3.49
    3.4909225 = sum of:
      3.4909225 = sum of:
        0.91792095 = weight(author_txt:wang in 3121) [ClassicSimilarity], result of:
          0.91792095 = score(doc=3121,freq=1.0), product of:
            0.44936365 = queryWeight, product of:
              6.5366817 = idf(docFreq=174, maxDocs=44421)
              0.06874492 = queryNorm
            2.042713 = fieldWeight in 3121, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.5366817 = idf(docFreq=174, maxDocs=44421)
              0.3125 = fieldNorm(doc=3121)
        2.5730014 = weight(author_txt:cheung in 3121) [ClassicSimilarity], result of:
          2.5730014 = score(doc=3121,freq=1.0), product of:
            0.8933489 = queryWeight, product of:
              1.4099755 = boost
              9.216561 = idf(docFreq=11, maxDocs=44421)
              0.06874492 = queryNorm
            2.8801754 = fieldWeight in 3121, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.216561 = idf(docFreq=11, maxDocs=44421)
              0.3125 = fieldNorm(doc=3121)
    
  2. Cheung, W.; Hsu, C.: ¬The model-assisted global query system for multiple databases in distributed enterprises (1996) 2.06
    2.058401 = sum of:
      2.058401 = product of:
        4.116802 = sum of:
          4.116802 = weight(author_txt:cheung in 348) [ClassicSimilarity], result of:
            4.116802 = score(doc=348,freq=1.0), product of:
              0.8933489 = queryWeight, product of:
                1.4099755 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.06874492 = queryNorm
              4.6082807 = fieldWeight in 348, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.5 = fieldNorm(doc=348)
        0.5 = coord(1/2)
    
  3. Cheung, C.M.K.; Lee, M.K.O.: Understanding consumer trust in Internet shopping : a multidisciplinary approach (2006) 2.06
    2.058401 = sum of:
      2.058401 = product of:
        4.116802 = sum of:
          4.116802 = weight(author_txt:cheung in 280) [ClassicSimilarity], result of:
            4.116802 = score(doc=280,freq=1.0), product of:
              0.8933489 = queryWeight, product of:
                1.4099755 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.06874492 = queryNorm
              4.6082807 = fieldWeight in 280, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.5 = fieldNorm(doc=280)
        0.5 = coord(1/2)
    
  4. Cheung, C.M.K.; Lee, M.K.O.: ¬The structure of Web-based information systems satisfaction : testing of competing models (2008) 2.06
    2.058401 = sum of:
      2.058401 = product of:
        4.116802 = sum of:
          4.116802 = weight(author_txt:cheung in 3005) [ClassicSimilarity], result of:
            4.116802 = score(doc=3005,freq=1.0), product of:
              0.8933489 = queryWeight, product of:
                1.4099755 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.06874492 = queryNorm
              4.6082807 = fieldWeight in 3005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.5 = fieldNorm(doc=3005)
        0.5 = coord(1/2)
    
  5. Cheung, C.M.K.; Lee, M.K.O.: User satisfaction with an internet-based portal : an asymmetric and nonlinear approach (2009) 2.06
    2.058401 = sum of:
      2.058401 = product of:
        4.116802 = sum of:
          4.116802 = weight(author_txt:cheung in 3701) [ClassicSimilarity], result of:
            4.116802 = score(doc=3701,freq=1.0), product of:
              0.8933489 = queryWeight, product of:
                1.4099755 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.06874492 = queryNorm
              4.6082807 = fieldWeight in 3701, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.5 = fieldNorm(doc=3701)
        0.5 = coord(1/2)
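
The per-term figures in the explanations above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the catalog's actual code: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are the formulas that match the printed values, and the function names (idf, tf, term_score) are illustrative only. The inputs are copied from entry 1 (doc 3121).

```python
import math

def idf(doc_freq, max_docs):
    # Assumed formula; it reproduces e.g. 6.5366817 for docFreq=174, maxDocs=44421.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Assumed formula; square root of the raw term frequency.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as laid out in the explanation tree.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

# Values copied from the explanation for entry 1 (doc 3121).
wang = term_score(1.0, 174, 44421, query_norm=0.06874492, field_norm=0.3125)
cheung = term_score(1.0, 11, 44421, query_norm=0.06874492, field_norm=0.3125,
                    boost=1.4099755)
total = wang + cheung

print(round(wang, 4), round(cheung, 4), round(total, 4))
```

The results agree with the listed 0.91792095, 2.5730014 and 3.4909225 up to rounding; small differences are expected because the original values were computed in single precision.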
    

Similar documents (content)

  1. Alexander, F.: Assessing information taxonomies using epistemology and the sociology of science (2012) 0.21
    0.20562346 = sum of:
      0.20562346 = product of:
        0.7343695 = sum of:
          0.012033094 = weight(abstract_txt:process in 1397) [ClassicSimilarity], result of:
            0.012033094 = score(doc=1397,freq=1.0), product of:
              0.054343775 = queryWeight, product of:
                1.0926778 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.012283391 = queryNorm
              0.22142543 = fieldWeight in 1397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1397)
          0.01762411 = weight(abstract_txt:approaches in 1397) [ClassicSimilarity], result of:
            0.01762411 = score(doc=1397,freq=1.0), product of:
              0.07008683 = queryWeight, product of:
                1.2408961 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.012283391 = queryNorm
              0.2514611 = fieldWeight in 1397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1397)
          0.017738396 = weight(abstract_txt:proposed in 1397) [ClassicSimilarity], result of:
            0.017738396 = score(doc=1397,freq=1.0), product of:
              0.070389494 = queryWeight, product of:
                1.2435726 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.012283391 = queryNorm
              0.25200346 = fieldWeight in 1397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1397)
          0.0137415165 = weight(abstract_txt:into in 1397) [ClassicSimilarity], result of:
            0.0137415165 = score(doc=1397,freq=1.0), product of:
              0.06796497 = queryWeight, product of:
                1.4965988 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.012283391 = queryNorm
              0.20218527 = fieldWeight in 1397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1397)
          0.024661766 = weight(abstract_txt:approach in 1397) [ClassicSimilarity], result of:
            0.024661766 = score(doc=1397,freq=3.0), product of:
              0.069593884 = queryWeight, product of:
                1.5144271 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.012283391 = queryNorm
              0.35436687 = fieldWeight in 1397, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1397)
          0.058740173 = weight(abstract_txt:construction in 1397) [ClassicSimilarity], result of:
            0.058740173 = score(doc=1397,freq=1.0), product of:
              0.19702972 = queryWeight, product of:
                2.9423754 = boost
                5.4514923 = idf(docFreq=517, maxDocs=44421)
                0.012283391 = queryNorm
              0.2981285 = fieldWeight in 1397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4514923 = idf(docFreq=517, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1397)
          0.58983046 = weight(abstract_txt:taxonomy in 1397) [ClassicSimilarity], result of:
            0.58983046 = score(doc=1397,freq=6.0), product of:
              0.68494546 = queryWeight, product of:
                8.674215 = boost
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.012283391 = queryNorm
              0.86113495 = fieldWeight in 1397, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1397)
        0.28 = coord(7/25)
    
  2. Esteban, M.A.: ¬Los lenguajes documentales ante el paso de la organizacion de la realidad y el saber a la organizacion del conocimiento (1995) 0.19
    0.187305 = sum of:
      0.187305 = product of:
        0.7804375 = sum of:
          0.0787757 = weight(abstract_txt:permits in 6798) [ClassicSimilarity], result of:
            0.0787757 = score(doc=6798,freq=1.0), product of:
              0.09508889 = queryWeight, product of:
                1.0220381 = boost
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.012283391 = queryNorm
              0.8284427 = fieldWeight in 6798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.574333 = idf(docFreq=61, maxDocs=44421)
                0.109375 = fieldNorm(doc=6798)
          0.03524822 = weight(abstract_txt:approaches in 6798) [ClassicSimilarity], result of:
            0.03524822 = score(doc=6798,freq=1.0), product of:
              0.07008683 = queryWeight, product of:
                1.2408961 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.012283391 = queryNorm
              0.5029222 = fieldWeight in 6798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.109375 = fieldNorm(doc=6798)
          0.05401496 = weight(abstract_txt:proposes in 6798) [ClassicSimilarity], result of:
            0.05401496 = score(doc=6798,freq=1.0), product of:
              0.09315817 = queryWeight, product of:
                1.4306312 = boost
                5.30121 = idf(docFreq=601, maxDocs=44421)
                0.012283391 = queryNorm
              0.57981986 = fieldWeight in 6798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.30121 = idf(docFreq=601, maxDocs=44421)
                0.109375 = fieldNorm(doc=6798)
          0.028476955 = weight(abstract_txt:approach in 6798) [ClassicSimilarity], result of:
            0.028476955 = score(doc=6798,freq=1.0), product of:
              0.069593884 = queryWeight, product of:
                1.5144271 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.012283391 = queryNorm
              0.40918761 = fieldWeight in 6798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.109375 = fieldNorm(doc=6798)
          0.10232715 = weight(abstract_txt:hierarchical in 6798) [ClassicSimilarity], result of:
            0.10232715 = score(doc=6798,freq=1.0), product of:
              0.16326858 = queryWeight, product of:
                2.3196056 = boost
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.012283391 = queryNorm
              0.62674123 = fieldWeight in 6798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.109375 = fieldNorm(doc=6798)
          0.48159456 = weight(abstract_txt:taxonomy in 6798) [ClassicSimilarity], result of:
            0.48159456 = score(doc=6798,freq=1.0), product of:
              0.68494546 = queryWeight, product of:
                8.674215 = boost
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.012283391 = queryNorm
              0.70311373 = fieldWeight in 6798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.109375 = fieldNorm(doc=6798)
        0.24 = coord(6/25)
    
  3. Wu, Y.; Yang, L.: Construction and evaluation of an oil spill semantic relation taxonomy for supporting knowledge discovery (2015) 0.18
    0.17858268 = sum of:
      0.17858268 = product of:
        1.1161418 = sum of:
          0.0282966 = weight(abstract_txt:method in 3202) [ClassicSimilarity], result of:
            0.0282966 = score(doc=3202,freq=1.0), product of:
              0.06709121 = queryWeight, product of:
                1.2140876 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.012283391 = queryNorm
              0.42176312 = fieldWeight in 3202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=3202)
          0.03040868 = weight(abstract_txt:proposed in 3202) [ClassicSimilarity], result of:
            0.03040868 = score(doc=3202,freq=1.0), product of:
              0.070389494 = queryWeight, product of:
                1.2435726 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.012283391 = queryNorm
              0.43200594 = fieldWeight in 3202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.09375 = fieldNorm(doc=3202)
          0.04629853 = weight(abstract_txt:proposes in 3202) [ClassicSimilarity], result of:
            0.04629853 = score(doc=3202,freq=1.0), product of:
              0.09315817 = queryWeight, product of:
                1.4306312 = boost
                5.30121 = idf(docFreq=601, maxDocs=44421)
                0.012283391 = queryNorm
              0.49698842 = fieldWeight in 3202, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.30121 = idf(docFreq=601, maxDocs=44421)
                0.09375 = fieldNorm(doc=3202)
          1.011138 = weight(abstract_txt:taxonomy in 3202) [ClassicSimilarity], result of:
            1.011138 = score(doc=3202,freq=6.0), product of:
              0.68494546 = queryWeight, product of:
                8.674215 = boost
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.012283391 = queryNorm
              1.4762313 = fieldWeight in 3202, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.09375 = fieldNorm(doc=3202)
        0.16 = coord(4/25)
    
  4. Cheng, Y.-Y.; Xia, Y.: ¬A systematic review of methods for aligning, mapping, merging taxonomies in information sciences (2023) 0.16
    0.1635121 = sum of:
      0.1635121 = product of:
        0.81756043 = sum of:
          0.013752107 = weight(abstract_txt:process in 2031) [ClassicSimilarity], result of:
            0.013752107 = score(doc=2031,freq=1.0), product of:
              0.054343775 = queryWeight, product of:
                1.0926778 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.012283391 = queryNorm
              0.25305763 = fieldWeight in 2031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.02014184 = weight(abstract_txt:approaches in 2031) [ClassicSimilarity], result of:
            0.02014184 = score(doc=2031,freq=1.0), product of:
              0.07008683 = queryWeight, product of:
                1.2408961 = boost
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.012283391 = queryNorm
              0.2873841 = fieldWeight in 2031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5981455 = idf(docFreq=1215, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.015704589 = weight(abstract_txt:into in 2031) [ClassicSimilarity], result of:
            0.015704589 = score(doc=2031,freq=1.0), product of:
              0.06796497 = queryWeight, product of:
                1.4965988 = boost
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.012283391 = queryNorm
              0.23106888 = fieldWeight in 2031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.697102 = idf(docFreq=2993, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.039859436 = weight(abstract_txt:approach in 2031) [ClassicSimilarity], result of:
            0.039859436 = score(doc=2031,freq=6.0), product of:
              0.069593884 = queryWeight, product of:
                1.5144271 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.012283391 = queryNorm
              0.57274336 = fieldWeight in 2031, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
          0.72810245 = weight(abstract_txt:taxonomy in 2031) [ClassicSimilarity], result of:
            0.72810245 = score(doc=2031,freq=7.0), product of:
              0.68494546 = queryWeight, product of:
                8.674215 = boost
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.012283391 = queryNorm
              1.063008 = fieldWeight in 2031, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.0625 = fieldNorm(doc=2031)
        0.2 = coord(5/25)
    
  5. Wang, Z.; Khoo, C.S.G.; Chaudhry, A.S.: Evaluation of the navigation effectiveness of an organizational taxonomy built on a general classification scheme and domain thesauri (2014) 0.15
    0.14828287 = sum of:
      0.14828287 = product of:
        0.92676795 = sum of:
          0.020340681 = weight(abstract_txt:approach in 2251) [ClassicSimilarity], result of:
            0.020340681 = score(doc=2251,freq=1.0), product of:
              0.069593884 = queryWeight, product of:
                1.5144271 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.012283391 = queryNorm
              0.29227686 = fieldWeight in 2251, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=2251)
          0.07309082 = weight(abstract_txt:hierarchical in 2251) [ClassicSimilarity], result of:
            0.07309082 = score(doc=2251,freq=1.0), product of:
              0.16326858 = queryWeight, product of:
                2.3196056 = boost
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.012283391 = queryNorm
              0.4476723 = fieldWeight in 2251, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7302055 = idf(docFreq=391, maxDocs=44421)
                0.078125 = fieldNorm(doc=2251)
          0.14534423 = weight(abstract_txt:construction in 2251) [ClassicSimilarity], result of:
            0.14534423 = score(doc=2251,freq=3.0), product of:
              0.19702972 = queryWeight, product of:
                2.9423754 = boost
                5.4514923 = idf(docFreq=517, maxDocs=44421)
                0.012283391 = queryNorm
              0.7376767 = fieldWeight in 2251, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4514923 = idf(docFreq=517, maxDocs=44421)
                0.078125 = fieldNorm(doc=2251)
          0.6879922 = weight(abstract_txt:taxonomy in 2251) [ClassicSimilarity], result of:
            0.6879922 = score(doc=2251,freq=4.0), product of:
              0.68494546 = queryWeight, product of:
                8.674215 = boost
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.012283391 = queryNorm
              1.0044482 = fieldWeight in 2251, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.428468 = idf(docFreq=194, maxDocs=44421)
                0.078125 = fieldNorm(doc=2251)
        0.16 = coord(4/25)
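
The final score of each content match is the sum of its per-term scores multiplied by the coord factor, the fraction of query terms found in the document. A minimal sketch of that last step follows, using the four term scores listed for entry 5 (doc 2251); the function name combined_score is illustrative and not part of the catalog software.

```python
def combined_score(term_scores, query_term_count):
    # coord = matched terms / total query terms, e.g. 4/25 = 0.16
    coord = len(term_scores) / query_term_count
    return sum(term_scores) * coord

# Term scores copied from the explanation for entry 5 (doc 2251):
# approach, hierarchical, construction, taxonomy.
entry5_terms = [0.020340681, 0.07309082, 0.14534423, 0.6879922]
score = combined_score(entry5_terms, 25)

print(round(sum(entry5_terms), 6), round(score, 6))
```

The sum matches the listed 0.92676795 and the product matches 0.14828287 up to rounding. Here "taxonomy" alone contributes about three quarters of the total, which reflects its high boost and idf.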