Document (#20913)

Author
Wu, X.
Title
Rule induction with extension matrices
Source
Journal of the American Society for Information Science. 49(1998) no.5, pp.435-454
Year
1998
Abstract
Presents a heuristic, attribute-based, noise-tolerant data mining program, HCV (Version 2.0), based on the newly developed extension matrix approach. Gives a simple example of attribute-based induction to show the difference between the rules in variable-valued logic produced by HCV, the decision tree generated by C4.5, and the rules decompiled from that tree by C4.5rules. Outlines the extension matrix approach for data mining. Describes the HCV algorithm in detail. Outlines the techniques developed and implemented in the HCV program for noise handling and for discretization of continuous domains. Follows these with a performance comparison of HCV against well-known ID3-like algorithms, including C4.5 and C4.5rules, on a collection of standard databases, including the MONK's problems.
Footnote
Contribution to a special issue devoted to knowledge discovery and data mining
Theme
Data Mining

Similar documents (content)

  1. Yang, H.; King, I.; Lyu, M.R.: ¬The generalized dependency degree between attributes (2007) 0.22
    0.21976417 = sum of:
      0.21976417 = product of:
        0.9156841 = sum of:
          0.03553027 = weight(abstract_txt:tree in 2322) [ClassicSimilarity], result of:
            0.03553027 = score(doc=2322,freq=1.0), product of:
              0.08299723 = queryWeight, product of:
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.012117379 = queryNorm
              0.42808983 = fieldWeight in 2322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.0625 = fieldNorm(doc=2322)
          0.008935573 = weight(abstract_txt:with in 2322) [ClassicSimilarity], result of:
            0.008935573 = score(doc=2322,freq=3.0), product of:
              0.033068374 = queryWeight, product of:
                1.0932897 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012117379 = queryNorm
              0.27021506 = fieldWeight in 2322, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=2322)
          0.03871995 = weight(abstract_txt:decision in 2322) [ClassicSimilarity], result of:
            0.03871995 = score(doc=2322,freq=1.0), product of:
              0.11073828 = queryWeight, product of:
                1.6335487 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.012117379 = queryNorm
              0.3496528 = fieldWeight in 2322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.0625 = fieldNorm(doc=2322)
          0.11369666 = weight(abstract_txt:attribute in 2322) [ClassicSimilarity], result of:
            0.11369666 = score(doc=2322,freq=2.0), product of:
              0.18023111 = queryWeight, product of:
                2.0840018 = boost
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.012117379 = queryNorm
              0.63083816 = fieldWeight in 2322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.0625 = fieldNorm(doc=2322)
          0.110118404 = weight(abstract_txt:rules in 2322) [ClassicSimilarity], result of:
            0.110118404 = score(doc=2322,freq=3.0), product of:
              0.19418578 = queryWeight, product of:
                3.0591934 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.012117379 = queryNorm
              0.5670776 = fieldWeight in 2322, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.0625 = fieldNorm(doc=2322)
          0.6086832 = weight(abstract_txt:c4.5 in 2322) [ClassicSimilarity], result of:
            0.6086832 = score(doc=2322,freq=2.0), product of:
              0.6949211 = queryWeight, product of:
                5.787166 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.012117379 = queryNorm
              0.8759027 = fieldWeight in 2322, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.0625 = fieldNorm(doc=2322)
        0.24 = coord(6/25)
    
  2. Tang, X.-B.; Liu, G.-C.; Yang, J.; Wei, W.: Knowledge-based financial statement fraud detection system : based on an ontology and a decision tree (2018) 0.16
    0.15673514 = sum of:
      0.15673514 = product of:
        0.7836757 = sum of:
          0.044412836 = weight(abstract_txt:tree in 306) [ClassicSimilarity], result of:
            0.044412836 = score(doc=306,freq=1.0), product of:
              0.08299723 = queryWeight, product of:
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.012117379 = queryNorm
              0.53511226 = fieldWeight in 306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.849437 = idf(docFreq=127, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
          0.020420793 = weight(abstract_txt:developed in 306) [ClassicSimilarity], result of:
            0.020420793 = score(doc=306,freq=1.0), product of:
              0.06229449 = queryWeight, product of:
                1.2252029 = boost
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.012117379 = queryNorm
              0.3278106 = fieldWeight in 306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
          0.06844784 = weight(abstract_txt:decision in 306) [ClassicSimilarity], result of:
            0.06844784 = score(doc=306,freq=2.0), product of:
              0.11073828 = queryWeight, product of:
                1.6335487 = boost
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.012117379 = queryNorm
              0.61810464 = fieldWeight in 306, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5944448 = idf(docFreq=448, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
          0.11238912 = weight(abstract_txt:rules in 306) [ClassicSimilarity], result of:
            0.11238912 = score(doc=306,freq=2.0), product of:
              0.19418578 = queryWeight, product of:
                3.0591934 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.012117379 = queryNorm
              0.5787711 = fieldWeight in 306, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
          0.53800505 = weight(abstract_txt:c4.5 in 306) [ClassicSimilarity], result of:
            0.53800505 = score(doc=306,freq=1.0), product of:
              0.6949211 = queryWeight, product of:
                5.787166 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.012117379 = queryNorm
              0.7741959 = fieldWeight in 306, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.078125 = fieldNorm(doc=306)
        0.2 = coord(5/25)
    
  3. Fletcher, G.P.; Hinde, C.J.: Using a neural network as a tool for constructing rule based systems (1995) 0.09
    0.09169588 = sum of:
      0.09169588 = product of:
        0.57309926 = sum of:
          0.009028172 = weight(abstract_txt:with in 3282) [ClassicSimilarity], result of:
            0.009028172 = score(doc=3282,freq=1.0), product of:
              0.033068374 = queryWeight, product of:
                1.0932897 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012117379 = queryNorm
              0.2730153 = fieldWeight in 3282, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.109375 = fieldNorm(doc=3282)
          0.18290322 = weight(abstract_txt:noise in 3282) [ClassicSimilarity], result of:
            0.18290322 = score(doc=3282,freq=1.0), product of:
              0.21468256 = queryWeight, product of:
                2.2744772 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.012117379 = queryNorm
              0.8519705 = fieldWeight in 3282, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.109375 = fieldNorm(doc=3282)
          0.26990837 = weight(abstract_txt:induction in 3282) [ClassicSimilarity], result of:
            0.26990837 = score(doc=3282,freq=1.0), product of:
              0.27826598 = queryWeight, product of:
                2.5894842 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.012117379 = queryNorm
              0.96996534 = fieldWeight in 3282, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.109375 = fieldNorm(doc=3282)
          0.11125955 = weight(abstract_txt:rules in 3282) [ClassicSimilarity], result of:
            0.11125955 = score(doc=3282,freq=1.0), product of:
              0.19418578 = queryWeight, product of:
                3.0591934 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.012117379 = queryNorm
              0.5729542 = fieldWeight in 3282, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.109375 = fieldNorm(doc=3282)
        0.16 = coord(4/25)
    
  4. Kolluri, V.; Metzler, D.P.: Knowledge guided rule learning (1999) 0.09
    0.090240926 = sum of:
      0.090240926 = product of:
        0.32228902 = sum of:
          0.04226778 = weight(abstract_txt:continuous in 550) [ClassicSimilarity], result of:
            0.04226778 = score(doc=550,freq=2.0), product of:
              0.08959561 = queryWeight, product of:
                1.0389905 = boost
                7.1165 = idf(docFreq=97, maxDocs=44421)
                0.012117379 = queryNorm
              0.47176176 = fieldWeight in 550, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1165 = idf(docFreq=97, maxDocs=44421)
                0.046875 = fieldNorm(doc=550)
          0.005471898 = weight(abstract_txt:with in 550) [ClassicSimilarity], result of:
            0.005471898 = score(doc=550,freq=2.0), product of:
              0.033068374 = queryWeight, product of:
                1.0932897 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012117379 = queryNorm
              0.16547224 = fieldWeight in 550, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.046875 = fieldNorm(doc=550)
          0.012252475 = weight(abstract_txt:developed in 550) [ClassicSimilarity], result of:
            0.012252475 = score(doc=550,freq=1.0), product of:
              0.06229449 = queryWeight, product of:
                1.2252029 = boost
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.012117379 = queryNorm
              0.19668634 = fieldWeight in 550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1959753 = idf(docFreq=1817, maxDocs=44421)
                0.046875 = fieldNorm(doc=550)
          0.038995184 = weight(abstract_txt:mining in 550) [ClassicSimilarity], result of:
            0.038995184 = score(doc=550,freq=1.0), product of:
              0.13478485 = queryWeight, product of:
                1.802203 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.012117379 = queryNorm
              0.2893143 = fieldWeight in 550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.046875 = fieldNorm(doc=550)
          0.104437046 = weight(abstract_txt:attribute in 550) [ClassicSimilarity], result of:
            0.104437046 = score(doc=550,freq=3.0), product of:
              0.18023111 = queryWeight, product of:
                2.0840018 = boost
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.012117379 = queryNorm
              0.5794618 = fieldWeight in 550, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1371193 = idf(docFreq=95, maxDocs=44421)
                0.046875 = fieldNorm(doc=550)
          0.07118194 = weight(abstract_txt:extension in 550) [ClassicSimilarity], result of:
            0.07118194 = score(doc=550,freq=1.0), product of:
              0.23045035 = queryWeight, product of:
                2.8861408 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.012117379 = queryNorm
              0.30888188 = fieldWeight in 550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.046875 = fieldNorm(doc=550)
          0.047682665 = weight(abstract_txt:rules in 550) [ClassicSimilarity], result of:
            0.047682665 = score(doc=550,freq=1.0), product of:
              0.19418578 = queryWeight, product of:
                3.0591934 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.012117379 = queryNorm
              0.2455518 = fieldWeight in 550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.046875 = fieldNorm(doc=550)
        0.28 = coord(7/25)
    
  5. Methodologies for knowledge discovery and data mining : Third Pacific-Asia Conference, PAKDD'99, Beijing, China, April 26-28, 1999, Proceedings (1999) 0.08
    0.08301978 = sum of:
      0.08301978 = product of:
        0.51887363 = sum of:
          0.009028172 = weight(abstract_txt:with in 4821) [ClassicSimilarity], result of:
            0.009028172 = score(doc=4821,freq=1.0), product of:
              0.033068374 = queryWeight, product of:
                1.0932897 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012117379 = queryNorm
              0.2730153 = fieldWeight in 4821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.109375 = fieldNorm(doc=4821)
          0.12867755 = weight(abstract_txt:mining in 4821) [ClassicSimilarity], result of:
            0.12867755 = score(doc=4821,freq=2.0), product of:
              0.13478485 = queryWeight, product of:
                1.802203 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.012117379 = queryNorm
              0.9546885 = fieldWeight in 4821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.109375 = fieldNorm(doc=4821)
          0.26990837 = weight(abstract_txt:induction in 4821) [ClassicSimilarity], result of:
            0.26990837 = score(doc=4821,freq=1.0), product of:
              0.27826598 = queryWeight, product of:
                2.5894842 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.012117379 = queryNorm
              0.96996534 = fieldWeight in 4821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.109375 = fieldNorm(doc=4821)
          0.11125955 = weight(abstract_txt:rules in 4821) [ClassicSimilarity], result of:
            0.11125955 = score(doc=4821,freq=1.0), product of:
              0.19418578 = queryWeight, product of:
                3.0591934 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.012117379 = queryNorm
              0.5729542 = fieldWeight in 4821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.109375 = fieldNorm(doc=4821)
        0.16 = coord(4/25)
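
The score trees above all follow the same arithmetic, which can be retraced by hand. The following is a minimal Python sketch, not the search engine's own code: it recomputes the figures shown for entry 5 (doc 4821) from the inputs printed in its tree. The function and variable names are illustrative assumptions, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is the one that reproduces the idf values listed here. Results can differ from the listing in the last decimal places, because the listing appears to use single-precision floats.

```python
import math

# Constants copied from the score trees above.
MAX_DOCS = 44421
QUERY_NORM = 0.012117379
QUERY_TERMS = 25  # denominator of the coord(n/25) factor


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


# Matching terms for doc 4821: term -> (freq, docFreq, boost).
FIELD_NORM_4821 = 0.109375
matches = {
    "with": (1.0, 9949, 1.0932897),
    "mining": (2.0, 251, 1.802203),
    "induction": (1.0, 16, 2.5894842),
    "rules": (1.0, 640, 3.0591934),
}

scores = {
    term: term_score(freq, doc_freq, boost, FIELD_NORM_4821)
    for term, (freq, doc_freq, boost) in matches.items()
}

# The document score is the sum of the term scores, scaled by the
# fraction of query terms that matched (the coord factor).
coord = len(matches) / QUERY_TERMS
total = sum(scores.values()) * coord

for term, value in scores.items():
    print(f"{term:10s} {value:.6f}")
print(f"coord      {coord:.2f}")
print(f"total      {total:.6f}")
```

Read this way, the ranking is driven mostly by rare terms: "induction" occurs in only 16 of 44421 documents, so its idf enters the score twice (once in queryWeight, once in fieldWeight) and outweighs the frequent term "with" by a wide margin.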