
C4.5 Based Classification Using RapidMiner Studio

The C4.5 algorithm is an extension of ID3, another member of the decision tree family. It was developed to handle attributes with missing values, to handle continuous attributes, to prune the decision trees it builds, and to use the gain ratio as the splitting criterion. This is also described by Santosa and Umam (2018) in their book "Data Mining and Big Data Analytics".

To choose a split, C4.5 calculates three values: Entropy, Split Info, and Gain Ratio. See also the analysis and discussion in the article: Discussion of the C4.5 Algorithm.
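To make these three values concrete, here is a minimal sketch in plain Python. It is not taken from RapidMiner; the function names and the small weather-style toy data are my own assumptions for illustration, and it only covers categorical attributes.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum(n / total * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split info: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data (not from this article): one attribute and one label.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

print(round(entropy(play), 3))              # 0.94
print(round(gain_ratio(outlook, play), 3))  # 0.156
```

Dividing the gain by the split info is what penalises attributes with many distinct values, which is the main difference from the plain information gain used by ID3.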

A common mistake when using the C4.5 algorithm is not setting the calculation criterion to gain ratio. This criterion is what sets C4.5 apart from ID3, which uses information gain, and from other members of the decision tree family.

In this article I will share my educational experience of implementing the C4.5 algorithm with RapidMiner Studio version 9.7 Beta, which is the latest update from RapidMiner. Well, here's how to use the C4.5 algorithm.

First, prepare training data that meets the criteria for classification with the C4.5 algorithm, then import the data set into the RapidMiner repository. See also how to import data into the repository here ---.

Second, drag the imported data set onto the process page, then search the operator sidebar for the operators you need, including Decision Tree, Apply Model, Split Data, and Performance.

[Image: Decision tree]

Third, connect the training data set to the Split Data operator, then set its parameters: choose the sampling type and edit the enumeration to decide how much of the data goes to the training data set and how much to the test data set. To set the enumeration, click Add Entry for each ratio, then click Apply and OK.

[Image: training test]
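For readers who want to see what this step does outside the GUI, here is a rough sketch of a shuffled split in plain Python. The function name, the 0.7/0.3 ratios, and the seed are my own assumptions for illustration; this is not RapidMiner's implementation, and it only mimics shuffled sampling.

```python
import random


def split_data(rows, ratios=(0.7, 0.3), seed=42):
    """Shuffle `rows` and cut them into partitions sized by `ratios`."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    # A fixed seed (hypothetical value) keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    partitions, start = [], 0
    for i, ratio in enumerate(ratios):
        # The last partition takes whatever is left, so no row is dropped.
        if i == len(ratios) - 1:
            end = len(shuffled)
        else:
            end = start + round(ratio * len(shuffled))
        partitions.append(shuffled[start:end])
        start = end
    return partitions


train, test = split_data(range(10))
print(len(train), len(test))  # 7 3
```

Each entry in `ratios` plays the role of one enumeration entry: the first partition is the training data and the second is the test data.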

Fourth, connect the top output of the Split Data operator to the Decision Tree operator, then set the criterion parameter to Gain Ratio, which is the calculation criterion of the C4.5 algorithm. You can also set the maximum depth of the decision tree and enable pruning or pre-pruning.

[Image: training test]
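To show how the gain ratio criterion, the maximum depth, and pre-pruning work together, here is a small sketch of a tree builder in plain Python. The names `build_tree`, `max_depth`, and `min_gain` and the toy rows are my own assumptions, not RapidMiner's code. It handles categorical attributes only and leaves out post-pruning, missing values, and continuous attributes.

```python
from collections import Counter
from math import log2


def entropy(items):
    total = len(items)
    return -sum(n / total * log2(n / total) for n in Counter(items).values())


def gain_ratio(rows, labels, attr):
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    return (entropy(labels) - remainder) / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs, max_depth=10, min_gain=0.01):
    majority = Counter(labels).most_common(1)[0][0]
    # Stop on a pure node, when no attributes remain, or at the depth limit.
    if len(set(labels)) == 1 or not attrs or max_depth <= 1:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    # Pre-pruning: do not split when the best gain ratio is too small.
    if gain_ratio(rows, labels, best) < min_gain:
        return majority
    children = {}
    for value in set(row[best] for row in rows):
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        children[value] = build_tree(list(sub_rows), list(sub_labels),
                                     [a for a in attrs if a != best],
                                     max_depth - 1, min_gain)
    return {"attr": best, "children": children, "default": majority}


def predict(tree, row):
    # Walk down until a leaf (a plain label) is reached.
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical toy data, only for illustration.
rows = [
    {"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"}, {"outlook": "overcast", "windy": "yes"},
]
labels = ["no", "no", "yes", "no", "yes", "yes"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree["attr"])                                         # outlook
print(predict(tree, {"outlook": "rain", "windy": "yes"}))   # no
```

A smaller `max_depth` or a larger `min_gain` gives a simpler tree, which is the same trade-off you control with the depth and pre-pruning settings in this step.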

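Fifth, connect the operators mentioned in step two as shown in the image below. Typically, the model output of Decision Tree goes to Apply Model, the second output of Split Data (the test data) also goes to Apply Model, and the labelled output of Apply Model goes to Performance.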

[Image: operator]

Next, run the process by pressing the blue play button at the top of the process page. The results are a decision tree model, the predictions for the test data set, and a performance vector page that contains values for precision, accuracy, recall, and area under the curve (AUC).
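If you want to check the numbers on the performance page by hand, the sketch below computes accuracy, precision, and recall from actual and predicted labels. The function name and the five example labels are my own assumptions; AUC is left out because it needs the prediction confidences, not just the predicted labels.

```python
def performance(actual, predicted, positive):
    """Accuracy, precision and recall for one chosen positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels, only for illustration.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]

result = performance(actual, predicted, positive="yes")
print(result["accuracy"])  # 0.6
```

Precision and recall depend on which class you treat as positive, so the values on the performance page are reported per class.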

Well, that's how to use the C4.5 algorithm with RapidMiner version 9.7, which already differs from version 8.1.

Thank you for visiting. I hope this is useful and a blessing. "Don't Forget to Breathe and Stay Grateful"

See you later