
3 Top Ways to Handle Missing Attribute Values Using RapidMiner Studio

Handle Missing Value

Hello everyone, have a good day.

Large data sets often contain defects such as missing values. A missing value is a data record in which the value of one or more attributes is unknown. Missing values are usually handled either by imputation, that is, filling the gap with the average or the most frequent value, or by removing the affected attributes.

In data mining, decision tree algorithms are often described as able to handle missing values without prior imputation. Three commonly cited decision tree algorithms are CART, ID3, and C4.5. Some studies evaluate how well a decision tree performs on data with missing values by classifying it directly, without imputing or filling in the missing data.

Besides that, three approaches are commonly used to deal with missing data: deleting the examples or attributes that have missing values, filling them in with the average (Replace Missing Values), and imputing them with the k-NN algorithm (Impute Missing Values with k-NN).

These three methods are explained below using RapidMiner Studio.

Example Data Missing Value

The data above is an example of a data set that is problematic because it contains missing values. We will deal with it in all three ways.

Okay, let's go straight to the first method

1. Removing Data or Attributes that Have Missing Values

For the first method, we can delete the examples (rows) that have missing values using the Filter Examples operator. Note that this drops whole examples, not attribute columns.
  1. Add the local data to the Process panel
  2. Look for the Filter Examples operator, then connect it to the local data
  3. Click Filter Examples, look at the Parameters panel on the right, and in the condition class menu select no_missing_attributes
Delete Missing Value Data
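
For readers who want to see what this filter does outside RapidMiner, here is a minimal sketch in plain Python. The sample rows and the function name drop_rows_with_missing are made up for illustration, and None stands in for a missing value. It only mimics the idea behind no_missing_attributes and is not RapidMiner's own implementation.

```python
# Hypothetical sample data; None marks a missing value.
rows = [
    {"age": 25, "income": 3000, "city": "Jakarta"},
    {"age": None, "income": 4200, "city": "Bandung"},
    {"age": 31, "income": None, "city": "Medan"},
    {"age": 40, "income": 5100, "city": "Surabaya"},
]


def drop_rows_with_missing(rows):
    """Keep only the rows in which every attribute has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete_rows = drop_rows_with_missing(rows)
print(len(rows), "rows before,", len(complete_rows), "rows after")
```

The trade-off is easy to see here: half of the sample rows disappear, so this method suits data sets where only a small share of the examples is incomplete.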

2. Using the Replace Missing Values Operator

Okay, on to the second method. In my personal opinion, the Replace Missing Values operator is not the most efficient choice, because it is limited to simple replacements: the minimum, maximum, or average of the attribute, or a fixed value that we choose ourselves.

But don't worry, I will still share this tutorial. Maybe your data does not run to thousands of records, or it is just sample data for a college assignment.
  • Add the local data to the Process panel
  • Find the "Replace Missing Values" operator and drag it to the Process panel, then connect it to the local data
  • Click the "Replace Missing Values" operator, then look at the Parameters panel on the right
  • Under "attribute filter type" select subset, then click "Select Attributes"
Select Attributes

  • Select the attributes that have missing values, move them to the right using the right arrow in the middle of the dialog box, then click "Apply"
  • Back in the Parameters panel on the right, in the "default" section select average
  • Finally, run the process
Replace Missing Value
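
The same idea can be sketched in plain Python: pick a subset of attributes, compute the average of the known values, and write it into the gaps. The sample rows and the function name replace_missing_with_average are assumptions made for this illustration. It shows the "average" option only, not the operator's actual code.

```python
from statistics import mean

# Hypothetical sample data; None marks a missing value.
rows = [
    {"age": 25, "income": 3000},
    {"age": None, "income": 4200},
    {"age": 31, "income": None},
    {"age": 40, "income": 5100},
]


def replace_missing_with_average(rows, attributes):
    """Return a copy of rows where gaps in the chosen attributes hold the average."""
    filled = [dict(row) for row in rows]  # copy so the input stays untouched
    for attr in attributes:
        known = [row[attr] for row in rows if row[attr] is not None]
        average = mean(known)
        for row in filled:
            if row[attr] is None:
                row[attr] = average
    return filled


filled_rows = replace_missing_with_average(rows, ["age", "income"])
print(filled_rows)
```

Every gap in one attribute receives the same number, which is why this method ignores how the incomplete example relates to the rest of the data.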

3. Missing Value Imputation Using k-NN

What sets this method apart is that it combines imputation with the k-NN algorithm, which fills each gap based on the k nearest examples in the original data. This technique can reportedly also cope when more than 30% of the data is missing. So, let's go through the steps below.
  • Find the operator named Impute Missing Values, then double-click it or drag it to the Process panel.
  • Double-click the operator to open its subprocess. When the new process page appears, look for the k-NN operator and connect it as shown below. To return to the main page, just click the Process menu.

  • You can also set which attributes are imputed. For example, you can impute only a few attributes, all attributes, or attributes selected by the contents of the data, using the Parameters panel on the right side of the process page.
Impute Missing Value

  • Finally, connect all the operators and click Run. RapidMiner will then display the results of the imputation process using k-NN.
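
To make the idea of "taking the k nearest examples" more concrete, here is a simplified sketch in plain Python. It assumes numeric attributes, Euclidean distance over the attributes that both rows have, and the mean of the k nearest donors as the fill value. The data and the names distance and knn_impute are hypothetical, and RapidMiner's operator trains a k-NN learner internally, so its results may differ.

```python
import math

# Hypothetical numeric data; None marks a missing value.
rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 11.0],
    [5.0, 6.0, 50.0],
    [4.8, 5.9, 48.0],
    [5.2, 6.1, None],
    [0.9, None, 9.0],
]


def distance(a, b, skip):
    """Euclidean distance over attributes both rows have, ignoring column `skip`."""
    shared = [
        (x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if i != skip and x is not None and y is not None
    ]
    if not shared:
        return math.inf  # nothing to compare, treat as infinitely far
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Fill each gap with the mean of that column over the k nearest donor rows."""
    imputed = [list(row) for row in rows]
    for r, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows that actually have a value in this column.
            donors = [o for i, o in enumerate(rows) if i != r and o[col] is not None]
            donors.sort(key=lambda o: distance(row, o, col))
            nearest = donors[:k]
            if nearest:
                imputed[r][col] = sum(d[col] for d in nearest) / len(nearest)
    return imputed


imputed_rows = knn_impute(rows, k=2)
print(imputed_rows)
```

Unlike the average method, each gap gets a value taken from examples that look similar to the incomplete one. Because the distance is sensitive to scale, normalizing the attributes first is usually a good idea.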



It turns out we can also combine this imputation technique with cross-validation to obtain accuracy, precision, and recall, which help determine how well the imputation technique performs.

All you have to do is add the Cross Validation operator.

Impute Missing Value and Cross Validation
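
As a rough illustration of how imputation and cross-validation fit together, here is a hand-written sketch in plain Python. It is built on several assumptions that do not come from RapidMiner: a 1-nearest-neighbour classifier, 3 folds assigned by row index, toy data with two well-separated classes, and the names impute_from, predict and cross_validate. The gaps are filled inside each fold using the training rows only, so the test rows never influence the imputation.

```python
import math


def distance(a, b):
    """Euclidean distance over the attributes observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def impute_from(train, row, k):
    """Fill gaps in `row` with the column mean of its k nearest training rows."""
    filled = list(row)
    for col, value in enumerate(row):
        if value is None:
            donors = sorted(
                (t for t in train if t[col] is not None),
                key=lambda t: distance(row, t),
            )
            nearest = donors[:k]
            if nearest:
                filled[col] = sum(t[col] for t in nearest) / len(nearest)
    return filled


def predict(train_x, train_y, row):
    """1-nearest-neighbour classification (an assumed, very simple model)."""
    best = min(range(len(train_x)), key=lambda i: distance(train_x[i], row))
    return train_y[best]


def cross_validate(x, y, folds=3, k=2, positive=1):
    """Return accuracy, precision and recall pooled over all folds."""
    tp = fp = fn = correct = 0
    for f in range(folds):
        train_idx = [i for i in range(len(x)) if i % folds != f]
        test_idx = [i for i in range(len(x)) if i % folds == f]
        raw_train = [x[i] for i in train_idx]
        train_x = [impute_from(raw_train, row, k) for row in raw_train]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            guess = predict(train_x, train_y, impute_from(raw_train, x[i], k))
            correct += guess == y[i]
            tp += guess == positive and y[i] == positive
            fp += guess == positive and y[i] != positive
            fn += guess != positive and y[i] == positive
    return {
        "accuracy": correct / len(x),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy data: class 0 sits near (1, 1), class 1 near (5, 5).
x = [
    [1.0, 1.2], [5.0, 5.1], [1.1, None], [5.2, 4.9],
    [0.9, 1.0], [None, 5.0], [1.2, 1.1], [4.8, 5.2],
    [1.0, 0.9], [5.1, None], [0.8, 1.1], [4.9, 5.0],
]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = cross_validate(x, y)
print(scores)
```

The toy classes are so far apart that the scores say little by themselves. The point is the structure: imputation happens inside each fold, and the three measures are computed from the pooled predictions.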

Conclusion

Missing value imputation can be carried out with k-NN, and it serves as data cleansing within the data pre-processing stage.

As another note, the steps I wrote above are based on my personal experience. If you are still unsure, it would be best to first consult a tutor or someone who is truly an expert.

Thank you, I hope this is useful and you can find what you are looking for. "Don't forget to breathe and stay happy in the link of gratitude".

Wassalamulaikum Wr. Wb. See you soon