Machine Learning - Preprocessing Structured Data - Imputers
Oct 10, 2024
0:00
In this video, we are going to discuss imputers
0:06
So, imputing refers to using a model to replace missing values. That means whenever we have some missing values in the data set, how do we replace
0:16
the missing value with some other value using some logic? That is handled by this imputing process
0:22
So there are many options we could consider when replacing a missing value
0:26
We have some examples here. The first is a constant value that has meaning within the domain, such as zero, distinct from all other values
0:36
So it depends upon the domain. Domain means the collection of data
0:40
from which the data values came. So in the case of, say, a numeric domain, we can select zero, provided it is
0:47
distinct from all other values. The second option is a value from another randomly selected record
0:53
So for the attribute whose value is missing, we can select any other record randomly
0:59
and pick up its value for that attribute to replace the missing value
1:04
The third option is the mean, median, or mode value of the column. The fourth is a value estimated by another predictive model
1:11
So there are multiple different ways in which we can fill up these missing values
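The first three options listed above can be sketched with pandas. This is only an illustration: the column values, the variable names, and the random seed are made up and do not come from the video, and the fourth option (a predictive model) is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value (not the data from the video).
s = pd.Series([2.0, np.nan, 4.0, 4.0, 10.0], name="x")

# Option 1: a constant that is meaningful in the domain, here 0.
filled_constant = s.fillna(0.0)

# Option 2: the value from another, randomly selected record.
rng = np.random.default_rng(seed=0)
donor_value = rng.choice(s.dropna().to_numpy())
filled_random = s.fillna(donor_value)

# Option 3: the mean, median, or mode of the column (NaN is skipped).
filled_mean = s.fillna(s.mean())          # mean of 2, 4, 4, 10 is 5.0
filled_median = s.fillna(s.median())      # median is 4.0
filled_mode = s.fillna(s.mode().iloc[0])  # most frequent value is 4.0
```

Note that a constant such as 0 is only a sensible choice when 0 cannot occur as a genuine value in that column.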
1:16
That is how we can handle the missing values. Any imputing performed on the training dataset will have to be performed on the new data
1:24
in future when predictions are needed from the finalized model. For example, if you choose to impute with mean column values, these mean column values will need to be stored in a file for later use on new data that has missing values
1:42
So if I decide that we shall be replacing those missing values with the column mean, then the mean values of the respective columns are to be saved in a file, so that when new tuples
1:59
come with missing values in some attributes, then from the file we shall
2:03
pick up the respective replacement value, which is nothing but the mean of the column in
2:08
the previous instance of the dataset, to replace it. So that has to be maintained. Using Python imputers, missing values can be replaced by the mean, the median, or the most
2:21
frequent value using the strategy hyperparameter. So there are multiple different ways in which we can deal with the missing values
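The idea of saving the training means and reusing them on new data can be sketched as follows. The data, the JSON format, and the file name `column_means.json` are assumptions chosen for illustration; the video does not say how the file is written.

```python
import json
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical training data with one missing value per column.
train = pd.DataFrame({"x0": [1.0, 2.0, np.nan], "x1": [np.nan, 4.0, 6.0]})

# Compute the column means on the training data only (NaN is skipped).
means = train.mean().to_dict()   # x0 -> 1.5, x1 -> 5.0

# Store them in a file so they can be reused at prediction time.
path = os.path.join(tempfile.mkdtemp(), "column_means.json")
with open(path, "w") as f:
    json.dump(means, f)

# Later: new records arrive with missing values.
new = pd.DataFrame({"x0": [np.nan, 3.0], "x1": [7.0, np.nan]})
with open(path) as f:
    stored_means = json.load(f)

# Fill each column with the mean saved from the training data,
# not with a mean recomputed from the new records.
new_filled = new.fillna(value=stored_means)
```

Recomputing the mean on the new records would give different replacement values from the ones the model saw during training, which is why the stored values are used.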
2:30
So let us go through one practical example for a better understanding of this topic
2:35
So here is the example for you. Let us discuss the sklearn.preprocessing
2:40
Imputer class, and how to use it in our Python code
2:45
to handle the missing values. At first we go through the preliminaries
2:50
So we are importing pandas: import pandas as pd, import numpy as np, and also from sklearn.preprocessing
2:57
import Imputer. Imputer is nothing but a class under which
3:03
we will be defining objects, passing some hyperparameters. So at first we are creating some data
3:08
We are going to create one data frame. We create an empty data frame at
3:12
first; the data frame object is df, and
3:17
then it is going to have two columns: one is x0 and the other is x1. So two
3:22
column heads are there. Here we have one set of values, and here we have
3:25
the other set of values. You can see that this particular value has been
3:29
initialized with np.nan. What is np.nan?
3:34
It is NumPy's not-a-number value. So here we have one missing value
3:40
And here we are printing the data frame. So let me execute my code
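The data frame described above could be built along these lines. The numbers are placeholders, since the transcript does not list the values shown on screen; only the column names `x0` and `x1` and the single `np.nan` entry follow the description.

```python
import numpy as np
import pandas as pd

# Start from an empty data frame, then add the two columns.
df = pd.DataFrame()

# Placeholder values for illustration, not the numbers used in the video.
df["x0"] = [0.2, 0.4, 0.6, 0.8]
df["x1"] = [np.nan, 0.1, 0.3, 0.5]   # np.nan marks the missing value

# Count the missing entries per column, then show the frame.
missing_per_column = df.isna().sum()
print(df)
```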
3:46
So here you can see that my data frame has been printed, and here
3:50
we have one NaN. What is NaN? It is not a number. Now we move on to fitting the imputer
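Before fitting anything, it may help to see how NaN behaves, because it explains why a dedicated missing-value check is needed. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# NaN is a floating-point value that never compares equal to anything,
# including itself.
print(np.nan == np.nan)   # False

# So equality checks cannot find it; use np.isnan or pd.isna instead.
values = np.array([0.5, np.nan, 1.5])
mask = np.isnan(values)                  # True only where the value is NaN
n_missing = int(pd.isna(values).sum())   # number of missing entries
```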
3:57
So we create an imputer object that looks for NaN values, then replaces them with the
4:02
mean value of the feature, by columns. axis is equal to 0: what is the meaning of axis equal to 0? It means along
4:11
the column, not along the row. Column-wise we will be doing the
4:15
replacement. So mean_imputer: we are defining one object of the class
4:20
Imputer, passing parameters. missing_values is equal to NaN; if, say, a missing value meant those cells that hold the value zero, we would pass that value here instead. strategy is
4:33
equal to mean, so here we are going to replace it with the mean value, and then
4:37
axis is equal to 0, that is, column-wise. So what strategies are there? Let me go
4:43
into some more detail about this Imputer class. Here I have
4:47
help(Imputer), so let me execute it. It will give me the
4:51
idea. So: help on class Imputer in module sklearn.preprocessing.imputation
4:57
Here we have the class Imputer, which has a set of parameters
5:03
We have missing_values equal to NaN, strategy equal to mean, axis
5:07
equal to 0, verbose equal to 0, copy equal to True. So here we can see
5:12
that these are the default values. Now for missing_values,
5:17
we can put some integer or NaN; it is optional and the default is NaN. For strategy, the default is mean; the other
5:26
strategies are median and most_frequent. If mean, then replace
5:32
missing values using the mean along the axis; if median, then replace the
5:37
missing values using the median along the axis; and if most_frequent, then
5:42
replace the missing values using the most frequent value along the axis. Here the
5:47
axis will be 0 by default. axis equal to 0 means impute
5:51
along the columns; axis equal to 1 means impute along the rows. We have
5:56
verbose, which is an integer, optional, default 0; it controls the verbosity of the imputer
6:02
We have copy, which is a boolean, optional, and here the default value is True
6:08
If True, a copy of X will be created; if False, the imputation will be done in place
6:15
whenever possible. Note that in the following cases a new copy will always be
6:21
made, even if copy is equal to False. So in these particular cases, a new copy will be created
6:26
always, even when copy is equal to False. So this is the help
6:33
we get for the class Imputer. You can go through it later for the details, and this is the code which has to be executed to get it. Okay, now let me come
6:45
back to my point. So missing_values is equal to NaN, so we are specifying that
6:49
NaN will be considered as a missing value; strategy is equal to mean, and axis
6:54
is equal to 0, that is, column-wise. Then train the imputer on the df
6:59
dataset. So now we are training this imputer on the df data
7:04
set: that is mean_imputer, the object we defined
7:09
of the class Imputer, so mean_imputer.fit(df). The data frame has
7:15
been given, so in the respective columns the NaN will be replaced by the mean of
7:21
the column. Here you can also use median if you like. Then apply the imputer
7:27
to the data set: imputed_df is equal to mean_imputer
7:30
.transform(df.values). So here the transformation will take place, and then we are going for
7:38
a view of the data. So now let me execute this one, and then we go to
7:42
this. Now viewing this one, you can see that initially it had the
7:47
NaN in column x1, attribute name x1. So here you can see that in
7:54
attribute x1 it has been replaced by 0.4927
8:00
3333. So notice that 0.49273333
8:04
is the imputed value replacing the np.nan value. So in this way, in this particular
8:10
program, we have given you the demonstration. I'm just putting all the code in front of
8:15
you so that you can see it; you can also type the same and do the experiments at your
8:19
end. Here you can also go for the median; let me do that one for you
8:23
So we are going for median. I'm not changing the object names, so let them
8:29
be. Now if you print it, you can see that the NaN value has been replaced by the median. So I think you have got the idea of how to use this
8:41
sklearn.preprocessing Imputer class to do the imputation of the null values in our data set
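The `Imputer` class used in the video was removed from `sklearn.preprocessing` in scikit-learn 0.22; its replacement is `SimpleImputer` in `sklearn.impute`, which has no `axis` parameter and always imputes column by column. The sketch below repeats the mean and median steps with that class, as one possible modern equivalent rather than the code shown on screen. The data values are placeholders, so the results differ from the 0.49273333 in the video.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Placeholder data, not the values shown in the video.
df = pd.DataFrame({"x0": [0.2, 0.4, 0.6, 0.8],
                   "x1": [np.nan, 0.1, 0.3, 0.8]})

# Mean imputation: fit learns the column means, transform fills the gaps.
mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
mean_imputer = mean_imputer.fit(df)
imputed_mean = mean_imputer.transform(df)     # NumPy array without NaN

# Same steps with the median strategy, done in one call.
median_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
imputed_median = median_imputer.fit_transform(df)

print(imputed_mean[0, 1])     # mean of 0.1, 0.3, 0.8, approximately 0.4
print(imputed_median[0, 1])   # median of 0.1, 0.3, 0.8
```

The fitted values are kept in `mean_imputer.statistics_`, so the same imputer object can later be applied to new records with `transform`.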
8:51
thanks for watching this video
#Machine Learning & Artificial Intelligence
#Reference