Machine Learning - The Learning Process
0:00
In this video, we are going to discuss the learning process.
0:04
So, what are the different steps through which our data set is prepared and then
0:10
given to a model for learning? That is what we are going to discuss here.
0:15
So this is our real world. The first step is to gather data from the real world.
0:24
Then we go for data pre-processing, dimensionality reduction, model learning,
0:29
and then model testing, after which the model is ready to analyse
0:35
unknown data for prediction. So these are the basic steps in our learning
0:41
process. At first we go for data gathering. This data gathering can
0:46
be done from different devices, transactions, cameras, sensors, databases, and so on. There are multiple sources from
0:57
which we can gather our data, but the gathered data may not be
1:03
in a good format. That is why we go for data pre-processing. In this
1:09
step we perform noise filtering: if we find that some data are corrupted or some
1:14
data are biased, noise filtering has to be done so that we get
1:20
our data without any noise. Next we go for feature extraction and normalization.
1:27
After noise filtering, our data is free of noise, but it may still have some missing values.
1:36
That is, some attributes or fields of a certain row or record might have no value.
1:43
In those cases, the missing values have to be resolved.
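As a rough illustration of this pre-processing step (not from the video), here is a minimal pandas sketch; the column names, the plausible-range check, and the median imputation are all hypothetical choices:

```python
import pandas as pd
import numpy as np

# Hypothetical raw data gathered from different sources;
# the columns and values are illustrative only.
df = pd.DataFrame({
    "age":    [25, 40, np.nan, 200, 33],   # 200 is a corrupted (noisy) value
    "income": [30000, 52000, 41000, np.nan, 47000],
})

# Noise filtering: drop rows whose age falls outside a plausible range,
# while keeping rows where age is merely missing.
df = df[df["age"].isna() | df["age"].between(0, 120)]

# Missing values: fill each numeric column with its median.
df = df.fillna(df.median(numeric_only=True))

print(df)
```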
1:50
Then we go for feature extraction. We might have, say, hundreds or even thousands of attributes.
1:54
If we go on working with all of those attributes, the computational complexity will be too high, and the model will
2:02
function very slowly. Moreover, all those attributes may not be required at all. That is
2:08
why we should find those principal attributes which are required for the
2:13
prediction and for the learning of the
2:18
model. The model should be trained on that particular set of
2:23
attributes which are relevant, the principal attributes. Then comes normalization: different attributes may have different minimum and maximum values. For a particular database, say, we have found that the age
2:39
of a person ranges from, say, 20 years to 80 years, and the height of a person ranges from,
2:45
say, 4.5 feet to 6.5 feet. So we have multiple attributes
2:51
with different ranges of values. All these data are to be normalized,
2:57
that is, scaled, so that they can be processed for learning.
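One common way to do this scaling (a sketch, not necessarily what the tutorial uses) is min-max normalization, shown here with scikit-learn on the video's age/height example:

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Two attributes with very different ranges, as in the example:
# age in years (20-80) and height in feet (4.5-6.5).
X = np.array([
    [20, 4.5],
    [50, 5.8],
    [80, 6.5],
])

# Min-max scaling maps each column to [0, 1] so that no attribute
# dominates learning just because of its units or range.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```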
3:03
Next we go for dimensionality reduction, where we perform feature selection and feature projection. Here we try to
3:13
find those attributes which are relevant for our learning and prediction
3:18
process. We can also assign weightages to the attributes: attributes with a
3:25
higher weightage are more relevant, and attributes with a lower weightage are
3:31
less relevant. These processes are done in our dimensionality reduction.
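One way to realise this "weightage" idea (an assumption on my part; the video does not name a specific method) is univariate feature selection, where each attribute gets a relevance score and only the top-scoring ones are kept:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data standing in for a set with many attributes,
# only a few of which are actually informative.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=4, random_state=0)

# SelectKBest scores (weights) every attribute and keeps the
# k highest-scoring ones -- the "principal" attributes.
selector = SelectKBest(score_func=f_classif, k=4)
X_reduced = selector.fit_transform(X, y)

print("scores per attribute:", selector.scores_.round(1))
print("reduced shape:", X_reduced.shape)   # (200, 4)
```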
3:36
Then we go for model learning. We can go for
3:41
classification, regression, clustering, description, and so on. In the case of classification, let us suppose there is a
3:48
credit card approval data set where certain personal details of the applicants are recorded, say their age,
3:57
sex, income, savings, whether they have their own house,
4:06
and whether they have their own car. All these details are there, and then we have past
4:12
data depicting whether these particular persons got sanction for a
4:18
credit card or not, that is, approved or not approved. In this way I have a set of data,
4:24
and on that set of data I shall make my model learn. Now when I give a new
4:32
applicant's information to my model, the model will tell us whether the credit card should be
4:38
approved for him or her or not. That is known
4:44
as classification. Classification means we have two classes in this problem,
4:48
as I have discussed: approved and not approved.
4:55
These are the two outcomes, so we call them two classes. When a new data item is given, our model will predict approved or not approved.
5:09
So it will go for a certain class. That is known as classification.
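A minimal sketch of this credit-approval classification, with entirely made-up rows and a decision tree chosen only for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical past data: [age, income, savings, owns_house, owns_car]
# with known outcomes -- none of these values are from a real data set.
X = [
    [25, 30000,  1000, 0, 0],
    [45, 80000, 20000, 1, 1],
    [35, 50000,  5000, 0, 1],
    [52, 90000, 30000, 1, 1],
    [23, 20000,   500, 0, 0],
]
y = ["not approved", "approved", "approved", "approved", "not approved"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# A new applicant: the model predicts one of the two classes.
print(model.predict([[30, 60000, 8000, 1, 0]]))
```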
5:13
Whenever we have our data in a continuous fashion, we can plot x and y for
5:20
them in some graphical representation, and then we can
5:28
draw one line through those points, which is known as a regression line, so that for a new
5:34
value of x I can predict a new value for y. In that case regression can help us for prediction.
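A small sketch of fitting such a line and predicting y for a new x (the points here are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Continuous (x, y) points; the regression fits one line through them.
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

reg = LinearRegression().fit(x, y)

# For a new value of x we can now predict a value for y.
print(reg.predict([[6]]))   # roughly 12
```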
5:42
Next we have clustering and description. In the case of clustering,
5:46
we can place those data points in a certain group which have some common
5:53
similarities, or whose mutual distances lie within a certain bound. That is known as clustering.
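As an illustrative sketch (k-means is my choice here, not one the video specifies), grouping unlabelled points by distance:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled points that happen to form two groups.
X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1],
              [8, 8], [8.3, 7.9], [7.8, 8.2]])

# k-means puts points whose mutual distances are small
# into the same cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # e.g. [0 0 0 1 1 1]
```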
6:00
So, in this way, in the case of model learning, different types of algorithms can be used,
6:05
that is, classification algorithms, regression algorithms, clustering algorithms, and so on.
6:11
After making the model ready, we go for model testing.
6:15
Here we go for cross-validation, bootstrap, and so on.
6:19
This is known as model testing. In the case of cross-validation,
6:23
a portion of the data set is used for training and another portion
6:29
is used for testing. Since the testing tuples are taken from our own data set, their labels and
6:38
outcomes are known to us, so I can check how far my model is predicting correctly. That is known as cross-validation.
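A minimal sketch of k-fold cross-validation; the iris data set and the decision tree are stand-ins I have chosen, not from the video:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: in each round, four portions of the data
# train the model and the remaining portion tests it; the known
# labels of the test portion let us measure prediction accuracy.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores, scores.mean())
```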
6:46
Now in the case of bootstrap: bootstrap aggregating, also called bagging, is an
6:57
ensemble meta-algorithm designed to improve the stability and accuracy of machine learning
7:03
algorithms used in statistical classification and regression. So bootstrap is a process with the help of which we can improve
7:16
the stability and accuracy of our machine learning model for correct prediction.
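A sketch of bagging with scikit-learn; again the base learner and data set are my own illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging trains many trees on bootstrap resamples of the data and
# votes over their predictions, which stabilises the result.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
print(cross_val_score(bag, X, y, cv=5).mean())
```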
7:23
Regarding all these aspects, we are going to discuss many different issues in our tutorial.
7:28
Obviously all the issues cannot be discussed in one slide, but we have discussed each and every aspect in our tutorial, with examples and implementations.
7:40
So training data, that is our past data, will be given to the model for
7:45
learning, and test data, that is our future data, will be given to the model
7:51
for prediction. To summarize the steps: gather data from various sources, that is step number one. Then clean the
7:58
data to have homogeneity. Build the model: select the right machine learning algorithm applicable there. Gather
8:07
insights from the model's results, and visualize, that is, transform the results into visual graphs.
8:15
Obviously, compared with having the data only in a table, a graphical representation of the data
8:21
will be more comprehensible. So transform the results into visual graphs and representations.
8:28
Randomly split the examples into a training set U and a test set V. Use the training set to learn a
8:36
hypothesis h, and measure the percentage of V correctly classified by h. Repeat for different
8:45
random splits and average the results. So we can split our data set randomly into a training
8:52
set and a test set, go for the performance evaluation, and then
8:58
average those results to decide the efficiency and accuracy of the model.
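That procedure, as a direct sketch (the data set and learner are illustrative stand-ins):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Repeat for different random splits and average the results:
# U is the training set, V the test set, h the learned hypothesis.
accuracies = []
for seed in range(10):
    X_u, X_v, y_u, y_v = train_test_split(X, y, test_size=0.3, random_state=seed)
    h = DecisionTreeClassifier(random_state=0).fit(X_u, y_u)
    accuracies.append(h.score(X_v, y_v))   # fraction of V classified correctly

print(np.mean(accuracies))
```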
9:06
So, what is overfitting? The model learns the training set too well: it overfits the training set and performs
9:14
poorly on the test set. When the model is tested on the data that is already known to us,
9:22
that is, on the training set, it performs very well,
9:27
predicting correctly most of the time, but when a new data set is given
9:32
to it, it performs very poorly. In that case we can consider
9:36
that overfitting is the problem. Underfitting is when the model is too simple
9:41
and both training and test errors are very large.
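Both failure modes can be seen in a small sketch (synthetic noisy data and tree depths chosen by me for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (flip_y adds label noise worth overfitting to).
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=4, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree memorises the training set (overfitting):
# training accuracy is near 1.0 while test accuracy is much lower.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("deep tree:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))

# A one-level tree is too simple (underfitting):
# both training and test accuracy stay low.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_tr, y_tr)
print("stump    :", stump.score(X_tr, y_tr), stump.score(X_te, y_te))
```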
9:46
So in this way we have discussed multiple issues in this particular video. Please watch the next video as well, because it is a continuation of this
9:54
one. There we will discuss different aspects in further detail, and the three different categories
10:03
of machine learning models will also be discussed. Thanks for watching.