0:00
Hey everyone, my name is Assa Chohan, welcome to Tutorials Point
0:04
In the previous video we learnt all about the support vector machine, and in this video we are going to discuss the random forest
0:11
So let's see what's in it for you in this video. We are going to look at why random forest, what is random forest, what is a decision tree, how a decision tree is constructed
0:22
what are the important terms of random forest, how the random forest works, and what are the advantages of random forest
0:30
First is: why random forest? Random forest uses multiple decision trees, so it reduces the risk of overfitting
0:37
It produces highly accurate predictions for large data, and it can also maintain accuracy when a big portion of the data is missing
0:46
Now, what is random forest? Before that, can you tell me what a forest is
0:51
Obviously, a group of trees is called a forest, correct? Similarly, a random forest is constructed using multiple decision trees
1:00
And how do we take the final decision in the random forest algorithm
1:04
The final decision is obtained by the majority votes of the decision trees
1:09
Let's understand this with an example. Let's say we have these random decision trees and we provide some unknown fruit to these decision trees
1:18
Now the first decision tree is predicting that it's a strawberry. The second decision tree is predicting it's an orange
1:26
And third decision tree is predicting it is a strawberry. So we have majority votes for strawberry
1:31
So our final decision will be the given unknown fruit is a strawberry
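The majority vote just described can be sketched in a few lines of Python; the fruit labels here simply mirror the example, and the function itself is generic:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most decision trees."""
    return Counter(predictions).most_common(1)[0][0]

# Votes of the three hypothetical trees on the unknown fruit
votes = ["strawberry", "orange", "strawberry"]
final_decision = majority_vote(votes)  # "strawberry" wins 2-1
```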
1:36
Now, before understanding the random forest, it is important to understand the decision tree, because the building block of the random forest is the decision tree. So what is a decision tree
1:48
A decision tree is a tree-shaped diagram which we use to determine a course of action
1:53
And each branch of the tree is called a sub-tree, which represents a possible decision or occurrence
1:59
How is a decision tree constructed? It's quite similar to the decisions that we take in our day-to-day life
2:05
So let's understand this with this example, we have to predict the vegetables in this group
2:11
So at the first stage, I am determining whether the colour of the vegetable is red or not
2:17
If not, then it is a brinjal. And if the colour of the vegetable is red, then what is the diameter of the vegetable
2:24
If the diameter is greater than two, then it's a red capsicum, and if it's less than two, then it's a red chilli
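The two splits just described, colour first and then diameter, can be written as a small hand-built decision tree. This is only a sketch of the example, not a learned model:

```python
def classify_vegetable(colour, diameter_cm):
    """Hand-built decision tree from the vegetable example."""
    # First split: is the colour of the vegetable red?
    if colour != "red":
        return "brinjal"
    # Second split: is the diameter greater than 2?
    if diameter_cm > 2:
        return "red capsicum"
    return "red chilli"
```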
2:32
Now let's see what are the important terms of random forest. These are terms that are also important for the decision tree
2:39
First is root node. The first node of the decision tree that represents the entire population of the data set is called the root node
2:49
Leaf nodes, or terminal nodes, are the nodes that do not split further
2:53
And also, leaf nodes carry the final classification or decision. For splitting the decision tree, there are many methods like Gini and entropy, and we have discussed
3:04
entropy, information gain, and the Gini index in one of our previous videos, that is, the decision tree video
3:10
And I suggest you all watch that video for more understanding
3:14
Next is the decision node. A decision node provides a link to the leaf nodes
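For reference, the Gini index and entropy mentioned above can each be computed from the class proportions in a node; a minimal sketch:

```python
import math

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Entropy: -sum of p * log2(p) over the class proportions."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# A pure node has impurity 0; a 50/50 node has Gini 0.5 and entropy 1.0
```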
3:18
Now let's understand the working of the random forest with an example. We have this dataset, and in this we have to determine the species of the penguin based on certain features
3:31
And in this case, the features are island, body mass in grams, and flipper length in mm
3:37
Now we have to do the random sampling with replacement for the given dataset
3:42
What does this random sampling mean and what does this replacement mean
3:46
Random sampling means that we have to select some random rows and random features and make some random samples
3:53
So here we are selecting random rows and random features and making these three random samples
3:59
Now, what does replacement mean? We can understand replacement from the first random sample
4:04
Here this row is repeating. That means some rows are repeating and maybe some rows are not included
4:11
So that's what replacement means. Now, after random sampling with replacement, we have to make a decision tree for every random sample
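Random sampling with replacement (bootstrapping), plus picking a random subset of feature columns for each sample, can be sketched like this; the mini-dataset here is hypothetical:

```python
import random

def bootstrap_sample(rows, n_features, seed=None):
    """Draw len(rows) rows WITH replacement plus a random subset of feature columns."""
    rng = random.Random(seed)
    # With replacement: some rows may repeat, others may be left out entirely
    sample = [rng.choice(rows) for _ in rows]
    # Each tree also sees only a random subset of the feature columns
    columns = rng.sample(range(len(rows[0])), n_features)
    return sample, columns

# Hypothetical rows: [island, body_mass_g, flipper_length_mm]
data = [["Torgersen", 3700, 181], ["Biscoe", 4675, 210],
        ["Dream", 3400, 195], ["Torgersen", 3250, 186]]
sample, columns = bootstrap_sample(data, 2, seed=42)
```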
4:19
So for the first random sample, this is the decision tree. In this, at the first stage, we are determining the body mass of the penguin
4:27
If it is greater than 3,500 or not. If not, then it's a Chinstrap
4:33
If the island is Torgersen, then the species is Adelie. And if the island is Biscoe, then obviously the penguin is a Gentoo
4:42
This is the decision tree for the second random sample. And here, at the first stage, we are determining the flipper length of the penguin
4:50
If it is greater than 190 or not. If not, then what is the island of the penguin
4:57
Again, if it is Torgersen then it is an Adelie species, and if it is Dream then it's a Chinstrap species. And if the flipper length is greater than 190, then the species is Gentoo. Similarly, we draw the decision tree for the third random sample
5:14
Here, at the first stage, we are determining the body mass of the penguin: is it greater than or
5:19
equal to 4,000? If it is, then it's a Gentoo species. If not, then what is the flipper length
5:27
of the penguin? If the flipper length is greater than or equal to one,
5:31
then it's a Chinstrap, and if not, then it's an Adelie
5:37
Now we have this new input data and we have to determine the species of the penguin
5:42
So we give this input to our model that has those three decision trees
5:47
So the first decision tree is predicting that the species is Chinstrap
5:52
The second decision tree is predicting that the species is Adelie, and the third one is also predicting that it's a Chinstrap
6:00
So we have two votes for Chinstrap and one vote for Adelie
6:05
That means the final decision of our random forest model will be that it's a Chinstrap
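The three example trees and the final vote can be sketched as plain functions. The split values follow the transcript where stated; the flipper-length threshold in the third tree is unclear in the audio, so 185 mm below is an assumed placeholder, and so is the sample input:

```python
from collections import Counter

def tree_1(body_mass, flipper_length, island):
    if body_mass <= 3500:
        return "Chinstrap"
    return "Adelie" if island == "Torgersen" else "Gentoo"

def tree_2(body_mass, flipper_length, island):
    if flipper_length > 190:
        return "Gentoo"
    return "Adelie" if island == "Torgersen" else "Chinstrap"

def tree_3(body_mass, flipper_length, island):
    if body_mass >= 4000:
        return "Gentoo"
    # 185 is an assumed split value; the video's exact number is unclear
    return "Chinstrap" if flipper_length >= 185 else "Adelie"

def forest_predict(body_mass, flipper_length, island):
    """Majority vote over the three decision trees."""
    votes = [t(body_mass, flipper_length, island) for t in (tree_1, tree_2, tree_3)]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical input reproducing the 2-1 Chinstrap vote from the example
prediction = forest_predict(3400, 188, "Torgersen")  # "Chinstrap"
```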
6:11
Now, what are the advantages of random forest? It has low variance, and it reduces overfitting
6:17
Normalisation is not needed in random forest, and it has good accuracy
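In practice, you would rarely build the trees by hand; scikit-learn's `RandomForestClassifier` does the bootstrapping, tree building, and voting for you. This assumes scikit-learn is installed, and the tiny dataset below is made up for illustration:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical penguin-style rows: [body_mass_g, flipper_length_mm]
X = [[3300, 186], [3700, 192], [4500, 210], [5000, 215], [3400, 195], [3600, 198]]
y = ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"]

# No normalisation step is needed before fitting a random forest
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
prediction = clf.predict([[4800, 212]])[0]
```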
6:22
So that was it for this video. We have already discussed the supervised machine learning algorithms:
6:27
the KNN algorithm, decision tree, linear regression, support vector machine, and, in this video, random forest
6:34
In the next video, we are going to discuss Naive Bayes
6:38
and the rest of the machine learning algorithms in further videos. So stay tuned with Tutorials
6:44
point. Thanks for watching and have a nice day