0:00
Hey everyone, my name is Asta Chohan, welcome to Tutorials Point
0:03
In the previous video we talked all about the KNN algorithm
0:08
and in this video we are going to talk about the decision tree. So let's see what's in it for you in this video
0:14
We are going to talk about what a decision tree is, the kinds of problems that a decision tree can solve
0:19
some important terms of a decision tree, how a decision tree works
0:24
and what are the advantages and disadvantages of a decision tree. So first, what is a decision tree?
0:29
A decision tree is nothing but a tree-shaped diagram used to determine a course of action
0:36
It's quite similar to our day-to-day life decisions. Let's understand this with this example
0:43
In this we have to predict whether the person is fit or not. So at the first stage we are determining
0:49
the age of the person, whether it is less than 30 or more than 30. If it's less than 30, then does the
0:57
person eat a lot of pizza? If yes, then the person is unfit, and if no, then the person is fit
1:04
If the age of the person is greater than 30, then does the person exercise in the morning?
1:10
If yes, then the person is fit and if no, then the person is not fit
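To make this concrete, here is a minimal Python sketch of that same tree written as plain if/else rules; the function name and inputs are just illustrative, not from the video.

```python
def is_fit(age, eats_lots_of_pizza, exercises_in_morning):
    """Toy decision tree from the example: predicts fit (True) or unfit (False)."""
    if age < 30:
        # Younger branch: diet decides the outcome
        return not eats_lots_of_pizza
    # Older branch: morning exercise decides the outcome
    return exercises_in_morning

print(is_fit(25, eats_lots_of_pizza=True, exercises_in_morning=False))  # False (unfit)
print(is_fit(40, eats_lots_of_pizza=False, exercises_in_morning=True))  # True (fit)
```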
1:15
Next, what kinds of problems can a decision tree solve? There are two kinds of problems that a decision tree can solve
1:21
First is classification and second is regression. A classification tree will determine a set of logical if-then conditions to classify the data
1:33
For example, discriminating fruits based on certain features. A regression tree, on the other hand, is used when the target variable is numerical or continuous in nature:
1:44
we fit a regression model to the target variable using each of the independent variables
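As a rough sketch of both kinds, this is how the two tree types look in scikit-learn; the tiny datasets below are made up purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: discrete target (e.g. fruit type encoded as 0 = apple, 1 = orange)
X_cls = [[150, 0], [170, 1], [140, 0], [180, 1]]  # made-up [weight, skin_texture]
y_cls = [0, 1, 0, 1]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[160, 1]]))  # predicted class label

# Regression: numerical/continuous target
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [1.2, 1.9, 3.1, 4.2]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[2.5]]))  # predicted numeric value
```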
1:50
Now let's discuss some important terms of a decision tree. First is the root node
1:56
The first node of the tree, the one that represents the entire population or sample, is called the root node. Next is the leaf or terminal node: the nodes that do not split further are called leaf or terminal nodes
2:13
And also, a leaf node carries the final conclusion or decision of the tree. Next is entropy
2:20
Entropy is the measure of randomness or unpredictability in the dataset. From the diagram, we can observe that
2:27
at the first stage there is high randomness. That means its entropy is high
2:33
Next is information gain. It is the measure of the decrease in entropy after the dataset is split
2:40
Again, from the same diagram, we can observe that after the split, the randomness of the data set decreases
2:49
That means its entropy decreased. And the information gain after the split is higher
2:55
Now, let's understand how a decision tree works. The decision of making strategic splits heavily affects a tree's accuracy
3:04
And also, the decision criteria for regression and classification are different. A decision tree uses multiple algorithms to decide whether to split a node into two or more sub-nodes
3:18
and the creation of sub-nodes increases the homogeneity of the resulting sub-nodes
3:23
Or in other words, we can say that the purity of the nodes increases
3:27
with respect to the target variable. The decision tree splits the nodes on all available variables and then selects the split
3:37
which results in the most homogeneous sub-nodes. There is one more term which is important, that is, the attribute selection measure
3:45
If the dataset consists of N attributes, then deciding which attribute to place at the root
3:52
or at different levels as internal nodes is a complicated step
3:57
Just randomly selecting any node to be the root cannot solve this problem. So for that, we have different methods, and some of them are entropy, information gain, and the
4:10
Gini index. So let's discuss these three in detail. Let's discuss entropy first
4:17
Entropy is the measure of randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusion from the information
4:27
We can observe this from the graph as well: when the probability of an event is 0 or 1, the entropy is lowest, that is, 0
4:37
And for the highest randomness, that is, when the probability is 0.5, the entropy is highest, that is, 1
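Those endpoints are easy to verify numerically; a quick sketch using the standard two-class entropy formula (the one discussed next):

```python
import math

def binary_entropy(p):
    """Entropy of a two-class event with probability p -- the curve in the graph."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no randomness
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.0), binary_entropy(0.5), binary_entropy(1.0))  # 0.0 1.0 0.0
```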
4:46
Now let's discuss the mathematical representation of the entropy. And before diving into this, I want you to note that you don't have to memorize all this mathematics
4:55
as it will be taken care of while we are using the Python scripts
4:59
So, mathematically, the entropy of S is represented as E(S) = sum from i = 1 to c of -p_i * log2(p_i), where S is the current state and p_i is the probability of event i in state S
5:17
So using this mathematical representation, the entropy of play golf for this dataset comes out to 0.94
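Here's what that looks like as a small Python script; the 9 "yes" / 5 "no" class counts are the ones from the classic play-golf table used in this example.

```python
import math

def entropy(probabilities):
    """E(S) = sum over classes of -p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Play-golf dataset: 9 "yes" and 5 "no" out of 14 rows
print(round(entropy([9 / 14, 5 / 14]), 2))  # 0.94, as quoted above
```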
5:25
Now, the entropy of T for any attribute X is mathematically represented as
5:34
E(T, X) = sum over c in X of P(c) * E(c). And using this representation, the entropy of play golf for the attribute outlook comes out to
5:44
0.69, where T is the current state and X is the selected attribute
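As a sketch of that computation, assuming the usual per-value counts from the play-golf table (2 yes / 3 no for sunny, 4 / 0 for overcast, 3 / 2 for rainy):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# (yes, no) counts per value of the "outlook" attribute
splits = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}
total = sum(y + n for y, n in splits.values())

# E(T, X) = sum over values c of P(c) * E(c)
e_outlook = sum((y + n) / total * entropy([y / (y + n), n / (y + n)])
                for y, n in splits.values())
print(round(e_outlook, 2))  # ~0.69, as quoted above
```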
5:51
Now, next is information gain. Information gain is a statistical property that measures how well a given attribute separates the training examples according to their target classification. Constructing a decision tree is all about finding an attribute that returns the highest information gain and the lowest entropy
6:14
Now let's see the mathematical representation of the information gain. As I told you before also, information gain is the decrease in entropy
6:23
So mathematically it is represented as Gain = E(before) - sum from j = 1 to k of E(j, after)
6:32
where "before" is the dataset before the split, k is the number of subsets created by the split, and j is a particular subset after the split
6:42
So for the previous example, the information gain of play golf for the attribute outlook comes out to 0.247
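Putting the two previous sketches together (each subset's entropy weighted by its share of the data), the 0.247 figure can be reproduced like this:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

e_before = entropy([9 / 14, 5 / 14])  # whole dataset, ~0.94
splits = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}
e_after = sum((y + n) / 14 * entropy([y / (y + n), n / (y + n)])
              for y, n in splits.values())

print(round(e_before - e_after, 3))  # 0.247, the gain quoted above
```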
6:51
Now let's talk about the Gini index. You can understand the Gini index as a cost function used to evaluate splits in the dataset
6:59
And it is calculated by subtracting the sum of the squared probabilities of each class from one
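A one-line Python function makes that definition concrete:

```python
def gini(probabilities):
    """Gini index = 1 - sum of squared class probabilities."""
    return 1 - sum(p * p for p in probabilities)

print(gini([1.0, 0.0]))  # 0.0 -- a perfectly pure node
print(gini([0.5, 0.5]))  # 0.5 -- maximum impurity for two classes
```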
7:08
And it performs only binary splits. Now, what are the advantages of a decision tree?
7:14
First, it is simple to understand, interpret, and visualize. Little effort is required for data preparation, and it can handle both numerical and categorical data
7:24
Non-linear parameters don't affect its performance. Now, what are the disadvantages of a decision tree?
7:31
Overfitting occurs when the algorithm captures noise in the data. High variance: the model
7:36
can get unstable due to small variations in the data. Low bias: a highly complicated decision
7:42
tree tends to have a low bias, which makes it difficult for the model to work with new data
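This is why tree libraries expose knobs to limit a tree's complexity; a hypothetical scikit-learn sketch (the parameter values here are arbitrary, chosen only to illustrate the trade-off):

```python
from sklearn.tree import DecisionTreeClassifier

# criterion picks the attribute selection measure discussed earlier
# ("gini" or "entropy"); max_depth limits how complicated the tree can get,
# trading a little bias for lower variance.
pruned = DecisionTreeClassifier(criterion="entropy", max_depth=3)
unpruned = DecisionTreeClassifier(criterion="gini", max_depth=None)  # can memorize noise
```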
7:48
So that was it for this video. We have already covered the supervised machine learning algorithms: the KNN algorithm earlier, and the decision
7:56
tree in this video. In the next video we are going to discuss linear regression,
8:01
and the rest of the machine learning algorithms in further videos. So stay tuned with Tutorials Point. Thanks for watching and have a nice day