0:00
Hey everyone, my name is Asta Chohan
0:02
Welcome to the Tutorials Point. In the previous video, we learned all about the decision tree
0:07
And in this video, we are going to talk about linear regression
0:11
So let's see what's in store for you in this video. We are going to understand linear regression, the regression equation
0:19
prediction using regression line, intuition behind the regression line, best fit line, multiple linear regression
0:26
and the applications of linear regression. What is linear regression? By definition, linear regression is a statistical method used to determine the strength and direction of the relationship between a dependent variable and one or more independent variables
0:42
It is called linear regression because it assumes that the relationship between the variables is linear
0:49
That means a change in one variable is associated with a proportional change in another variable
0:56
Here we have dependent variable Y and independent variable X and some data points are given
1:03
And a line passing through these data points is called line of regression
1:08
We have two important factors that we have to examine for linear regression
1:13
First, which variables in particular are significant predictors of the outcome variable
1:19
That means, which independent variable is more significant for the dependent variable. The second factor is how good the regression line is at making predictions with the highest possible accuracy
1:33
It is important to have the best fit line for the highest possible accuracy. Now let's talk about the regression equation. The simple linear equation with one dependent variable
1:45
and one independent variable is represented as y equals mx plus c
1:51
And I'm pretty sure you are familiar with this equation, as this is the straight line equation
1:56
passing through two points having the slope m. So here, y is the dependent variable
2:02
x is the independent variable, and m is the slope of the line, which we calculate as y2 minus y1 over x2 minus x1, and c is the intercept of the line
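The slope and intercept described here can be sketched in a few lines of Python. The two points used below are made up purely for illustration.

```python
# Slope of a straight line through two points, as in the equation y = m*x + c.
# The points (1, 3) and (4, 9) are hypothetical example values.

def slope(x1, y1, x2, y2):
    """m = (y2 - y1) / (x2 - x1): rise over run."""
    return (y2 - y1) / (x2 - x1)

m = slope(1, 3, 4, 9)   # (9 - 3) / (4 - 1) = 2.0
c = 3 - m * 1           # intercept: rearrange y = m*x + c at the point (1, 3)
print(m, c)             # 2.0 1.0
```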
2:13
So now let's understand the concept of prediction using regression line. With this example, here we have plot between crop yield y and rainfall x and a regression
2:25
line is passing through the given data points. So we can say that this purple point on the y-axis is the amount of crop yield you can expect for
2:35
some amount of rainfall that is represented by this brown point. So that's how we predict the outcomes using the regression line
2:44
Regression line should ideally pass through the mean of x and y
2:49
For this example, let's consider this small data set where x is the independent variable
2:55
and y is the dependent variable. Now for the mean value of x, we have to do 1 plus 2 plus 3 plus 4 plus 5 over 5
3:05
which is coming out 3 and similarly the mean value of y is coming out 4
3:11
So the regression line, represented by this pink dotted line here, passes through the mean point, which is (3, 4) in this case. For predicting outcomes using the regression line, we need the regression line, and for drawing the regression line, we need the regression equation
3:27
And before diving into the mathematics, I want you to note that you don't have to memorize
3:33
this mathematics as it will be automatically taken care of while we are writing the Python script
3:40
So for now, just understand the concept behind it. We have x, we have y, and now we have x squared, y squared and x times y
3:51
We get all the summation of these columns. So after putting all these values in the formula of m and c and calculating
4:00
we are getting the value of m as 0.6 and value of c as 2.2
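The calculation above can be reproduced directly. A minimal sketch: the y values below are an assumption for illustration, chosen to be consistent with the means (3 and 4) and the coefficients (m = 0.6, c = 2.2) stated in the video.

```python
# Least-squares fit for simple linear regression, worked by hand.
# The y values are assumed example data, consistent with the video's
# quoted means (x-bar = 3, y-bar = 4) and coefficients (m = 0.6, c = 2.2).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n            # 3.0
mean_y = sum(y) / n            # 4.0

# m = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²),  c = ȳ - m*x̄
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
m = (n * sum_xy - sum(x) * sum(y)) / (n * sum_x2 - sum(x) ** 2)
c = mean_y - m * mean_x

print(round(m, 4), round(c, 4))   # 0.6 2.2

# The fitted line passes through the mean point (x̄, ȳ), as noted earlier
assert abs((m * mean_x + c) - mean_y) < 1e-9
```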
4:05
So now, after having the value of m 0.6 and value of c 2.2
4:12
we finally have a linear equation, using which we can predict the values of y for the corresponding
4:19
values of x, and that is represented by this pink dotted line. Here, these points on the line
4:26
are the predicted values of y, and these green points are the actual values of y, and the
4:32
distance between the actual value and the predicted value is known as residuals or errors
4:39
and the best fit line should have the least sum of squared errors, which is also known as E squared
4:46
Now, how do we get the E squared value? We have the actual values of y
4:51
We have the predicted values of y. We subtract these values, square the differences, and take the summation of these squares. The sum of squared errors for this regression line comes out to 2
5:06
We check this value for each candidate line and pick as the best fit line the one having the least E squared value
5:13
For minimizing the distance between the actual values of y and the predicted values of y, there are lots of ways
5:18
like the sum of squared errors, the sum of absolute errors, the root mean squared error, etc
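These error metrics can be computed side by side. A small sketch: both the actual y values and the line y = 0.6x + 2.2 follow the video's example, but since the y values here are assumed for illustration, the printed numbers are illustrative rather than the exact figure quoted in the video.

```python
import math

# Error metrics for judging a candidate regression line.
# The actual y values are assumed example data; predictions come from
# the line y = 0.6*x + 2.2 discussed in the video.
x = [1, 2, 3, 4, 5]
actual = [2, 4, 5, 4, 5]
predicted = [0.6 * xi + 2.2 for xi in x]     # 2.8, 3.4, 4.0, 4.6, 5.2

residuals = [a - p for a, p in zip(actual, predicted)]

sse = sum(r ** 2 for r in residuals)          # sum of squared errors (E squared)
sae = sum(abs(r) for r in residuals)          # sum of absolute errors
rmse = math.sqrt(sse / len(residuals))        # root mean squared error

print(round(sse, 4), round(sae, 4), round(rmse, 4))
```

The best fit line is the candidate whose chosen metric is smallest over the same data points.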
5:25
Now, the multiple linear regression line. Multiple linear regression is quite similar to simple linear regression
5:31
In this, we have more than one independent variable. Mathematically, it is represented as y equals beta 0 plus beta 1 x1
5:41
plus beta 2 x2, up to beta n xn, plus e, where beta 0 is the intercept term
5:47
beta 1, beta 2, up to beta n are the coefficients of the input features
5:51
x1, x2, up to xn respectively, and e is the error term. Now let's see the applications of linear regression
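Multiple linear regression with two features can be sketched using NumPy's least-squares solver. The data below is synthetic, generated from assumed coefficients (beta 0 = 1, beta 1 = 2, beta 2 = 3) with no noise, so the fit recovers them exactly; real data would include the error term e.

```python
import numpy as np

# Multiple linear regression y = b0 + b1*x1 + b2*x2 + e via least squares.
# Synthetic, noiseless data generated from assumed b0=1, b1=2, b2=3
# so the recovered coefficients can be checked.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta, 4))   # [1. 2. 3.]
```

In practice this is what a library call like scikit-learn's LinearRegression does under the hood, which is why the hand calculations in this video don't need to be memorized.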
6:00
Some of them are mentioned here. Predicting the stock prices and financial markets
6:05
forecasting sales and demand for products, analyzing the impact of marketing campaigns
6:11
predicting housing prices based on property features, and analyzing the relationship between variables in scientific research
6:18
So that was it for this video. We have now discussed the supervised machine learning algorithms
6:24
KNN, decision tree, and, in this video, linear regression. In the next video, we are going to discuss the support vector machine
6:33
and the rest of the machine learning algorithms in detail. So stay tuned with Tutorials Point
6:39
Thanks for watching and have a nice day