Machine Learning - Data Description - Measures of Positions Standard Score and Outliers
10K views
Oct 17, 2024
0:00
In this session we discuss measures of position, namely the standard score and
0:07
outliers. First we discuss the standard score, also known as the z
0:14
score. A z score, or standard score, for a value is obtained by subtracting the
0:21
mean from the value and dividing the result by the standard deviation. The
0:27
symbol for a standard score is z, and the formula can be written this way: that
0:33
is, z is equal to the value minus the mean, divided by the standard deviation.
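In symbols, the spoken formula can be written as follows. The first form uses plain words; the other two use the usual textbook notation that the speaker goes on to name (x for the data value, x bar and s for the sample mean and standard deviation, mu and sigma for the population mean and standard deviation).

```latex
z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}, \qquad
z_{\text{sample}} = \frac{x - \bar{x}}{s}, \qquad
z_{\text{population}} = \frac{x - \mu}{\sigma}
```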
0:39
For samples the formula is x minus x bar, divided by s, and for the population the formula will
0:46
be x minus mu, divided by sigma. Here s stands for the standard deviation of the sample and x bar is the mean of the
0:56
sample. Similarly, mu is the mean of the population and sigma is the
1:03
standard deviation of the population. So the z score represents the number of standard
1:10
deviations that a data value falls above or below the mean. That is the purpose of
1:16
the z score, also known as the standard score. Now we shall discuss outliers.
1:23
Outliers are very important, and in machine learning we will be using this terminology
1:28
in many different places. An outlier is an extremely high or extremely low data value when compared with the rest of the data values.
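To make the standard score concrete before moving on to the outlier procedure, here is a minimal Python sketch. The helper name `z_score` and the data list are hypothetical illustrations, not taken from the video; the sketch assumes the standard-library `statistics` module, where `stdev` is the sample standard deviation (s) and `pstdev` is the population standard deviation (sigma).

```python
from statistics import mean, pstdev, stdev


def z_score(value, center, spread):
    """Standard score: how many standard deviations `value` lies from `center`."""
    if spread == 0:
        raise ValueError("standard deviation must be non-zero")
    return (value - center) / spread


# Hypothetical data, not from the video.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample version: x bar and s (s divides by n - 1).
sample_z = [z_score(x, mean(data), stdev(data)) for x in data]

# Population version: mu and sigma (sigma divides by n).
population_z = [z_score(x, mean(data), pstdev(data)) for x in data]

print([round(z, 2) for z in population_z])
```

For this made-up list the mean is 5 and the population standard deviation is 2, so the value 9 lies two standard deviations above the mean. The sample version gives a slightly smaller z score because s is larger than sigma for the same data.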
1:41
So, the procedure to find outliers. Here we have five steps.
1:47
Let us go one by one. Step 1: arrange the data in order and find Q1 and Q3.
1:53
Q1 and Q3 are the first and the third quartiles. Step 2: find
1:58
the interquartile range, denoted by the abbreviation IQR.
2:07
That is nothing but Q3 minus Q1. Step 3: multiply the IQR, that is, the interquartile range, by 1.5.
2:19
Step 4: take the value obtained in step 3, that is, the IQR times 1.5.
2:27
Subtract the value obtained in step 3 from Q1, and add that value to Q3.
2:35
Step 5: check the data set for any data value that is smaller than Q1 minus 1.5 times IQR
2:44
or larger than Q3 plus 1.5 times IQR. Values that fall outside this range are known as outliers.
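The five steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the speaker's own code: the function name `find_outliers` is hypothetical, the quartiles are taken as the medians of the lower and upper halves of the sorted data, and for an odd number of values the middle value is left out of both halves (an assumption, since the video only covers an even count). Library quartile functions may use other conventions and give slightly different Q1 and Q3.

```python
from statistics import median


def find_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the five-step IQR rule."""
    if len(values) < 2:
        raise ValueError("need at least two values to form two halves")

    data = sorted(values)              # Step 1: arrange the data in order
    half = len(data) // 2              # size of each half (middle value skipped if n is odd)
    q1 = median(data[:half])           # Q1: median of the lower half
    q3 = median(data[-half:])          # Q3: median of the upper half

    iqr = q3 - q1                      # Step 2: interquartile range
    step = k * iqr                     # Step 3: multiply the IQR by 1.5
    lower_limit = q1 - step            # Step 4: subtract from Q1 ...
    upper_limit = q3 + step            #         ... and add to Q3

    # Step 5: keep the values that fall outside the two limits
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers


# Hypothetical data, not from the video.
print(find_outliers([7, 1, 100, 4, 2, 6, 3, 5]))
```

The multiplier is exposed as the parameter `k` only so that the 1.5 from step 3 is visible in one place; the procedure as described always uses 1.5.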
2:55
For better understanding, let us see one example.
3:01
An example of outliers: check the following data set for outliers. Here we have a set of data. How many values do we have? We have eight values. First we calculate Q1 and Q3: Q1 is equal to 9 and
3:17
Q3 is equal to 20 here. Now the question might be asked how Q1 has become 9 and Q3
3:23
is equal to 20. As we have eight values and 8 is an even number, we cannot
3:29
get a single middle value. So to calculate Q2, that is, the second quartile, the median
3:35
of this data, I have to take the average of value number 4 and value number 5.
3:41
The 4th value is 13 and the 5th value is 15.
3:46
So 13 plus 15, divided by 2: 14 will be our Q2.
3:51
So here we have this 14. In the first half, before the median, how many values do we
3:58
have? We have 4 values. To calculate Q1: 4 values is an even
4:05
count, so I cannot get a single middle value. I shall take the average of
4:09
6 and 12. So 6 plus 12,
4:18
what is that? That is 18; divided by 2, that is 9. So Q1 is equal to
4:24
9 here. As we know, 14 is the second quartile, that is, the median. So
4:31
above this 14 we have four values, an even number of values. So to calculate
4:37
Q3 I take the average of 18 and 22. So 18 plus 22, divided by 2, gives 20. So 20 is our Q3 and 9 is our Q1. That is step one, which we followed: arrange the data in order
4:56
and find Q1 and Q3. So we have performed that one. Next we go for the IQR, that is, the
5:02
interquartile range. That is Q3 minus Q1, that is, 20 minus 9, which is 11. So we followed step 2:
5:10
the interquartile range has been calculated. Now I shall multiply this value by 1.5.
5:17
So 1.5 times 11 is equal to 16.5. So step number three, multiply the IQR by 1.5,
5:25
has been done. Next, the lower limit is equal to 9 minus 16.5, that is, minus 7.5.
5:33
And the upper limit is equal to 20 plus 16.5, that is, 36.5. So that was step 4, done
5:40
here. Now we go to step number five: check the data set for any data
5:45
values that fall outside this interval, that is, below minus 7.5 or above 36.5.
5:54
If any such value exists, then that value will be treated as an outlier.
6:00
Here the value 50 is outside this interval; hence
6:06
it can be considered an outlier. So, in this way, we have discussed how to calculate the standard score and how to determine
6:16
the outliers in a set of data in this video. Thanks for watching.
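The worked example can be re-checked step by step with a short Python sketch. One assumption is needed: the transcript names only seven of the eight values (6, 12, 13, 15, 18, 22 and 50), so the first value is assumed here to be 5. Any first value between -7.5 and 6 would give the same quartiles, limits and outlier.

```python
from statistics import median

# Assumption: the first value (5) is not stated in the transcript;
# the other seven values are the ones the speaker mentions.
data = sorted([5, 6, 12, 13, 15, 18, 22, 50])   # Step 1: arrange in order

q2 = median(data)                  # average of the 4th and 5th values
q1 = median(data[:4])              # lower half: average of 6 and 12
q3 = median(data[4:])              # upper half: average of 18 and 22

iqr = q3 - q1                      # Step 2: Q3 minus Q1
step = 1.5 * iqr                   # Step 3: multiply the IQR by 1.5
lower_limit = q1 - step            # Step 4: Q1 minus that value
upper_limit = q3 + step            #         Q3 plus that value

# Step 5: values outside the interval are treated as outliers
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
print("IQR =", iqr, "1.5 * IQR =", step)
print("limits:", lower_limit, "to", upper_limit)
print("outliers:", outliers)
```

Under that assumption the sketch reproduces the numbers from the video: Q1 = 9, Q2 = 14, Q3 = 20, IQR = 11, limits of -7.5 and 36.5, and 50 as the only outlier.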
#Computer Education
#Machine Learning & Artificial Intelligence
#Statistics