Machine Learning - Data Description - Measures of Central Tendency: Mean, Median and Mode
Oct 17, 2024
0:00
Now we shall discuss the measures of central tendency: mean, median, and mode
0:07
So, a statistic is a characteristic or measure obtained by using data values from a sample
0:15
So, always remember, statistic will be a characteristic or measure for a sample
0:21
And a parameter is a characteristic or measure obtained by using all the data values from
0:28
a specific population. So a statistic relates to a sample, and a parameter relates to a population
0:36
First we are going to discuss the mean. The mean is the sum of the values divided by the total number of values
0:45
Whenever we calculate the mean of a sample, the symbol that represents the sample
0:52
mean is x bar. That is, x bar is equal to x1 plus x2 up to xn, divided by n,
1:00
and that can also be written as sigma x by n, where n represents the total number of values in the sample
1:07
So always remember whenever we are trying to calculate the mean for a sample, it will be expressed in the form of x bar
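As a minimal sketch of the sample mean described above (sum of the values divided by the number of values), the function below is illustrative only. The name `sample_mean` and the sample data are assumptions, not taken from the video.

```python
def sample_mean(values):
    """Return x bar: the sum of the values divided by n, the number of values."""
    if not values:
        raise ValueError("the mean is undefined for an empty sample")
    return sum(values) / len(values)


# Hypothetical sample (not from the video), used only to show the call.
sample = [2, 4, 6, 8]
x_bar = sample_mean(sample)
print(x_bar)  # 5.0
```

The same arithmetic applies to a population; only the notation changes.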
1:14
So now in case of population, the mean will be calculated, and in that case, it will be expressed in the form of mu
1:21
So here we have x1 plus x2 up to xN, and the division is done by capital N; in the sample case it was small n,
1:29
where small n was the size of the sample. Here it is capital N, so that is sigma x by
1:35
capital N. For a population, the Greek letter mu is used for the mean, where capital N represents
1:42
the total number of values in the population. Now, the procedure for finding
1:50
the mean for grouped data is given below. Here we have the respective classes: these are the lower class boundaries and these are the upper class boundaries. And these are the respective frequencies
2:05
that we have. So now we shall calculate the midpoint. How do we
2:10
calculate the midpoint? It is the lower class boundary plus the upper class boundary, whole divided by two
2:15
So here we have calculated the midpoint, and this is the respective frequency. So columns B and
2:20
C are multiplied to get column D, that is f into xm. In this way
2:27
you get the sum of that column, which is 490, and then 490 is divided by the sum of the
2:33
frequencies. Here the sum of the frequencies is 20, so 24.5 miles will
2:38
be the mean for this given data. So whenever we have our data
2:45
in grouped form, that is, classified into multiple different classes,
2:49
we have shown how to calculate the mean. Next we are going to discuss the weighted mean
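The grouped-data procedure described above (midpoint of each class, frequency times midpoint, then the sum of the products divided by the sum of the frequencies) can be sketched as follows. The class table is not given in the transcript, so the boundaries and frequencies below are an assumption, chosen only so that the totals match the stated 490 and 20. The name `grouped_mean` is also hypothetical.

```python
def grouped_mean(classes):
    """Mean of grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency) tuples.
    """
    total_f = 0
    total_fx = 0.0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2   # xm for this class
        total_fx += freq * midpoint      # f * xm
        total_f += freq
    if total_f == 0:
        raise ValueError("the frequencies must not sum to zero")
    return total_fx / total_f


# Assumed table (miles): NOT from the transcript, picked so that
# sum(f * xm) = 490 and sum(f) = 20 as stated in the video.
miles_classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]
print(grouped_mean(miles_classes))  # 24.5
```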
2:58
Find the weighted mean of a variable capital X by multiplying each
3:03
value by its corresponding weight and dividing the sum of the products by the sum of the weights
3:11
So here we have multiple values x1, x2, up to xn, and the respective
3:16
weight values are w1, w2, up to wn. So here we take the sum of the product terms: w1 x1 plus w2 x2, and so on
3:27
In this way we continue up to wn xn, and the full total
3:33
is divided by the sum of all the weights. So in this way the
3:39
expression will be sigma w x by sigma w, where w1, w2, up to wn are the weights and x1, x2, up to xn are the respective values. Now let us go for one example: English Composition 1, Introduction to Psychology, Biology 1, and Physical Education
4:02
So here we have four different courses, and the respective credits, which are the weights,
4:08
are given here. And here we have the respective grades, that is A, C, B, and D
4:14
We have written the respective points for each grade obtained by a student
4:20
So if you want to calculate the weighted mean, it will be calculated
4:24
using the formula we have derived. That is 3 into 4 plus 3 into 2, and so on, divided by the sum of the weights,
4:32
and we get the average as 2.7. So the grade point average is 2.7
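The weighted mean (sigma w x by sigma w) can be sketched for the grade example as follows. The transcript states only the first two products (3 into 4 and 3 into 2), the grades A, C, B, D, and the result 2.7. The credits for the last two courses (4 and 2) and the grade points for B and D (3 and 1) are assumptions that happen to reproduce 2.7. The name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Courses: English Composition 1, Introduction to Psychology,
# Biology 1, Physical Education.
course_credits = [3, 3, 4, 2]   # last two credits are assumed
grade_points = [4, 2, 3, 1]     # A, C, B, D; points for B and D are assumed

gpa = weighted_mean(grade_points, course_credits)
print(round(gpa, 1))  # 2.7
```

With equal weights the function reduces to the ordinary mean, which is a quick way to sanity-check it.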
4:39
Now we are going for the median calculation. The median is the midpoint of the data array
4:45
The symbol for the median is capital M, capital D, MD. So let us go for one example
4:52
The number of cloudy days for top 10 cloudiest cities is shown below
4:57
So find the median. The median is the midpoint of the data array,
5:03
and the symbol for the median is MD, as we have discussed
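As a rough sketch of the median as the midpoint of the sorted data array: for an odd number of values it is the middle value, and for an even number it is the average of the two middle values. Only the values 213, 223, and 240 are mentioned in the transcript; the rest of the list below is placeholder data, and the name `median` is an illustrative choice.

```python
def median(values):
    """Return MD, the midpoint of the sorted data array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values


# Placeholder data for 10 cities; only 213, 223, and 240 come from the transcript.
cloudy_days = [209, 223, 211, 227, 213, 240, 240, 211, 229, 212]
print(median(cloudy_days))               # 218.0

# With the last sorted value dropped there are nine values, so the fifth one is the median.
print(median(sorted(cloudy_days)[:-1]))  # 213
```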
5:07
So here we have a set of 10 values
5:12
First we arrange those values in order. As we have
5:17
10 data values and 10 is an even number, we cannot get a single middle value. But let us suppose
5:23
the last value is missing and we have nine data values. Then obviously,
5:29
after arranging them in sorted form, in ascending order,
5:35
the fifth value will be the median. So if this 240 is not present and we have 9
5:42
values here, then 1, 2, 3, 4, 5, so the 5th value will be the median. But here the case is not like that: we have 10 values, and as 10 is even I am
5:54
not going to find a single middle value. In that case I shall consider the two middle
5:58
values and calculate the median as their average. So the median,
6:04
that is MD, will be equal to 213 plus 223, whole divided by 2, that is 218. Hence the median
6:12
is 218 days. Now we are going to find the mode. The value that occurs
6:20
most often in a data set is called the mode. So let us go for one example. Find the
6:26
mode of the signing bonuses of eight NFL players for a specific year. The bonuses in
6:36
millions of dollars are given here. If you look at these values, you will
6:41
find that the value 10 has the highest frequency
6:47
Solution: it is helpful to arrange the data in order, although it is not necessary
6:54
for the mode calculation. If we order the data, you can find
6:59
that equal values come next to each other. So here you can find that 10 million occurred
7:05
three times, a frequency larger than that of any other number, so the mode is 10
7:11
million dollars in this particular case. Sometimes it may happen that every value
7:17
occurs only once. In that case, I can say there is no mode in that data set
7:23
Sometimes it may happen that two values have the same highest frequency. In that case
7:29
we say that this data set is bimodal. So in our video we have discussed
7:35
how to calculate the mean for a sample and a population, the mode, and also the median
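The mode rules covered in the video (the most frequent value is the mode, there is no mode when every value occurs once, and the data set is bimodal when two values share the highest frequency) can be sketched as below. The transcript states only that there are eight bonuses and that 10 million occurs three times; the other bonus values are placeholders, and the name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


# Signing bonuses in millions of dollars; all values except the three 10s are placeholders.
bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]
print(modes(bonuses))          # [10]
print(modes([1, 2, 3]))        # []      no mode
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  bimodal
```

Returning a list rather than a single number keeps the no-mode and bimodal cases explicit instead of silently picking one value.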
7:41
Thanks for watching this video
#Computer Science
#Machine Learning & Artificial Intelligence
#Statistics