Welcome back to Mr. Salami's class. In our previous sessions, we learned how to write linear equations. Now we'll be introduced to scatter plots and learn how to use them to find prediction equations.
Chapters:
00:00 Introduction
00:24 What is bivariate data? What is a scatter plot?
05:57 Identifying the line of best fit and the correlation
08:27 Application
0:00
the girls from my class so we're going
0:03
to talk about Scatter Plots and lines of
0:06
regression I don't want and this chapter
0:08
is actually pretty easy it's not that
0:11
hard it just looks a little more
0:14
complicated than it actually is all
0:16
right so don't be worried about it so
0:18
basically the first thing we need we
0:19
need to understand is what is
0:23
bivariate data what is it and what
0:25
does it mean and why do we have that so
0:29
by definition
0:30
it means that when you have like data
0:33
with two variables right such that such
0:36
as year and then number of visitors for
0:38
example if you have a website I have
0:40
like year one I have like two visitors
0:42
year two I have like 150 and then by year 5
0:46
like 3,000 this is called bivariate data
0:50
because you put that on the scatter plot
0:52
right you trying to figure out so when
0:54
you create it it's going to show you
0:55
some kind of a pattern with dots yeah
0:59
right I give
1:01
you so does that make sense so we use
1:04
this kind of stuff in uh people that do
1:08
let's say insurance companies they
1:10
always use like scatter plots because you want
1:13
to study your population and want you
1:16
say that population excuse me what is it
1:19
are you coming in or out what is it you
1:22
can't be here what
1:23
are oh you can't I don't have for you
1:26
sorry you guys thank you bye
1:32
I don't have room for anybody else I
1:34
have I'm full capacity what's the word
1:38
before then
1:42
no
1:46
yeah bivariate data like I said is data such
1:50
that you have like a year number of
1:52
visitors and when you put down a set of
1:55
on a coordinate plane the coordinate
1:56
plane is a line like this right we have
1:59
POS in this case we have mostly positive
2:02
right this is called bivariate because you
2:03
get a scatter plot so therefore it's formed by
2:07
using bivariate data all right now why do we
2:10
have this we have this to make
2:12
predictions you know what a prediction
2:13
is
2:15
right if guess if you it's a guess but
2:18
it's an educated guess it's not just a
2:21
guess that's done in a vacuum I don't
2:23
want to just make like prediction based
2:26
on whatever I feel like no I said before
2:29
numbers don't what lie numbers don't lie
2:33
right I don't have room sorry yeah right
2:36
numbers don't lie numbers don't lie so
2:39
because numbers don't lie you want to
2:41
create a set of data and when you create
2:43
it you're going to create some kind of a
2:45
line This is called a line of best fit
2:47
and this line of best fit is going to
2:49
help you to make predictions on
2:51
something right and I will explain a
2:54
little more so when you when you find a
2:57
line that closely approximates the data
3:00
right if you see here what do you notice
3:02
about this data here what do you see yes
3:05
sir it kind of crowds around the line it it
3:08
crowds on the line right the pattern is
3:10
specific it crowds on this line you can see
3:13
that right it's a specific pattern it's
3:15
not just all over the places right and
3:17
it's going down does that make sense
3:20
this is going down what about
3:22
here it's going up and again it crowds on
3:25
what the line just like he's mentioned
3:28
what about here um there's no line it's
3:31
just all in random places random places
3:34
that's the keyword is random there is no
3:38
pattern really everything is just random
3:39
it's like here here there is no specific
3:42
patterns right for example let's try to
3:45
see our soccer team so if you want to
3:48
study a pattern on our soccer team no no
3:51
I'm just saying like if we look
3:53
throughout the year right Christian
3:55
School like um what's the like if you go
3:59
from 2001 to like
4:02
2024 as far as wins and so it's going
4:05
to be specific right so every time you
4:09
go you know
4:13
like
4:15
out so where yall
4:21
going2 you know what I'm saying
4:28
like let me
4:30
what does the n mean bivariate data right a
4:33
data with two variables such as year and
4:37
number of visitors are called bivariate
4:40
data when the set of bivariate data are
4:42
plotted on a coordinate
4:45
plane then they're referred to as what it's
4:47
referred to as a scatter plot right I
4:50
mean I wasn't trying to laugh I'm
4:51
serious man you know what I'm saying so
4:54
now here such we have this set of data
4:57
here right now it forms a specific pattern this is
5:00
all over the places okay and then here
5:03
this is now we're going to talk about
5:05
correlation later right correlation
5:07
correlation now here the correlation
5:09
will be strong but it's negative because
5:11
everything is going down here the
5:13
correlation is positive because it's
5:15
going up and in here there is no
5:16
correlation it's basically like zero
5:18
we're going to talk about it shortly but
5:20
right now we're just understanding this
5:23
now when you have like a specific
5:26
pattern you want to capture that data
5:28
you use what you call line of best fit
5:31
right a line of best fit is what it
5:33
approximately captures the entire data
5:36
it's a good representative for the data
5:38
does that make sense that line is used
5:41
to capture most of your data does that
5:44
make sense you getting it yes or no not
5:47
really sure all right so if I have a set
5:51
of data like this
5:57
right this is what I have right
6:00
what kind of pattern you see here it forms
6:03
what kind of pattern line a line right now if
6:07
you have that line you want to draw a
6:08
line here to kind of capture most of
6:11
your data now why is it important
6:13
because you're going to use this line to
6:14
make
6:15
predictions right because you're trying
6:17
to study a pattern and then that line is
6:19
going to help you make that
6:22
prediction so basically let's say you in
6:24
Florida right you're in Florida if you
6:26
go to Florida now you are an insurance
6:29
company
6:30
right so you're going to see that uh the
6:33
more you um the more people move
6:36
there I don't know what PIR you can I
6:38
mean come up with any kind of scenario
6:40
right you want to see if you're an
6:42
insurance company would you want to
6:44
charge a lot for uh your insurance
6:47
because you in Florida yes because
6:49
there's a lot of what hurricanes and
6:51
stuff you see what I'm trying to say so
6:53
therefore that line is going to help you
6:55
say wait I've seen that in Florida every
6:58
year there has been an increase in
6:59
hurricanes right so therefore I know
7:02
that based on this data in 2027 we're
7:06
going to have maybe like this many
7:07
hurricane so because of that I need to
7:09
start charging more for my insurance
7:10
because this line is giving me some kind
7:13
of an accuracy of what's what's
7:15
happening does that make sense so the
7:17
line of best fit is used for that right
7:20
because you you want to have a set of
7:21
data and then you see what pattern is
7:24
being formulated and then you use that
7:27
to make a prediction
7:29
because this is what people do in stocks
7:31
where they make prediction they say okay
7:33
we seen some Trends here what's the
7:35
trend the data is giving you some kind
7:37
of a trend that line is going to help
7:39
you to give you an actual equation so
7:41
that you can use that equation to solve
7:45
the problem right now I'll give you an
7:47
example and that will make all the sense
7:49
in the world so basically here's what we
7:52
have here right so the table shows the
7:56
US household with internet
8:00
access right so in 1997 what's the
8:04
percentage 18 18 and in 2000 41.5 and
8:10
then 2001 50.4 and then 2003 54.7 and
8:15
then 2007 61.7 right so now here's what
8:18
we're going to do we're going to build a
8:21
scatter plot with the information right
8:24
so the first question says what make
8:26
a scatter plot and a line of best fit
8:29
and describe the
8:30
correlation let X be the number of years
8:33
since
8:34
1995 now there's something called
8:36
correlation correlation is R right the
8:39
letter that we use for correlation is r
8:41
and this is very
8:43
important R right now if R is
8:48
one the closer R is to
8:52
one the stronger the
8:55
correlation does that make sense the
8:58
closer r is to one the stronger the
9:00
correlation and is positive for example
9:03
here you see you're going to say that R
9:07
is right here maybe that's say
9:09
0.99999 because that's a strong
9:11
correlation it's a strong correlation
9:12
meaning the data is
9:14
connected does that make sense the data
9:17
is connected and because it's one is
9:19
close to one it's a positive
9:22
correlation does that make sense now
9:25
here what kind of correlation do we have
9:29
negative and it's close to what negative 1
9:31
so maybe the correlation here is like
9:34
-0.999 because there's a strong
9:36
correlation between the data does that make
9:38
sense here what's the correlation
9:41
like no inin the correlation is always
9:44
between -1 and 1 it can't be more than
9:47
that your correlation R is always
9:49
between one and negative 1 it can never
9:51
be above one it can never be less than
9:55
uh more than 1 less than -1 okay your
9:58
correlation is always here the closer it
10:00
is to negative one you have a negative
10:02
but strong correlation and if it's closer
10:04
to one it's a positive correlation but
10:06
when it is close to zero there is no
10:09
correlation correlation right so here we
10:12
have a zero correlation because there is
10:16
no connection between the dots they are
10:20
all over the places so therefore there's
10:22
no relationship so R is probably close
10:24
to zero that means you don't really have
10:26
a strong
10:27
correlation does that make sense
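In the lesson, r is estimated by eye from how tightly the dots follow a line. As a rough illustration only, the sketch below computes r for the internet-access table using the standard Pearson correlation formula, which the lesson itself does not derive. The function name `pearson_r` and the variable names are assumptions made for this example, and x is taken as the number of years since 1995, as the problem states.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Table from the lesson: years since 1995 and percent of US households
# with internet access (1997, 2000, 2001, 2003, 2007).
xs = [2, 5, 6, 8, 12]
ys = [18.0, 41.5, 50.4, 54.7, 61.7]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

The result is positive and fairly close to 1, which matches the upward pattern in the table, and it always falls between -1 and 1.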
10:30
right so this is how you use it this
10:32
they do that all the time in like uh
10:35
statistics yes is the correlation guess
10:37
too like you just guess for that are we
10:40
I guessed it yeah because when the data
10:42
is really like let let me give you some
10:44
some scenarios and you guys tell me what
10:45
kind of correlation do you think this
10:50
is so what kind of correlation do you
10:52
think is it going to be positive or
10:53
negative number one positive why do you
10:56
think the the the coefficient is going
10:58
to be close to one
10:59
or like R 0.6
11:02
0.8 0.5 0.5 maybe 0.6 right all right
11:07
what about
11:15
here now what kind of correlation is
11:18
this positive what do you think the
11:21
coefficient is going to be like one
11:22
close to 1 0.9 right because it's really
11:25
strong you see the closer the data is
11:27
together the stronger the correlation
11:30
right what
11:34
here what kind of
11:36
correlation none so it'd be zero there's nothing
11:40
really going on here there is no
11:43
connection right what about
11:50
here what would you say I say negative
11:54
what do youall
11:57
think it's negative out negative what
12:01
outliers right negative what for
12:06
example seven right because two of them
12:09
are standing out everything else is here
12:11
what these two things they call these
12:13
outliers right you know what an outlier
12:15
is it's something that's not part of the
12:18
normal right so for example your ages
12:22
here if I were to guess 16 17 I'm like
12:25
above that way so I'm an outlier because
12:27
I don't belong as I don't belong on the
12:30
benches on your six right because I'm an
12:32
outlier because most people what be the
12:34
correlation in your age would it be
12:35
positive or
12:37
negative positive because you guys are
12:39
all around the same age right and you
12:41
getting what you you going you going to
12:43
keep getting older you're not getting
12:44
younger you know like you getting older
12:47
so all your and you going to be going
12:48
around the same age right so it's going
12:51
to be positive does that make sense J
12:53
yes get that all right so now that we
12:55
understand what a correlation is now
12:56
let's try to build a scatter plot where's
12:58
my
13:03
is why that
13:12
all right let's try to build this right
13:15
so let's pay attention it says what make
13:19
a scatter plot and a line of best fit and
13:21
describe a correlation let X be the
13:23
number of years since 1995 so X is going
13:26
to be here or here
13:29
here right X is here and and year will
13:33
be right here right X is number of years
13:36
and percent will be here P will be here
13:38
so it's number of years since 1995 since
13:43
1995 the chart up there has yeah but I
13:46
want to I want to start with the the
13:48
problem is telling me since
13:50
1995 so that means this one will be what
13:54
how many years is that since 1995 two
13:57
right two years this one will be five
14:01
this one will be six this eight and then
14:05
this 12 12 12 right so we going to have
14:09
let's let's
14:11
build which which one is 13 never mind
14:15
no she was saying between 2000 and 2001
14:17
you put five and six
14:22
oh it took
14:24
estimate that's right right so two is
14:27
right here and then let's do percentage
14:29
right we going up to 60 so let's go 10
14:34
20 30 40 50 60 and 70 all right so here
14:41
2 and 18 so somewhere around here you
14:43
guys agree with me yeah what about uh 5
14:47
and 41.5 5 is here 41 somewhere right
14:50
here
14:52
right are you
14:55
agreeing 6 and
14:57
2001 so so
14:59
2001 uh 6 and then 50.4 so 50 somewhere
15:04
50.4 somewhere here
15:06
right and then uh next is 8 and 2003 and
15:10
is 54.7 maybe somewhere around
15:13
here and then uh
15:16
12 and 61.7 so somewhere around here
15:21
right so the line of best fit if I was
15:23
to draw a line right I want to capture
15:26
most of the data right that line
15:29
probably go here like this right that
15:31
would be a good line to draw because you
15:33
want to capture most of the data does
15:35
that make sense
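As a rough sketch of the step just worked on the board, the table's years can be converted to x values (years since 1995) and paired with the percentages before plotting. The variable names below are illustrative and not from the lesson.

```python
# Data read from the table in the lesson: year and percent of
# US households with internet access.
years = [1997, 2000, 2001, 2003, 2007]
percents = [18.0, 41.5, 50.4, 54.7, 61.7]

BASE_YEAR = 1995  # x counts the number of years since 1995

# Convert each year to x and pair it with its percentage.
points = [(year - BASE_YEAR, pct) for year, pct in zip(years, percents)]

for x, y in points:
    print(f"x = {x:2d}, y = {y}")
```

These are the five ordered pairs placed on the scatter plot: x runs along the horizontal axis and the percentage along the vertical axis.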
15:37
now they telling us use two ordered
15:40
pairs to write a prediction equation you
15:43
can use any one of them right so if I
15:46
want to use two ordered pairs do you think
15:48
it would be wise for me to use this one
15:50
probably not what about these two right
15:52
here this one and this one right that'll
15:55
be good right so it be two and what two
15:58
and
16:00
18 right and then what is this one
16:04
that's 8 and 54.7
16:07
right are we follow are we following so
16:09
far are we good yes or no yes who's not
16:13
getting it what's
16:15
corner so if it's if there are any
16:19
outliers it doesn't affect if it's
16:21
positive or negative it's just the
16:22
majority yeah the majority this is not
16:25
really an outlier it's not too far from
16:27
the line right an outlier will be that somewhere
16:29
way here this is really still close to
16:31
it and then granted my stuff is not very
16:34
accurate right so I want to use this guy
16:37
and that that guy to find my equation so
16:40
now we need to find what we need to find
16:42
our slope this is X1 y1 and X2 Y2 right
16:48
so somebody help me what's the slope M
16:50
will be what
16:52
54.7 - 18 all right over 8 -
16:56
2 that is what 54.7 -
17:04
18 like 34.7 all right 34.7 over 6 36.7
17:11
or
17:12
36.7 so what's 36.7 over
17:16
6
17:18
6 are you doing in your head 6
17:22
I6
17:25
6.116 6.116 like this uh round
17:29
7 seven all right thank you now we
17:33
have y = 6.117x + B now we need to
17:39
find B right are we following so far
17:42
right so I I can use this this one to
17:44
solve for b right so I'm going to
17:46
replace X by 2 and then y by 18 right so
17:50
I'm going to go 18 is equal to
17:53
6.117 * 2 + B right somebody help me find
17:58
B
18:04
this is sorry 6.117 * 2 plus B we solve for
18:15
B help me I don't have my calculator
18:18
where is it my back Co can you help
18:22
me7 all right B is 5.
18:25
767 yes all right so B is 5.767 so now
18:28
equation is complete right is y =
18:33
6.117x +
18:35
5.767 right now we have the prediction
18:40
equation right so now let's say you you
18:44
want to guess how many people are going
18:46
to have internets in the house in 2028
18:49
what would you
18:53
do
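The prediction equation built above from the two chosen points, (2, 18) and (8, 54.7), can be reproduced with a short sketch. The variable names are assumptions for illustration. Note that this is the two-point method used in the lesson, not a least-squares regression.

```python
# Two ordered pairs chosen from the scatter plot in the lesson.
x1, y1 = 2, 18.0
x2, y2 = 8, 54.7

# Slope: change in y over change in x.
m = (y2 - y1) / (x2 - x1)

# Intercept: substitute one point into y = m*x + b and solve for b.
b = y1 - m * x1

print(f"y = {m:.3f}x + {b:.3f}")
```

Rounded to three decimals, this gives the same slope and intercept as the board work.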
18:55
2028 right 1995
18:59
right we have to subtract 1995 from 2028
19:02
first right
19:04
why because we need to know the find why
19:06
the number of years since what since
19:09
1995 is everybody on the same same uh
19:13
page right so we want to make a
19:17
prediction so why we making this
19:19
prediction because maybe I'm a
19:20
businessman I want to see if there's
19:23
room for me to make money still in 2028
19:26
right because it's compet competitive
19:27
right so in 2028 I want to find out how
19:30
many people are going to have internet
19:32
so I'm going to do 2028 - 1995 to find
19:35
the number of years first right what's
19:38
2028 -
19:40
1995 so that's 33 years right and then
19:43
I'm going to plug it in where I'm going
19:45
to plug it in here so we're going to do
19:48
6.117 * 33 + 5.767 so do that and
19:55
see what you get
20:03
and then add
20:07
it7
20:08
207 33 *
20:13
6.117 yeah I got 207.628 also oh okay
20:20
207.628 so about
20:23
207% that's a that's a pretty gigantic
20:26
number I me all right anyway
20:29
well that's what numbers don't lie so
20:31
okay I guess that's what it
20:32
is right because yeah pretty much
20:35
everybody's going have a plus more so
20:37
sometimes you have what you call an
20:40
exaggeration because once 100% is hit it's
20:43
100% everybody has it right this is
20:45
pretty much sometime prediction can can
20:47
do that they can give you numbers that
20:49
are superfluous if that word exists
20:52
that's a French are translators I don't
20:53
know that's super means like numbers
20:56
that are like outside of the norm right
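The 2028 prediction can be checked with a small sketch using the rounded equation from the lesson, y = 6.117x + 5.767. The function name `predict_percent` and the cap at 100% are assumptions added here for illustration. The lesson only observes that a percentage above 100 is not realistic and does not apply any cap.

```python
def predict_percent(year, m=6.117, b=5.767, base_year=1995):
    """Evaluate the lesson's prediction equation for a calendar year."""
    x = year - base_year  # number of years since 1995
    return m * x + b

raw = predict_percent(2028)

# A percentage of households cannot exceed 100, so a value above that
# signals the line is being used far outside the range of the data
# (the table only covers 1997 to 2007).
capped = min(raw, 100.0)

print(f"raw prediction: {raw:.3f}%")
print(f"capped at 100: {capped:.1f}%")
```

The raw value matches the class's calculator result, and the cap makes the point that a straight line fitted to five early data points stops being meaningful that far into the future.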
21:00
so I don't know anyway so I want to stop
21:04
here super fluous super superu
21:08
yeah so I want to stop here cuz tomorrow
21:11
then or Monday we can do lines of
21:13
regression because you guys wait what are
21:15
we doing
21:16
tomorrow I'm not going to be here tomorrow
21:19
where you going my dad's okay all right
21:22
so let's let me start
#Teaching & Classroom Resources
#Statistics
#Sport Scores & Statistics

