0:00
Hi, good morning, good afternoon and good evening everyone
0:28
depending on where you join us from today. Today we are going to talk about a new programming
0:34
language and for that we want to welcome Sine. Hi Sine, how are you? Are you excited
0:40
Yeah, I'm very excited and fine. Thank you. Great to hear that. Where are you located Sine
0:48
Yeah, I'm in Copenhagen in Wernlöse, which means something like waterloos. Okay. That's an exciting fact
0:58
Yeah, yeah. It's a very little suburb out of Copenhagen, which is not so exciting, but really nice
1:08
No, but I didn't know that it means that. So that's at least something new to hear about
1:15
But we will hear about a lot more new information. And before we get there
1:20
I also heard that you are the author of a book. Could you tell a bit about
1:27
this book, please? Yes, and now I haven't got it here with me, no, but it's, I'll find a picture
1:34
It's a book called, in English, it's called like a woman's code book, and it's not because it's only
1:41
for women, but it's more because I noticed that there are so many of the programming books we
1:46
have already, which are very much on cars and airplanes and so on
1:51
So I thought we should try to broaden out the palette a bit. So I tried to make a woman's, dangerous to say woman's, but to have it in more colors
2:07
than it is so far, so to say. So I have both some introduction to computational thinking and what is it really
2:14
and I'll use a bit of it in my presentation. And then also some, which I actually really enjoyed making
2:22
was to make some biographies on historical programming women. And there are quite a lot, I figured out
2:29
Okay. Yeah. So there's like, yeah, yeah. Like the pre-inventor of COBOL was, for example
2:35
a woman called Grace Hopper, who also invented the term computer bug, for example
2:41
So there are a lot of women in history. And I just thought that was interesting because, you know, I had this idea that we are very few even who are here in computer science
2:56
But like 50 years ago, it was a woman thing. And I thought that was actually interesting
3:01
That's true. Yes, and that's very true. Okay, that's really interesting. So where can people find this book if they want to read it
3:09
Oh, for example, Saxo. Okay. Yeah, it's like a Danish e-book store
3:18
It's in Danish, so I think if you don't speak Danish, it doesn't make sense
3:23
I mean, it's never too late to learn a new language, right? Yeah, that's true
3:27
Yeah, you can both learn to program and a new language. Exactly
3:31
Yeah, but I have no programming languages as such there. I have only block programming and then actually a bit of Excel
3:38
because I thought many people maybe know how to do something in Excel
3:43
And then, you know, I try to translate it into what is it like in your program
3:50
Okay, that's really interesting. And now that we talked a bit about it finally
3:55
that, okay, we can anytime start learning a new language. Today, we're going to actually start learning a new language
4:01
And before we get to that, we would like to give you a small introduction to AI42
4:06
And then we switch back to Sine and she'll get going about our programming language
4:13
So see you later, Sine. Hi, and welcome back, everyone
4:30
So the motivation for starting AI42 comes from the recognition that there is no good starting material
4:39
And we aim to take you all the way from complete beginner to getting enough knowledge to build your own model
4:48
So Håkan, would you like to say a bit of the details as well
4:53
Yeah, sure. Thank you, Yves. So what we would like to do here is to give everyone a chance to get into this
5:00
interesting field. So what we're doing is we have invited industry experts and recognized speakers
5:05
So they will have sessions with us two times a month on Wednesdays at five o'clock Central
5:12
European time. And then we will start out, we have started out here with mathematics and statistics
5:17
and probability theory to give you a solid ground to stand on. And then after that we've dived more
5:23
into languages and tools. So for example, in our previous sessions, we have talked about SQL
5:30
and we talked about Python, and now we will have two sessions on R. And after that, we will look
5:35
more into tools like Power BI and Databricks. Now we'll go into more details like how can you set up
5:41
your own machine learning pipeline, and maybe more advanced topics like reinforcement learning
5:47
and deep learning. And in addition to these theoretical sessions, we will also have
5:53
more practical workshops where you can put this theory into practice. Yes, and you will be able to connect with all the best in class experts from all around the world
6:05
that will hold you these lectures and workshops with really rich content
6:10
And you can follow us on Instagram, Twitter and Facebook. You can find our recordings from our previous sessions on YouTube
6:18
and you can get information about our upcoming sessions at Meetup. We also create cross-collaboration with other organizations
6:27
so we can give you the best opportunities to broaden your network in the AI and data science communities
6:35
So we are in close collaboration with Global AI Community and C Sharp Corner
6:42
And hereby, I also want to say thank you for our sponsors, Microsoft and Miles
6:49
Yes, and as Eva said here, we're in close collaboration both for the global AI community and the C Sharp Corner
6:55
So you will be able to see all of our sessions are recorded
6:58
both on our own YouTube channel, but you can also find them here on the global AI community channel
7:04
and also on the C Sharp Corner channel. And we would also like to thank Mina Marie
7:08
who has composed and performed our intro music. And all of the graphic design that you can see here
7:13
throughout the stream is developed by Levente Pongor. So thank you so much
7:18
Thank you so much to him. And before we switch back to Sina, we want to say a great, great, great thank you to all you guys for joining us today again. Please remember to follow us on Twitter or Facebook or wherever you want. And remember to sign up for our meetup group as well
7:38
so you get information about our upcoming sessions as well. So shall we get back to Sina
7:57
And such a great initiative you are having. Yeah. All right. It's nice to see you back
8:06
Yeah. Should I just start? Yeah, you can. Before starting, I wanted to have a little bit of a cheat sheet about what are we going to talk about, actually
8:17
Yeah. So what are we going to hear today? Okay, yeah, I actually have it in my slides
8:23
but I can just briefly tell it, then it will be more lively probably too
8:27
So we are going to have a very introductory introduction to R
8:31
and a bit about some of the basic coding programming concepts that we use in other programming languages as well
8:40
So those of you who are maybe already experienced programmers and so on, you can maybe lean back
8:48
and maybe get some inspiration on how you can tell it to others
8:53
Or maybe you can just, yeah, whatever. They can always follow you, right
9:02
Yeah, yeah, yeah. So today will be very basic on all the variables
9:09
and what is called variables, vectors, and data frames and what we can do with these concepts
9:17
and how we can, when we are working in R. And also a bit, of course
9:22
on why we are choosing R now on them. Yeah, instead of Python, for example
9:28
Yeah. Yeah, well, thank you. And some practical information is going to be shared on the chat during the session
9:35
For example, there will be some information shared about setting up your own RStudio as well
9:40
so you can work together with Sina during the session and write your own code in R
9:45
And, sorry, on the other hand, please remember to post your questions in the chat so we can answer them throughout the session and
9:53
in the end as well. So let's get going. Are you ready, Sina? I am. Take it away
9:59
So Sina, before we start here, you could just share your screen. Yes, I'll do that
10:08
I'll do it like this. And you can see my screen? Yeah, one more. We can't. Not yet here. Maybe try
10:15
to share your... Yeah, wait a minute. Again? Yeah, I do it now. I just thought... Yeah, yeah. We're just gonna
10:23
have a little bit of music here before we start here okay
10:41
And also thanks for the interesting introduction and really feel free to ask questions
10:46
and also give comments because I'm curious on how this will actually work today
10:53
And I'm going to do a follow-up in a couple of weeks
10:57
So, therefore, I can, I mean, good input for improving is really nice
11:03
So, as we talked about, I'm the author of Clinic and Include
11:08
but I'm also a researcher at CBS where I'm doing research in data science
11:13
and citizen data science. And also, I actually have a PhD in AI that's like 10 years ago
11:21
but it's the kind of AI I worked with in that time, it's all symbolic
11:27
AI which is more on logic and rules and so on and I worked with for example something
11:35
called description logic but I'm more data driven now so I work much more
11:42
within the area of the data science I've been working with genetics and
11:47
genetic data. And that was a really huge amount of data
11:54
And then I've been working with educational data the last four, five years, four years
12:01
So that's my area. And I've also been working in a publishing company
12:07
We made tech books. So that's also why I got inspiration of actually writing a book myself
12:14
And I'm sorry that I have a full presentation mode, but it makes it much easier to switch back and forth from my code
12:21
and to my presentation here. So why should you use R? So I have to start with a small sales pitch here
12:34
So it's particularly for statisticians. It's great. It has many functionalities. And it's open source
12:45
and it then has a lot of more functions than, for example, Excel does and so on
12:52
If we look more into why should you, for example, choose R instead of Python
12:57
usually maybe when you're doing deep learning and so on, you should use Python
13:03
but it has some very elegant functions and so on, particularly within the stat area
13:11
where you can write a lot of things using very little amount of space
13:20
Some people say that the first learning steps are easier than in Python
13:24
I don't know if that's true, but I'm just mentioning it. And you can also do some amazing visualizations
13:32
of all kinds within R. And there are also possibilities of making dashboards
13:39
and more automatically generated files and so on. And last thing that's actually also a really good reason
13:54
for working with R is that there is this environment, RStudio, where there is both an editor and a lot of other things
14:03
And I'm going to show you that in a minute. So here is the plan
14:15
As I said, today we'll look a bit about input file formats or reading in files and so on
14:23
It's one of the things that are challenging for many new programmers actually
14:30
I once had a course where I should be teaching R, but it took maybe, I mean, it took most of the first part
14:39
to actually read in files because there were a lot of different ways to do it
14:44
So I have tried to make one foolproof way of doing it now
14:50
Always dangerous to say that. So we are also trying to calculate a bit, some arithmetic in R. We are going to look at the data frames and also a few functions. And then next time we go more into functions
15:08
and modeling, actually doing some linear models and so on to prepare for the third time
15:16
which I'm not teaching, but it's my old colleague actually who will be teaching that one on machine learning
15:22
and so on using R. and I'll also go a bit into how you can visualize in R using ggplot2 which is a package that you can
15:32
use. All right so as I said here to get started you can watch the two videos that we have
15:40
supplied. I'll just show you very briefly what I did in these videos
15:45
but that's just something you can return to afterwards if you are very new to programming
15:54
I also have some resources here for you that you can use there are both many good cheat sheets
16:00
there is a very good book called R for Data Science and then there is a Danish site
16:06
for those of you who are Danes which is called R Guide
16:12
and it's by Eric Garner. So, here is a bit of the resources
16:20
And I know now where we are online anyway, I know there are people from outside Denmark joining
16:27
and I think that's hilarious. Okay. Yeah. So, maybe I should just
16:36
now I have opened the ball talking about RStudio. So, I will start here
16:41
just showing you the interface. You'll probably just make a new script here
16:49
When you open RStudio the first time, it'll probably look a bit like this
16:59
So there will be no, your environment will be empty. Your console will be empty
17:08
and you'll have no, maybe you will not even have a script
17:12
but then you can just click File, then New File, then R Script
17:19
So, and in here you can write your code. So I have written some already here up
17:26
some very basic code, which I hope is big enough for you to see
17:32
I have tried to zoom in a bit. And there is a few things you can do
17:37
when you are going to have things running, you are putting your cursor here just on the line
17:43
and then you press Control-Enter. And then you can see you have an x with a value three
17:50
So we have put three into the variable x. So that's some very basic things that is like a pre
18:07
something that sometimes take a while before you have done that. One thing you could also see here is
18:14
that it's not only here, you can see that x is now three. So it's kind of stored in here
18:19
So you have your data, whether it's a variable or it's a table or whatever
18:25
you have it in here where you can see it. Also, you have it here in your console
18:31
where you have put an x equals three. And you can also, if you want to
18:41
you can also take this whole data frame, sorry, this whole chunk of code
18:49
and then run it by pressing Ctrl-Enter. And then you have both a vector called V
18:56
and a data frame here called DF. we can look at this by just pressing over here
19:04
and we can see here what this data frame looks like. Great
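The first step shown on screen can be sketched roughly like this. It is a minimal version that assumes the variable simply holds the number three; `ls()` is used here only to mirror what the Environment pane in RStudio displays.

```r
# Put the value 3 into a variable called x
# (in RStudio: place the cursor on the line and press Ctrl+Enter).
x <- 3

# Typing the name shows the stored value in the console.
x

# ls() lists the objects currently stored in the environment.
ls()
```

Running a selected chunk of lines with Ctrl+Enter works the same way, just for several lines at once.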
19:10
So I'd like to say a little bit about these things. Now you've just seen me fooling around with code
19:20
But what I've tried to talk about is that there are both data structures like a vector or an atomic variable or whatever
19:32
Then there are also data formats. And I think data format types are like different ingredients
19:42
For example, when you're baking muffins, it's important that you blend the wet things together and the dry things together
19:50
So it's like if you try to take a text string and add two
19:54
you'll get some troubles here, right? Whereas if you take three and add two to this
20:00
then you get five. And that's really nice, right? So it's important that you are aware
20:07
of which data formats you have. And that's also what can sometimes be a bit annoying in many programming languages
20:16
If you have the wrong data type and you try to do something with it
20:22
then it maybe doesn't do what you want it to do. Then finally, there is something called functions
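The muffin point about mixing data types can be illustrated with a small sketch. The text string "three" is just an example value chosen here; `tryCatch` is used only so the script keeps running and the error message can be inspected.

```r
# Adding two numbers works as expected.
3 + 2

# Adding a number to a text string fails; we catch the error and keep its message.
result <- tryCatch("three" + 2, error = function(e) conditionMessage(e))
result

# class() tells you which data type a value has.
class(3)
class("three")
```

Checking `class()` before a calculation is a quick way to find out why something does not behave as expected.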
20:29
If you have already been at the Python course and so on
20:33
of course, you know something about functions, but they are pre-coded stuff that does things
20:41
to what you have been working with. So it can be, for example
20:49
here I have just for fun said like this is a whisk
20:54
function so when you are I don't know what it's called in English
20:59
but when you're turning your dough when you're trying to mix mixer
21:06
yeah when you try to mix your dough you know there is kind of
21:10
a mixing function that you need and I try to say that's a bit the same here
21:16
when you're working within R, then you have several functions that do things
21:25
It could be like it can make a statistical test or it can transform your data or so on
21:32
So there can be a lot of different ways a function can work
21:40
And R is very function-oriented. So it's also easy in R to actually write your own functions
21:48
I mean, I'll not teach you how to write functions right now
21:52
because there are so many that are already made, but that is definitely a thing
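Writing functions is not covered in this session, but for the curious, a user-defined function might look roughly like this. The name `double_it` and its argument are made up for illustration and are not from the talk.

```r
# A hypothetical helper: takes some numbers and returns each of them doubled.
double_it <- function(values) {
  values * 2
}

# Use it on a single number and on a vector.
double_it(3)
double_it(c(3, 5, 7, 9))
```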
22:00
All right. So that was just a very high-level introduction of data syntax in R
22:10
So you could think of it a bit like a grammar where we just looked at atomic variables, vectors, and data frames in the coding
22:20
So this is like an atomic vector wherein we have put a three, and this is, sorry, an atomic variable. This is what we call a vector. And to actually make a vector, it's not enough just to have three, five, seven, nine
22:37
and then you have a vector with four elements, which is these numbers
22:41
You will need to put this little c in front of it
22:46
So that kind of, that's also a function that turns this into a vector
22:55
which is like a list, but just, it's a bit like what you would think of as a list
23:02
but where it will define each, the position and so on of each number
23:09
Then we have data frames and they can be looked more as like a matrix
23:16
And here we have actually just made it from a matrix function
23:19
where we have decided that there should be two rows. And as we see in here in the code, we can look at it
23:33
And certainly, yes, there are two rows. And then there are three columns, which are called X1, X2, X3
23:41
That's not something I have called them. R just gives them the name
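The three structures just described could be built along these lines. The vector values are the ones mentioned in the talk; the cell values in the matrix are invented for illustration, since the actual ones are not readable from the transcript.

```r
# An atomic variable holding a single value.
x <- 3

# c() combines the values into a vector with four elements.
v <- c(3, 5, 7, 9)
length(v)

# A matrix with two rows, wrapped in data.frame().
# The cell values are assumptions made for this sketch.
df <- data.frame(matrix(c("a", "b", "1", "2", "c", "d"), nrow = 2),
                 stringsAsFactors = FALSE)

# R names the columns automatically.
names(df)
dim(df)
```

Because six values are spread over two rows, R works out by itself that there must be three columns.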
23:48
Yes. All right. So there are also different data types, as I said, and it can be structured many ways. It can also be structured as dates and so on
24:08
But in general, there's these three main data types, numeric, characters, and factors
24:15
where numeric, that's numbers. It can be integers, but it can also be doubles and so on
24:23
Characters is like text strings, like this is a character. But then there is a last version, which is factor
24:35
which can be both numbers and text. It can be represented by numbers or text or so on
24:42
But it's usually something, for example, when you're, say you work with a survey and you have three
24:52
you have a kind of three things that you can have in a drop-down menu
24:57
You can, are you between 20 and 30, between 30 and 40
25:03
or between 40 and 50 or between 50 and 60 and so on
25:08
So there is these categories and you can say it and then these would be stored
25:15
it would be reasonable to store these as factors. But you can choose and you can also turn them back
25:24
and forth from factors to characters and so on. You can not always turn them into numeric
25:30
because then you'll get missing values, NA, and a warning in your data frame. But you can try, and we'll, of course, try that
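A small sketch of the three main data types and of converting between them. The survey answers below are invented example values, not data from the session.

```r
# Numeric: numbers, either whole or with decimals.
age <- 34
class(age)

# Character: text strings.
note <- "this is a character"
class(note)

# Factor: a fixed set of categories, for example answers from a drop-down menu.
age_group <- factor(c("20-30", "30-40", "20-30", "50-60"))
class(age_group)
levels(age_group)

# Factors and characters can be turned back and forth.
as_text <- as.character(age_group)
back_to_factor <- as.factor(as_text)

# Text that does not look like a number becomes NA when made numeric
# (R also gives a warning, which is silenced here).
mixed <- suppressWarnings(as.numeric(c("3", "five")))
mixed
```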
25:42
All right. So just a minute to the first question session. So we're going now to calculate things
25:52
and, of course, you can do that, and I'll just error checking here
25:58
So we can do a lot of things. You can change the format of a variable, as I just said
26:05
We can also add and subtract. We can multiply and so on using common symbols for that
26:16
And we can take square root. We can take logarithms and exponential calculations and so on
26:24
We can raise numbers to a power. So there are a lot of different things you can do
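The calculations listed here correspond to the following base R operators and functions. This is a quick sketch using the value three from earlier as the example input.

```r
x <- 3

# Basic arithmetic with the common symbols.
x + 2
x - 2
x * 2
x / 2

# Square root, logarithms and the exponential function.
root <- sqrt(9)
natural_log <- log(10)     # log() is the natural logarithm by default
log_base_10 <- log10(100)
e_to_the_1 <- exp(1)

# Raising a number to a power.
squared <- x^2
```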
26:32
I'll just show them here, some of the things. So now we have our data frame, our vector, and our atomic variable here
26:47
So up here we have our vector, which is now numeric. You can even see it here
26:55
but we can make it into a character now. And then our numbers are now not numeric anymore
27:03
but they will be treated like text and so on. And for example, if we have this data frame
27:15
we can see here, we can make everything of the data frame numeric
27:23
And then let's see what happens. That's not true. I haven't made everything in the data frame numeric
27:33
I have actually only made column two numeric here. And let's look at this
27:45
What happened? So as you can see here, we look at the data frame
27:52
I have written this function structure, and I'll show you a bit more on that later on in the lecture
27:59
But here you can see that the first column is now a character, the second column is numeric
28:06
and the third column is a character. And that makes sense, you know. The last one here really makes sense
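The conversions shown on screen might look something like this. The data frame contents are assumed for the sketch; `str()` is the structure function mentioned above.

```r
# The vector starts out numeric and is turned into text.
v <- c(3, 5, 7, 9)
v <- as.character(v)
class(v)

# A data frame where every column starts out as text (values are assumptions).
df <- data.frame(matrix(c("a", "b", "1", "2", "c", "d"), nrow = 2),
                 stringsAsFactors = FALSE)

# Make only the second column numeric.
df$X2 <- as.numeric(df$X2)

# str() shows the type of every column.
str(df)
```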
28:14
Yeah. One thing I like to also show you before your own exercises
28:20
then how now we have put things into variables, right? Now we want to get data out again
28:26
And that's actually not that hard. You just simply write it here like X
28:34
We just run it. And here we can see so x is three
28:39
We can also print x again, get a three. We can also take the first element in our vector
28:50
Here three, and you can see now there are, I don't know what these are called, quotation marks
28:56
around the three here, which means that this is now text, no longer a numeric
29:08
Then there are, you can look at, for example, So here you have taken second row, first column
29:24
Yeah, that was correct. Second row, first column. And that's simply because that's how data frames are structured
29:32
And I'll say a bit more on data frames in the end of the lecture
29:36
So just accept that there are these three ways to get the information out of a data frame
29:48
And these ways are actually the same, right? So this one looks very much like the first one, except for one is changed with the name from the label here
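Getting values back out can be sketched as follows. The transcript does not show the exact three expressions used on screen, so the three data frame lookups below are an assumption about what they most likely were; the data frame contents are also invented.

```r
x <- 3
v <- as.character(c(3, 5, 7, 9))
df <- data.frame(matrix(c("a", "b", "1", "2", "c", "d"), nrow = 2),
                 stringsAsFactors = FALSE)

# Typing the name or using print() shows the value.
x
print(x)

# Square brackets pick out an element of a vector by position.
first <- v[1]
first

# Three ways to get the second row, first column of a data frame.
by_position <- df[2, 1]      # row number, column number
by_name <- df[2, "X1"]       # row number, column name
by_dollar <- df$X1[2]        # column by name, then the second element
```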
29:59
All right. So, maybe I should just show the calculations. For example, here we can make
30:12
a new variable y where we say x times 2 and we know x is 3, right? So we do that and now
30:19
we have y which is 6. We can also take our vector and say vector times 2. Now we have
30:29
the problem that it's as a character, so it won't work. And that's a quite important thing
30:36
like what we talked about before, like you have to have the right data type to be able to do
30:44
the things that you need. So if we did like this as numeric
30:48
We have no longer any errors. You can see that this vector will be made of four numbers
31:02
which are two times the one before, right. So you can do all these things. You can also take
31:11
rows in a data frame and do stuff with them. You can take columns in a data frame and do stuff with them. So
31:16
you have a lot of opportunities here, and all this data is stored in the memory, so
31:24
many of these row and column operations are really fast to do. All right
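The calculations from this part can be sketched like this. The small numeric data frame at the end is invented here just to show a row and a column operation.

```r
x <- 3
y <- x * 2
y

# The vector is stored as text, so multiplying it fails.
v <- as.character(c(3, 5, 7, 9))
failed <- tryCatch(v * 2, error = function(e) "error")
failed

# Converting back to numeric first makes the calculation work.
v_doubled <- as.numeric(v) * 2
v_doubled

# Row and column operations on a data frame (example values are assumptions).
scores <- data.frame(a = c(1, 2), b = c(10, 20))
column_doubled <- scores$a * 2
first_row_doubled <- scores[1, ] * 2
```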
31:37
so we are now at our question session so so if you have any questions before we go to a small
31:46
exercise that I'd like you to do or we can do it together of course but I think it's good if you
31:52
have your computer with you that you try it out a bit because or else you have just forgotten it
31:58
after this session. Hi Sina, we have questions. Yes. The one is what's the difference between
32:05
your data frame and your matrix? Well so a data frame has a bit more I think it has a bit
32:16
more functionalities and you don't know if you store actually I'm not, I don't know all the
32:23
details on that but usually when you work with data in R
32:28
you store it always as a data frame but you can put it into
32:31
a matrix format which is like a very simple vector format right
32:37
so I actually cannot give a very good answer on that maybe someone
32:43
else knows more on that Isn't it like that data frame format is mostly like a column
32:52
you know, like the column based data sets. So it's like having like a columns and such
32:58
and the matrices are more like a bunch of data just to
33:03
yeah, not necessarily relational like this, but the whole thing. Yeah, and you can make row names and column names
33:13
and stuff like that in their data frames. Yeah. Yes. That was a good question actually
33:22
Yeah, it's, Stephen says that data frame is a list of equal length vectors. Yeah
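One way to see the difference raised in this question is a small sketch with invented values: a matrix holds a single data type throughout, while a data frame is a list of equal-length columns that can each have their own type.

```r
# A matrix has one type for all cells.
m <- matrix(c(1, 2, 3, 4), nrow = 2)
typeof(m)

# Mixing numbers and text in a matrix turns everything into text.
m_mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(m_mixed)

# A data frame can keep a numeric column next to a text column.
df <- data.frame(id = c(1, 2), label = c("a", "b"), stringsAsFactors = FALSE)
sapply(df, class)

# Under the hood it is a list of vectors that all have the same length.
is.list(df)
lengths(df)
```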
33:29
Yeah, that was, yeah. Yeah, so the R language has several packages for solving a particular problem
33:36
So how to make a decision on which one is the best to use? Yeah, I think there are many
33:42
things in it and actually there are both what programming language are you good at
33:50
so it's also what do you which programming languages do you prefer to work in and do you have
33:56
a good intuition about and so on so that's one way to decide but there is also
34:04
you can also try to see if you are going to work with it in industry
34:08
for example, so what are the, in that industry, are they using more R, are they using more
34:15
Python or other things? I mean, usually R would be better, would be a better choice than many of
34:22
the commercial tools, but also because it can take all input data formats, as I showed in the video, right so
34:37
and that can also be a good practice sorry that has of course
34:45
nothing to do with this but what is good about R particularly is this that I mentioned
34:50
before that it's very good at visualizations and it's very good at
34:54
when you have like these huge tables you can do a lot of
34:58
operations on the on the columns and the rows and so on
35:04
Whereas in many other languages, you don't store data that way. So it will be a bit slower
35:11
Then on the other hand, there can be other things that are slower in R
35:16
There are some of the deep learning algorithms that are maybe more optimized for Python, for example
35:23
Yeah. Maybe other people have suggestions as well. I would say also that actually it is a good practice as well
35:34
like Stephen also mentions it, that, for example, it often happens that some packages from R are used in Python code
35:42
because R packages are used for something else. Or, for example, especially for very long, the graphical visualizations and whatever were looking much better in R
35:55
than in Python. You could do like 3D stuff and whatever. that was, for example, one of the reasons why we would use R for visualization rather than Python packages. Yeah
36:12
Going back for one moment to this data frame versus matrices, I am not sure if that is right, actually, that data frame is equal length vectors
36:23
because if you think about it, I think matrices are more ruled into one specific schema
36:29
The data frame can have empty columns as well. But yeah, I'm also just thinking of data frames as something where you can name the
36:40
name the columns and name the rows and so on. So you can, so we have more options there. Yeah, but
36:50
But I have to look into this until next time. Yeah. Because there might be a lot of small details that are..
37:00
For sure. Yeah. All right. I give it back to you. Thank you
37:05
So now we should have a small exercise where you in RStudio should try to create a single variable and a vector and a data frame
37:19
And also maybe try to change the data type of variables just as I have done and do some calculations. So if you could just try this and just keep R open while I keep on blabbering
37:39
So here try to, for example, put some data into an atomic variable or a vector or a data frame
37:54
and try to do some of these calculations on the variables. So, and this is, of course, much easier when you have some kind of interaction with people
38:12
So, but so I'll not stay too long here. but it's just really nice
38:22
to have tried these things out for yourself. Please let me know if you have any questions
38:27
to some of these practical things. Now I have stuffed this with a bit more code
38:31
than you need to actually do the exercise, right? Sometimes you also can see here if you start to write a code, then you can get it and then
38:53
you can just press on it and then you are sure that it's spelled correctly. Because that's
38:58
thing in programming right that it's so easy to spell things wrong
39:03
One thing I should also say is that you can use this help function over here and I'm just talking while you're trying
39:15
things out. For example, you can look at as.numeric and see how it works. It's a bit boring, right? You have just a
39:25
you just have x as an input, and then on the same help page there is numeric, where you can use the length
39:37
to define how long a numeric vector you want to create. So here you can see for example
39:48
some examples. And the built-in, something I would also like to say
40:00
about the built-in help function here is that it's somehow okay. You know, you can see the usage up here
40:08
and you can look at the arguments here and you can get an example here
40:13
but I don't always think the examples are actually really good. So one thing I would do
40:18
is actually to look at Google for this one. And then I'll find maybe an answer at Stack Overflow
40:31
if I want something more advanced. So that's really a good place to go
40:39
to try to Google your question. And sometimes you will find the answer in some R material
40:46
and sometimes you find it on Stack Overflow or other places where other people have asked
40:53
the exact same question as you. And it's amazing how some weird little niche question
41:01
that you think nobody else is interested in. You just write this, for example
41:06
R numeric length, and then you figure out that there are a lot of people
41:11
who have done that. So I hope you have played around a bit now
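Looking things up in the built-in help can be sketched like this. The `interactive()` guard is an addition for this sketch so that the help page only opens when you run it yourself in RStudio; the two calls after it show the two functions documented together on that help page.

```r
# Open the help page for as.numeric (it appears in RStudio's Help pane).
if (interactive()) help("as.numeric")

# numeric(length) creates a numeric vector of that length, filled with zeros.
zeros <- numeric(3)
zeros

# as.numeric(x) converts its input x to numbers.
converted <- as.numeric("3")
converted
```

Typing `?as.numeric` in the console is a shorter way to open the same page.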
41:20
I'll go on with the functions. So function is like this mixer function, for example, the dough mixer function we talked
41:35
about before. If you should take it into something real. But here it's an operation that does something to your data normally
41:47
And you can build your own function, but I'll not show you that today. You can borrow one from the internet
41:52
For example, if you have this little niche question that you want to post quite often
41:58
there are some nice people who have said, oh, I have solved this by writing this function
42:03
And then you can reuse it on your own data. And you can see the code
42:10
So you can also see how the function actually works. But we are not going to look at that today
42:16
because I thought that would be too much. So we have some examples of functions
42:25
So as.numeric is a function. c is a function. read_excel is a function
42:33
That's if you're going to read an Excel file, which we'll do in a minute
42:40
So there are different ways here. Also, for example, t.test. t.test is a function
42:48
I'm going to try here to show some of the arguments. And this one is from the help function
42:53
So if I can just show you. Yeah, we have this written this way
43:06
And it's, except if you're really good in R, it can be a bit impractical to look at this
43:21
And then actually it can be maybe even better to then Google T-test example two-sided or something
43:28
And then you can, or alternative equal this or something. And then you can, then you maybe can get some better
43:38
some better versions of this. And then of course there is still an example
43:49
there are examples at the bottom, but that can sometimes be a bit hard to interpret
44:01
What's going on here? Alright, so do you have more questions by now
44:19
If not, I think it was a very short session, I agree
44:23
Then we'll go on. I have made a small exercise here too, where I try to look at, there is a function
44:32
this t-test function, and you can, for example, write these three lines of code, very simple
44:40
So you're creating 10 points of normally distributed data
44:49
and put it into x, I think with a mean 0 and a standard deviation 1. And then after this you do a t-test on these two variables to see if their means are equal or not
45:10
And you can just look at rnorm in the help to see what it is. And yes, I was right. It's you're creating
45:20
some numbers with the mean 0 and standard deviation 1, some random numbers, and you have two different..
45:31
Here you have... So now we have both x and y are vectors
45:41
numeric vectors, as we can see up here, 10 points, and you can see here they are around zero
45:51
You can run a t-test to see what's here. So the t-test really cannot exclude the possibility that these two are equal
46:09
That's what it means. So if the p-value is below 0.05, usually you say
46:15
then these two vectors are not the same, but well, they seem to be very close to each other
46:24
And you can also see that they should be right because they are, I mean
46:30
maybe if you're doing enough random examples of this
46:39
you'll actually, you'll maybe find now and then that they are differing
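The three lines described above could look roughly like this. This is a sketch, not the exact code from the screen; the `set.seed()` call is an added assumption so that the run is repeatable.

```r
set.seed(42)     # assumption: fixed seed, not part of the original three lines

x <- rnorm(10)   # 10 random points, mean 0 and standard deviation 1 by default
y <- rnorm(10)   # a second, independent sample of the same kind

# Two-sided test by default: is the difference in means zero or not?
result <- t.test(x, y)
print(result)

# The p-value can also be pulled out on its own.
result$p.value
```

Removing the `set.seed()` line and rerunning many times will now and then give a p-value below 0.05 purely by chance.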
46:48
Yeah. So the exercise here is try to add 5 to the x value
46:55
and then try to do a one-sided t-test. So that's the exercise that you should do
47:04
And I think we'll try to do it together now because since we don't have this interaction thing
47:11
where we can chat about it, it's maybe better that we go through it together, right
47:20
So I made some new code here, so as not to confuse you too much
47:29
I have it like this. So now we add five to all the values in x
47:36
and we make a y here, which is just with a mean of five
47:45
And then maybe we should start with making the common two-sided test here
47:52
So what can we see here? So now they are definitely different
48:00
So, and then if we are going for the greater, we can see this is also different
48:09
but the p-value is even smaller than the p-value of the two-sided t-test
48:18
So you can see that one-sided t-test with an alternative greater is better
48:27
And I always forget these things. So what does it mean? So I have written it in my slide
48:35
And the other one here, when you look at the output here, you can see the p-value is equal to one
48:43
but you can also see here, the mean is four and a half of x
48:50
and it's a bit more than zero for y, so it should be different, right
48:58
And the standard deviation, which you cannot see here, is equal one
49:05
All right. So, and I'll just find it here. So, if you say less, then the null hypothesis is that y is smaller than x, and greater means the other way around
49:29
And the thing is, so you could say, but it is, y is smaller than x
49:35
And that's true. And that's why the p-value becomes so large. Usually we try to formulate it so that we have a hypothesis we can reject
49:47
So that h0 is a hypothesis that we reject. Yeah. All right
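A sketch of the exercise, assuming x is shifted up by 5 and y stays around 0 (which matches the means read off the output); the seed and variable names are illustrative additions:

```r
set.seed(1)          # assumption: fixed seed for a repeatable run

x <- rnorm(10) + 5   # add 5 to every value in x
y <- rnorm(10)       # y stays around 0

two_sided <- t.test(x, y)                           # H1: the means differ
greater   <- t.test(x, y, alternative = "greater")  # H1: mean of x is greater than mean of y
less      <- t.test(x, y, alternative = "less")     # H1: mean of x is less than mean of y

# Compare the three p-values side by side.
c(two_sided = two_sided$p.value,
  greater   = greater$p.value,
  less      = less$p.value)
```

Because x really is larger than y here, the "greater" p-value is half the two-sided one, and the "less" p-value ends up close to one.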
49:52
So, but that's the statistics, and I'll not go too much into it. I know there have already been some sessions here in AI42
50:00
So if you want to get more into it, you can rewatch these sessions
50:05
All right. So do you have any questions now? You have maybe been playing around with the exercise
50:12
Please let me know. All right
50:25
So, I'll go... Ivan Hogan, are there more questions? Alright, so I'll just go on here
50:39
A bit more data frames. We'll re-watch these. So, here is an example where we take this matrix and put it into a data frame
50:49
So, we need, if we want to read in files, we need a library
50:58
And we also have different tools to inspect these data frames. So, I'll just go through what I have here
51:09
And then we go into the last coding here, where we're looking back in our RStudio
51:19
So, typically a data frame is structured so that we have the rows first,
51:30
so the row number here is the first part of the space within the square brackets,
51:39
and then we have the column number afterwards here. Like this. We can also access data in another way, for example, using this dollar sign
51:58
So here we actually have a vector, which is the column. For example, here we have a vector with column one
52:09
And then we can even, if we want to access a row within this, we can do it like that
52:16
we can put a square bracket afterwards. So those are the two ways
52:22
There are probably others, because, you know, many people have made new functions now. But these are the two that are mainly used
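The two access styles can be sketched on a small made-up data frame (the matrix contents are illustrative; `data.frame()` gives the columns the default names X1 and X2):

```r
# A small matrix turned into a data frame; columns are named X1 and X2.
m  <- matrix(1:6, nrow = 3)
df <- data.frame(m)

df[2, 1]     # square brackets: row 2, column 1
df[, 1]      # leave the row part empty to get all rows of column 1

df$X1        # dollar sign: column X1 as a vector
df$X1[2]     # square bracket afterwards: element 2 of that column

df[["X1"]]   # double brackets with the column name give the same vector
```

All of `df[, 1]`, `df$X1` and `df[["X1"]]` return the same column.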
52:36
So as I said before, here are three different ways of actually accessing the same thing,
52:43
which is column one. Right. So let's go back and look a bit more into our old friend
52:52
the data frames. So now we need something called a library. And that's actually a package
53:05
We use library to access the package. So library is a function that opens up a package
53:11
And sometimes, if you don't have the package already (I do), then you can go in here and press Install
53:18
and then you can install it, right? Oh no, I'll just... It was fast, that was good
53:28
I didn't try that before. So sometimes it can take a while to install packages
53:34
So after we have installed it, it will be here in our packages
53:39
And it's installed on the computer now. So the reason for not taking in all the packages in the world
53:45
is that they will take up quite a lot of space. So install the packages when you need them,
53:50
just like you only load in the packages you need. You can also load a lot of packages in;
53:59
if you sometimes use this package and sometimes use the others, you can just put them all in at the beginning
54:04
But it's a good habit to only use the libraries you need
54:09
because it will take a lot of space to actually load them in
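The same install-then-load step can also be done in code instead of through the Install button. A minimal sketch, where `ensure_package` is a hypothetical helper name and "stats" is used only because it ships with every R installation:

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)            # downloads from CRAN; this can take a while
  }
  library(pkg, character.only = TRUE)  # attach the package for this session
}

# "stats" is always present, so nothing is downloaded in this example.
ensure_package("stats")

# search() lists what is currently loaded.
search()
```

Installing happens once per computer; `library()` has to be called again in every new R session.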
54:16
But let's see if we can now access it. Oh yes, but now it warns me, sorry, it's in Danish
54:23
And it warns me that this package is maybe a bit too new for my R version
54:29
So we are very, very curious if this will go on. Right
54:36
So, but we try to put in, we have our data frame and we can, for example
54:45
so what I have done here, and we're looking into it: if something is a factor,
54:54
then you'll need to convert it to a character first and then use as.numeric
55:01
I think there are some new packages that do it smarter, but this is just something you can always do
55:06
with the base R installation, which is like the common libraries that are always in R
55:12
whether you have downloaded other things or not, there is a library of base functions
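Why the detour through character is needed can be seen in a short base R sketch (the values are made up for illustration):

```r
# A factor whose labels happen to look like numbers.
f <- factor(c("10", "20", "5"))

# Calling as.numeric() directly returns the internal level codes,
# not the numbers written in the labels.
codes <- as.numeric(f)

# Converting to character first and then to numeric recovers the real values.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Comparing the two printed vectors shows that only the second one contains the original numbers.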
55:21
So now I have only run this one because I think it's fun
55:26
Now we can look at the different types here. We can use this function, View, on the data frame,
55:35
then we can see it here. And we can see it looks like these two have different
55:42
data types somehow, right? but we don't know. And we can see over here, hello world
55:50
that's definitely, that looks like text at least. Then we can use this one
55:54
that gives us a bit more information. Here we have the structure of the data frame
56:00
Here we can see that X1 is numeric, X2 is a character and X3 is a character
56:07
So X3 is a character, that's just great. But that X2 is a character
56:14
That's a bit stupid. So we don't even need to press it into a character
56:18
because it's already a character. And we can do like this. I just ran the code
56:25
and then I'm running this structure code again. And now we can see it makes sense now
56:30
So there is a numeric and then there is a character here in row three
56:36
Then there is a, I mean, there are more ways to inspect data
56:42
but you can also, for example, look at the summary. And we can see that the first column here
56:55
with two values, it takes an average so that you can get the mean and you get the median
57:01
and the max and the minimum. That can be really nice to have this overview
57:05
If you have a data frame, for example, of a survey with Likert scale questions
57:10
that can be nice and a lot of other good reasons for making this summary
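The inspection steps can be sketched like this. The data frame is a made-up stand-in with the same column names and types as described, not the one from the screen:

```r
# Stand-in data frame: X1 numeric, X2 numbers stored as text, X3 text.
df <- data.frame(
  X1 = c(1.5, 2.5),
  X2 = c("3", "4"),
  X3 = c("hello", "world"),
  stringsAsFactors = FALSE
)

str(df)                      # structure: X2 shows up as character

df$X2 <- as.numeric(df$X2)   # already a character, so as.numeric() is enough
str(df)                      # now X2 is numeric

summary(df)                  # min, median, mean, max for the numeric columns

# In RStudio, View(df) opens the data frame in a spreadsheet-like tab.
```

`str()` is the quick check for data types, while `summary()` gives the per-column overview.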
57:21
Then we have some new friends here. And so I just have to see where am I
57:30
Oh, I'm in the right place, that was good. So now I can just do like this and I read in my
57:37
files. But this is what always goes wrong, because we need to be in the right directory,
57:45
the working directory, to do that. So one thing that will always work, if we go here,
57:52
is to use Import Dataset. And the first data set here we need is a CSV file
57:59
and that's equal to a text file. And it's this one. As far as I remember
58:11
And here we can see down here, this is the input file
58:15
This is what it looks like in a data frame, right? Where we have names of the columns and so on
58:23
We can also, we can see here that it understands that the separator is a comma
58:28
but we can change it if we think it's wrong. Say the separator is really white space
58:34
You can see now it looks weird. So we go back to comma
58:40
And there are a lot of other things you can set here. Row names
58:44
You can also make a use first column as row name. Oh, it won't allow us to do that
58:52
Maybe use numbers. It hasn't. But we'll try this. and see what happens
59:00
Also, we say if there are some strings that are empty, it will put in NAs instead
59:08
And we can turn this on, strings as factors. Then all the strings we put in are factors
59:14
but we can also just have them in as characters. Now the strings will be loaded as characters
59:23
for example, region and country codes and disease and so on. So this is data from WHO. There is no corona data, COVID-19 data in it. It's only
59:36
measles and other kinds of diseases. Right. So we now import it. Interesting. Yeah. So
59:48
So, and what has been done here is we have used this code and then just for the next
59:58
time we're going to use it
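The settings chosen in the import dialog correspond to arguments of `read.csv()`. A self-contained sketch, where the file content is an invented toy example (not the WHO data) written to a temporary file so that no working directory is needed:

```r
# Toy CSV written to a temporary file; the rows are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "region,country_code,disease,cases",
  "EUR,DNK,measles,10",
  "EUR,SWE,,3"
), csv_path)

toy <- read.csv(
  csv_path,
  header = TRUE,             # first line holds the column names
  sep = ",",                 # the separator is a comma
  na.strings = "",           # empty strings become NA
  stringsAsFactors = FALSE   # keep strings as characters, not factors
)

str(toy)

# With a real file, getwd() shows the working directory that a relative
# file name would be looked up in.
getwd()
```

Setting `stringsAsFactors = TRUE` instead would load the text columns as factors.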