Get Historical Crypto Price Data Using Pandas and Python (Free)

0:00
hello world it's leo and i'm back with
0:02
video number two in the first video we
0:04
created a crypto price share mean
0:06
reversion system if you haven't seen
0:08
that video i'll put a link or a bubble
0:10
right here
0:11
and if that doesn't happen because again
0:13
this is only my second video like this
0:15
if i fail i'll put a link in the
0:16
description below
0:18
now the goal of this series is to create
0:21
a profitable crypto trading strategy now
0:24
obviously who wants an unprofitable one
0:26
but it's likely that we're going to have
0:28
to create a bunch of strategies that
0:30
lose money before we finally find one
0:31
that makes money let alone the holy
0:33
grail and potentially buy a private
0:35
island now i thought to myself leo
0:37
instead of wasting everyone's time and
0:39
showing you how to import uh you know
0:42
the crypto data every single video
0:44
that's what this video is about instead
0:46
we're going to import the kaggle data at
0:48
one minute resolution re-sample it and
0:50
get it prepared for future back testing
0:52
purposes so without further ado let's
0:55
create some code
0:56
so the first thing that we'll do is
0:58
we'll open up Jupyter Notebook now if
1:00
you're not familiar with Jupyter
1:02
notebook i'll put some introductory
1:04
materials on how to get it installed in
1:05
the link below but long story short it's
1:08
just a web interface where we can put
1:10
python
1:11
and markdown code
1:13
and that's really nice because it makes
1:15
it easy to see so if i put some python
1:17
code or create some type of analysis i
1:19
can explain that analysis or give some
1:22
explanatory language around more complex
1:24
code right within the web interface so
1:26
it's pretty nice
1:27
so before we do any coding though we've
1:30
got to download the data
1:32
so go to this link right here and i'll
1:34
put this link in the description below
1:36
and download that so you can follow
1:38
along with me
1:41
so the next step we'll take is we'll
1:42
import pandas import pandas as pd
1:47
and then we're also going to use
1:48
zipfile so do from zipfile import
1:52
ZipFile okay perfect so now that we've
1:55
got our imports
1:58
we'll do import
2:00
imports now import
2:02
now that we've got our imports
2:05
we'll select the
2:07
columns that we want and create our
2:08
ZipFile object we'll do zf equals ZipFile
2:12
and then put in your download location
2:15
obviously be different than mine
2:17
um
2:18
google
2:19
downloads
2:21
archive.zip
2:23
archive.zip
2:26
and then we'll select the columns that
2:28
we want will be time
2:31
open
2:32
i just got these from the kaggle
2:33
description above on the other tab open
2:36
high
2:37
low
2:38
close
2:39
volume
2:41
control enter that seemed to work
2:43
perfect okay
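The setup described so far can be sketched as follows. The real download path is not reproduced here, so a tiny in-memory archive with made-up prices stands in for archive.zip; the file name `btcusd.csv`, the sample row, and the variable name `cols` are assumptions for illustration only.

```python
import io
from zipfile import ZipFile

import pandas as pd

# Stand-in for the downloaded archive: one small CSV inside an in-memory zip.
# With the real download you would pass the path to archive.zip to ZipFile instead.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr(
        "btcusd.csv",
        "time,open,close,high,low,volume\n"
        "1593561600000,9100,9105,9110,9095,1.5\n",
    )

# The ZipFile object used to read files out of the archive.
zf = ZipFile(buffer)

# The columns to keep from every CSV.
cols = ["time", "open", "high", "low", "close", "volume"]

print(zf.namelist())
```

Nothing is read into pandas yet at this point; the archive is only opened and the column list prepared.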
2:44
now
2:45
the next step
2:47
there's a fair amount going on so i'm
2:48
going to explain it at a high level and
2:50
then i'll explain every line of code and
2:52
then you know kind of wrap up explaining
2:54
it one more time
2:56
you know the thing about python it's
2:57
great you can do so much in so little
2:59
code but at the same time sometimes it's
3:03
almost
3:05
too concise right so
3:07
let's think about what we're doing we've
3:09
got an archive.zip file and within that
3:12
archive.zip file we've got a bunch of
3:14
csv files and each one of those csv
3:17
files contains our crypto data with the
3:19
time open high low close and volume
3:22
so we want to loop through all of those
3:24
csvs create a data frame and then stack
3:27
those data frames on top of each other
3:30
so let's see how that works we'll do dfs
3:33
that's um
3:35
short for data frames and we'll
3:37
do pd.concat
3:39
which again stacks the data frames
3:41
on top of each other we'll pass that a
3:44
dictionary we'll do text_file dot
3:47
filename dot split
3:50
and we'll make
3:51
the first element the key
3:54
we'll do pd.read_csv so now
3:58
pd.read_csv creates a data frame from a
4:00
csv file but our csv file is actually
4:04
within a zip file so we need to use that
4:06
ZipFile object we'll use zf.open
4:09
set the
4:11
file name
4:13
name
4:14
and then only pass the you know the
4:18
columns that we want right so usecols
4:20
equals cols perfect
4:22
now
4:23
uh the next part of the code is where we
4:25
loop through each text file within the
4:27
zf
4:29
zip file so we'll do for text_file in zf
4:32
dot infolist
4:34
and instead of just looping through and
4:36
importing every file we'll check to make
4:37
sure that the file within the zip file
4:40
ends with dot csv i believe it does in
4:42
this case but you know we'll just add
4:45
this to make our code a little more
4:47
robust we'll do text_file dot filename
4:50
ends
4:51
with
4:53
dot csv
4:57
then dfs hit enter and i made a typo so
5:01
let's see what did i do here invalid
5:02
syntax
5:05
okay
5:07
oops
5:11
okay perfect so it looks like it's
5:13
running and again what we're doing we
5:15
have you know our zf.infolist which
5:19
outputs you know the list of the file
5:21
paths and we'll do text file you know
5:24
for each one of those
5:26
files in the info list we create a csv
5:29
out of them and
5:31
then
5:32
we don't create a csv we create a data
5:33
frame out of them and then we smush all
5:36
the data frames on top of each other
5:38
with dfs
5:40
as you can see here and that looked like
5:42
it worked perfectly now
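The loop-and-stack step can be sketched like this. It is a minimal, self-contained version that assumes a toy in-memory archive in place of the real archive.zip; the file names and prices are invented, and the key for each data frame is the file name up to the first dot.

```python
import io
from zipfile import ZipFile

import pandas as pd

# Toy archive standing in for archive.zip (file names and values are made up).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr(
        "btcusd.csv",
        "time,open,close,high,low,volume\n"
        "1593561600000,9100,9105,9110,9095,1.5\n"
        "1593561660000,9105,9102,9108,9100,0.7\n",
    )
    archive.writestr(
        "ethusd.csv",
        "time,open,close,high,low,volume\n"
        "1593561600000,229,230,231,228,12.0\n",
    )
    archive.writestr("readme.txt", "not a csv, should be skipped")

zf = ZipFile(buffer)
cols = ["time", "open", "high", "low", "close", "volume"]

# One data frame per CSV, keyed by ticker, stacked on top of each other.
dfs = pd.concat(
    {
        text_file.filename.split(".")[0]: pd.read_csv(
            zf.open(text_file.filename), usecols=cols
        )
        for text_file in zf.infolist()
        if text_file.filename.endswith(".csv")
    }
)

print(dfs)
```

Passing a dictionary to `pd.concat` is what produces the two-level index: the dictionary keys become the outer level and each frame's original integer index becomes the inner level.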
5:44
the challenge is it's actually not in
5:46
the format that we want right so our
5:48
level 0 and level 1 indices are you know
5:51
the ticker which is great but you
5:53
know we have this index here that's
5:55
just useless it's just an integer index
5:58
uh so we want to actually change this
6:00
multi index to the time and then the
6:03
ticker and you know that brings me to a
6:05
point is that any time that we're you
6:07
know coding this stuff up
6:10
we want to try to get our
6:12
data in a format as close to how we're
6:14
going to receive it from a broker
6:16
and their api so you know every minute
6:19
we will receive a you know bar and then
6:21
within or that data and then within that
6:24
data we'll have all of our ticker values
6:26
so we'll set level 0 to time and
6:30
then level 1 to ticker
6:34
but
6:35
before we do that we what we'll do is
6:37
we'll get rid of
6:38
this
6:40
index
6:41
and then we'll reset our index so that
6:43
way we can rename our ticker
6:45
which will just be named index to
6:49
ticker
6:50
and then we'll reset that back to that
6:52
multi index
6:53
and then time we'll also
6:55
make it a little bit more
6:58
human readable so let's go ahead and
7:00
you know take those steps now so we'll
7:02
do
7:05
df equals dfs.droplevel so we're
7:09
dropping that integer index we'll reset
7:11
the index so now our ticker will be just
7:14
named as index we'll do rename
7:17
columns
7:18
equal
7:20
index
7:22
ticker
7:24
perfect
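A small sketch of that chain, assuming a toy `dfs` built from two made-up frames rather than the real archive. The integer level is level 1 here because the ticker keys form level 0.

```python
import pandas as pd

# Toy stand-in for the stacked frame: outer level = ticker, inner level = integer index.
frame = pd.DataFrame(
    {"time": [1593561600000, 1593561660000], "close": [9105.0, 9102.0]}
)
dfs = pd.concat({"btcusd": frame, "ethusd": frame})

df = (
    dfs.droplevel(1)                      # drop the integer level, keep the ticker level
    .reset_index()                        # the unnamed ticker index becomes a column called "index"
    .rename(columns={"index": "ticker"})  # give that column a meaningful name
)

print(df)
```

After this, `ticker` is an ordinary column and the frame has a plain integer index again, which makes the filtering and date conversion that follow easier to write.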
7:25
now what we'll do is we'll
7:28
only grab the US dollar pairings
7:31
we'll do some boolean indexing for
7:32
that we'll do df.ticker
7:35
access the string methods contains usd
7:39
so this checks to see
7:41
if the ticker contains usd
7:44
and it'll return a series of true and
7:47
false values so that's not really
7:49
helpful to us but we can use boolean
7:52
indexing to pass that to our data frame
7:55
so now what that will do is only the
7:57
rows where the ticker contains usd will
8:00
be returned
8:01
the falses will essentially be dropped
8:04
so we'll only have a data frame with US
8:06
dollar pairings
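Boolean indexing of this kind can be illustrated with a toy frame. The tickers and prices below are invented, and the lowercase "usd" pattern is an assumption about how the tickers are spelled.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ticker": ["btcusd", "ethbtc", "ethusd"],
        "close": [9105.0, 0.025, 230.0],
    }
)

# A Series of True/False values, one per row.
is_usd = df.ticker.str.contains("usd")

# Passing the mask back to the frame keeps only the True rows.
df_usd = df[is_usd]

print(is_usd.tolist())
print(df_usd)
```

The mask on its own is not the result; it only becomes useful once it is used to index the frame.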
8:08
and now we'll change the time
8:10
field to date even though it's a date
8:12
time it's shorter and easier
8:14
to type we'll do pd i'm sorry df.date equals
8:17
pd.to_datetime
8:21
and we'll pass it the time column
8:24
and we need to let it know it's in
8:26
milliseconds
8:28
we're done with that
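A minimal sketch of the conversion, using two made-up millisecond timestamps. Bracket assignment is used here because assigning to `df.date` as an attribute does not create a new column in pandas.

```python
import pandas as pd

df = pd.DataFrame({"time": [1593561600000, 1593561660000]})

# unit="ms" tells pandas the integers are milliseconds since the Unix epoch.
df["date"] = pd.to_datetime(df.time, unit="ms")

print(df)
```

Without the `unit` argument pandas would interpret the integers as nanoseconds, which would place every row in early 1970.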
8:29
now we need to sort the values and the
8:31
reason we sort the values is that we're
8:34
going to
8:35
uh select we want to be able to slice
8:37
the time index we'll do
8:39
sort_values by
8:41
date ticker
8:43
we've got that we no longer need that
8:46
time column so we'll drop that so df dot
8:48
drop
8:49
columns equal time
8:53
we'll set the index df equals df dot
8:55
set_index
8:57
date
8:58
time no
9:00
date ticker
9:03
and then we will
9:04
slice you know we don't want to have the
9:06
whole data frame
9:08
just because it'll take a while to
9:09
process you can select however much data
9:11
you want i'm just going to select uh you
9:14
know a year so we'll do df 2020-07-01
9:20
to uh
9:22
20
9:23
20
9:25
uh zero i'm sorry 2021-07-01
9:29
that's good enough okay
9:31
perfect and then we'll return that hit
9:33
enter see if i made any mistakes
9:38
and it looks good
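The sort, drop, re-index, and slice steps can be sketched on toy data as below. The rows are invented, and explicit `pd.Timestamp` bounds with `.loc` are used as a conservative choice; the video passes the dates as strings, which is the more common shortcut.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ticker": ["ethusd", "btcusd", "btcusd", "btcusd", "ethusd"],
        "time": [
            1593561600000,  # 2020-07-01
            1627776000000,  # 2021-08-01 (outside the window)
            1593475200000,  # 2020-06-30 (outside the window)
            1593561600000,  # 2020-07-01
            1625011200000,  # 2021-06-30
        ],
        "close": [230.0, 40000.0, 9100.0, 9105.0, 2200.0],
    }
)
df["date"] = pd.to_datetime(df["time"], unit="ms")

# Sorting first is what makes slicing the date level possible afterwards.
df = df.sort_values(by=["date", "ticker"])
df = df.drop(columns="time")
df = df.set_index(["date", "ticker"])

# Keep roughly one year of data.
df_year = df.loc[pd.Timestamp("2020-07-01"):pd.Timestamp("2021-07-01")]

print(df_year)
```

The sketch stores the slice in a separate, hypothetical name `df_year` so the full frame stays available for comparison.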
9:40
so we can see we've got our level zero
9:43
set to the date
9:44
and we've got our level
9:46
one set to the ticker and then the open
9:49
close high low volume
9:51
awesome okay so now remember
9:55
um
9:56
you know we don't have
9:59
a you know a minute bar for every single
10:01
ticker because sometimes there wouldn't
10:03
be a trade on those tickers so we need
10:05
to resample this into one minute bars
10:08
and fill the gaps so that's the step
10:10
we're going to take next
10:13
so the first thing that we'll do is
10:14
we'll create a new data frame so we'll
10:16
just call it bars
10:19
bars1m equals df
10:21
and that way i'm not accidentally
10:24
overwriting what we've done previously
10:26
and then what we'll do is we want to
10:30
change the index now just to a
10:33
date time index for resampling we'll do
10:36
bars1m
10:37
for one minute bars bars1m equals
10:40
bars1m dot
10:42
reset_index
10:44
we'll set the index back to date so
10:46
that's the only index we'll group by
10:49
ticker
10:51
and then we'll resample
10:54
to one minute and that's 1min not
10:57
one month 1T also works now since
11:00
we're really just resampling one minute
11:03
bars to one minute bars we can
11:05
essentially say last or sum or whatever
11:08
so i'll just do last and then we don't
11:10
want that
11:11
uh index uh we just created so we'll do
11:14
droplevel
11:15
zero
11:17
perfect so that's step number one
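A sketch of the per-ticker resampling on toy data follows. It is a variant rather than a copy of the video's code: instead of dropping the ticker level with `droplevel`, it selects the value columns before resampling and then moves the ticker level back into a column, an assumption made so the result looks the same across pandas versions. The "1min" alias is used because newer pandas releases deprecate the "T" alias.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2020-07-01 00:00",
                "2020-07-01 00:02",  # btcusd has no bar at 00:01
                "2020-07-01 00:00",
                "2020-07-01 00:01",
            ]
        ),
        "ticker": ["btcusd", "btcusd", "ethusd", "ethusd"],
        "close": [9105.0, 9110.0, 230.0, 231.0],
        "volume": [1.5, 0.7, 12.0, 8.0],
    }
).set_index(["date", "ticker"])

bars1m = (
    df.reset_index()
    .set_index("date")                      # resample needs a datetime index
    .groupby("ticker")[["close", "volume"]]  # resample each ticker separately
    .resample("1min")
    .last()
    .reset_index(level="ticker")            # ticker back to a column, date stays the index
)

print(bars1m)
```

The missing btcusd minute now exists as a row of NaN values, which is what the fill steps afterwards are meant to handle.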
11:20
so the next thing we need to do
11:22
is we need to fill
11:24
the columns right
11:26
but we don't want to fill everything we
11:28
only want to fill the price data because
11:30
we don't want to accidentally add volume
11:32
data where you know no trades existed
11:36
so let's do that now we'll do
11:38
bars1m.loc
11:40
and then we'll pass all of the rows and
11:42
the columns that we want will be
11:44
bars1m.columns
11:46
and we'll
11:48
do some indexing here to negative one
11:50
which will drop
11:52
the um which won't add the
11:54
volume column so this is you know our
11:56
first column which is zero
11:58
we do negative one we'll go to volume
12:01
and it's the start so before the colon
12:03
is start
12:04
and then stop which is exclusive meaning it
12:07
doesn't include
12:10
this column so that's how i know you
12:12
know negative 1 doesn't include it there's
12:14
also a step but we're not going to worry
12:15
about that at the moment
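The start, stop, and step behaviour can be checked on a small column index. The column names below assume volume is the last column, as in the walkthrough.

```python
import pandas as pd

columns = pd.Index(["open", "close", "high", "low", "volume"])

# No start means "from the first column"; the stop of -1 is exclusive,
# so the last column (volume) is left out.
price_columns = columns[:-1]

# The optional third part is the step; 2 takes every other column.
every_other = columns[::2]

print(list(price_columns))
print(list(every_other))
```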
12:18
and then we'll
12:19
do equals
12:21
bars1m
12:23
bars1m.columns again we're just
12:25
grabbing the same columns negative one
12:28
and we'll just forward fill ffill
12:31
perfect
12:34
mess up there okay
12:43
there we go now we need to do the same
12:45
thing for volume because we set
12:48
you know open close high low we forward
12:51
filled that
12:52
and now we need to forward fill the
12:54
volume only we'll fill the volume with
12:56
zero so we'll do bars1m.loc
12:58
same thing all rows but this time we
13:01
don't need to pass it you know essentially
13:03
a list of our columns we'll just enter
13:06
volume we know the column correctly
13:09
we'll do bars1m
13:11
volume
13:12
dot fillna and then the value equals
13:15
zero
13:16
awesome
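Both fill steps can be sketched together on a toy frame with one missing minute. The values are made up, and the frame has a single ticker so the forward fill cannot carry a price from one ticker into another.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-07-01", periods=3, freq="1min", name="date")
bars1m = pd.DataFrame(
    {
        "open": [1.0, np.nan, 3.0],
        "close": [1.5, np.nan, 3.5],
        "volume": [10.0, np.nan, 30.0],
    },
    index=idx,
)

# Every column except the last one (volume) is treated as price data.
price_cols = bars1m.columns[:-1]

# Prices: carry the last known value forward into the gap.
bars1m.loc[:, price_cols] = bars1m[price_cols].ffill()

# Volume: a gap means no trades happened, so fill with zero instead.
bars1m.loc[:, "volume"] = bars1m["volume"].fillna(value=0)

print(bars1m)
```

Treating the two kinds of column differently is the point: a carried-forward price is a reasonable estimate, while a carried-forward volume would invent trades that never occurred.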
13:17
perfect and now because we you know
13:21
change the index let's get that back to
13:22
how we want it we'll do bars1m equals
13:25
bars1m dot
13:27
reset_index
13:28
and then set_index and we'll set it back
13:30
to the date
13:32
and ticker and then bars1m to output that
13:36
and see if that worked
13:42
and it looks like it did we'll just do a
13:44
quick spot check we've got the date
13:46
and the ticker and open close high low
13:49
volume
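Restoring the two-level index can be sketched as follows, again on invented rows with `date` as the index and `ticker` as a regular column.

```python
import pandas as pd

bars1m = pd.DataFrame(
    {
        "ticker": ["btcusd", "btcusd", "ethusd"],
        "close": [9105.0, 9105.0, 230.0],
        "volume": [1.5, 0.0, 12.0],
    },
    index=pd.DatetimeIndex(
        ["2020-07-01 00:00", "2020-07-01 00:01", "2020-07-01 00:00"], name="date"
    ),
)

# Move date back into a column, then index by (date, ticker) together.
bars1m = bars1m.reset_index().set_index(["date", "ticker"])

print(bars1m)
```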
13:51
awesome and now we have it we have
13:53
multiple assets in one minute bars that
13:56
we can use to analyze on our hunt for a
13:58
profitable crypto trading strategy so if
14:00
you like this video i'd love it if you
14:02
hit the thumbs up button to let the
14:04
google algorithm know that this is a
14:06
video worth sharing and i hope that you
14:08
join me on our next adventure as we
14:10
search for that elusive profitable
14:13
crypto trading strategy thank you and
14:15
i'll see you in the next one

analyzingalpha.com
