Learn how to download free crypto data and convert it into minute bars for backtesting purposes using Python and Pandas.
If you're not interested in learning how to manipulate data with Pandas, you can download the data below.
Subscribe for more tutorials like this: https://bit.ly/3lLybeP
Download Historical Crypto Data: https://analyzingalpha.com/historical-crypto-price-data
Crypto Price Shear Mean Reversion Strategy: https://analyzingalpha.com/crypto-price-shear-algo-trading-strategy
0:00 Introduction
0:57 Open Jupyter Notebook
1:33 Download Crypto Data from Kaggle
1:43 Import Pandas & ZipFile
2:03 Create ZipFile Object
2:44 Create Crypto Price Dataframe
5:44 Modify Dataframe Format
9:53 Resample Dataframe Into 1-Min Crypto Price Bars
11:26 Fill Missing Dataframe Data
#cryptocurrency
#download
0:00
hello world it's leo and i'm back with
0:02
video number two in the first video we
0:04
created a crypto price shear mean
0:06
reversion system if you haven't seen
0:08
that video i'll put a link or a bubble
0:10
right here
0:11
and if that doesn't happen because again
0:13
this is only my second video like this
0:15
if i fail i'll put a link in the
0:16
description below
0:18
now the goal of this series is to create
0:21
a profitable crypto trading strategy now
0:24
obviously who wants an unprofitable one
0:26
but it's likely that we're going to have
0:28
to create a bunch of strategies that
0:30
lose money before we finally find one
0:31
that makes money let alone the holy
0:33
grail and potentially buy a private
0:35
island now i thought to myself leo
0:37
instead of wasting everyone's time and
0:39
showing you how to import uh you know
0:42
the crypto data every single video
0:44
that's what this video is about instead
0:46
we're going to import the kaggle data at
0:48
one minute resolution re-sample it and
0:50
get it prepared for future back testing
0:52
purposes so without further ado let's
0:55
create some code
0:56
so the first thing that we'll do is
0:58
we'll open up jupyter notebook now if
1:00
you're not familiar with jupyter
1:02
notebook i'll put some introductory
1:04
materials on how to get it installed in
1:05
the link below but long story short it's
1:08
just a web interface where we can put
1:10
python
1:11
and markdown code
1:13
and that's really nice because it makes
1:15
it easy to see so if i put some python
1:17
code or i'm creating some type of analysis i
1:19
can explain that analysis or give some
1:22
explaining language around more complex
1:24
code right within the web interface so
1:26
it's pretty nice
1:27
so before we do any coding though we've
1:30
got to download the data
1:32
so go to this link right here and i'll
1:34
put this link in the description below
1:36
and download that so you can follow
1:38
along with me
1:41
so the next step we'll take is we'll
1:42
import pandas import pandas as pd
1:47
and then we're also going to use zip
1:48
file so do from zipfile import
1:52
ZipFile okay perfect so now that we've
1:55
got our imports
1:58
we'll do import
2:00
imports now import
2:02
now that we've got our imports
2:05
we'll select the
2:07
columns that we want and create our zip
2:08
file object we'll do zf equals ZipFile
2:12
and then put in your download location
2:15
obviously be different than mine
2:17
um
2:18
google
2:19
downloads
2:21
archive.zip
2:23
archive.zip
2:26
and then we'll select the columns that
2:28
we want will be time
2:31
open
2:32
i just got these from the kaggle
2:33
description above on the other tab open
2:36
high
2:37
low
2:38
close
2:39
volume
2:41
control enter that seemed to work
2:43
perfect okay
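A minimal sketch of the setup cell described above. The Kaggle archive itself is not reproduced here, so a tiny in-memory zip with made-up prices stands in for it; the file name `btcusd.csv` and the column order inside the CSV are assumptions for illustration only.

```python
import io
from zipfile import ZipFile

import pandas as pd

# Stand-in for the Kaggle download: a tiny in-memory archive with made-up
# prices. With the real file you would pass its location on disk instead,
# wherever you saved archive.zip.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr(
        "btcusd.csv",
        "time,open,close,high,low,volume\n"
        "1593561600000,9100.0,9105.0,9110.0,9095.0,1.5\n",
    )

zf = ZipFile(buffer)  # read-only handle on the archive
cols = ["time", "open", "high", "low", "close", "volume"]  # columns to keep

print(zf.namelist())  # names of the files stored inside the archive
```

Nothing is read into pandas yet at this point; the cell only prepares the archive handle and the column list for the next step.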
2:44
now
2:45
the next step
2:47
there's a fair amount going on so i'm
2:48
going to explain it at a high level and
2:50
then i'll explain every line of code and
2:52
then you know kind of wrap up explaining
2:54
it one more time
2:56
you know the thing about python it's
2:57
great you can do so much in so little
2:59
code but at the same time sometimes it's
3:03
almost
3:05
too concise right so
3:07
let's think about what we're doing we've
3:09
got an archive.zip file and within that
3:12
archive.zip file we've got a bunch of
3:14
csv files and each one of those csv
3:17
files contains our crypto data with the
3:19
time open high low close and volume
3:22
so we want to loop through all of those
3:24
csvs create a data frame and then stack
3:27
those data frames on top of each other
3:30
so let's see how that works we'll do dfs
3:33
that way um
3:35
here we'll do for data frames and we'll
3:37
do pd.concat
3:39
which again stacks the data frames
3:41
on top of each other we'll pass that a
3:44
dictionary we'll do text file dot file
3:47
name dot split
3:50
and we'll make that
3:51
the first element that'll be the key
3:54
we'll do pd.read_csv so now
3:58
pd.read_csv creates a data frame from a
4:00
csv file but our csv file is actually
4:04
within a zip file so we need to use that
4:06
zip file object we'll use zf.open
4:09
text
4:11
file dot file
4:13
name
4:14
and then only pass the you know the
4:18
columns that we want right so usecols
4:20
equals cols perfect
4:22
now
4:23
uh the next part of the code where we
4:25
loop through each text file within the
4:27
zf
4:29
zip file so we'll do for text file in zf
4:32
infolist
4:34
and instead of just looping through and
4:36
importing every file we'll check to make
4:37
sure that the file within the zip file
4:40
ends with dot csv i believe it does in
4:42
this case but you know we'll just add
4:45
this to make our code a little more
4:47
robust we'll do if text file dot file name
4:50
ends
4:51
with
4:53
dot csv
4:57
then dfs hit enter and i made a typo so
5:01
let's see what did i do here invalid
5:02
syntax
5:05
okay
5:07
oops
5:11
okay perfect so it looks like it's
5:13
running and again what we're doing we
5:15
have you know our zf info list which
5:19
outputs you know the list of the file
5:21
paths and we'll do text file you know
5:24
for each one of those
5:26
files in the info list we create a csv
5:29
out of them and
5:31
then
5:32
we don't create a csv we create a data
5:33
frame out of them and then we smush all
5:36
the data frames on top of each other
5:38
with dfs
5:40
as you can see here and that looked like
5:42
it worked perfectly now
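The loop described above can be sketched as a dictionary comprehension passed to `pd.concat`. This is a reconstruction from the spoken description, not the exact cell from the video; the in-memory archive, its file names, and the prices are made up so the example runs on its own.

```python
import io
from zipfile import ZipFile

import pandas as pd

# Made-up stand-in for the Kaggle archive: two CSVs plus one non-CSV file.
header = "time,open,close,high,low,volume\n"
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr(
        "btcusd.csv",
        header
        + "1593561600000,9100,9105,9110,9095,1.5\n"
        + "1593561660000,9105,9108,9109,9101,0.4\n",
    )
    archive.writestr(
        "ethusd.csv",
        header
        + "1593561600000,225,226,227,224,10.0\n"
        + "1593561660000,226,226,228,225,3.0\n",
    )
    archive.writestr("readme.txt", "not price data")

zf = ZipFile(buffer)
cols = ["time", "open", "high", "low", "close", "volume"]

# Key: file name without its extension (the ticker).
# Value: a data frame read straight out of the zip file.
dfs = pd.concat(
    {
        text_file.filename.split(".")[0]: pd.read_csv(
            zf.open(text_file.filename), usecols=cols
        )
        for text_file in zf.infolist()
        if text_file.filename.endswith(".csv")  # skip anything that is not a CSV
    }
)

print(dfs)
```

Because the input is a dictionary, the keys become the outer index level and each frame's original integer index becomes the inner level.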
5:44
the challenge is it's actually not in
5:46
the format that we want right so our
5:48
level 0 and level 1 indices are you know
5:51
the ticker which is great but you
5:53
know this we have this index here that's
5:55
just useless it's just an integer index
5:58
uh so we want to actually change this
6:00
multi index to the time and then the
6:03
ticker and you know that brings me to a
6:05
point is that any time that we're you
6:07
know coding this stuff up
6:10
we want to try to get our
6:12
data in a format as close to how we're
6:14
going to receive it from a broker
6:16
and their api so you know every minute
6:19
we will receive a you know bar and then
6:21
or that data and then within that
6:24
data we'll have all of our ticker values
6:26
so we'll set level 0 to time and
6:30
then level 1 to ticker
6:34
but
6:35
before we do that we what we'll do is
6:37
we'll get rid of
6:38
this
6:40
index
6:41
and then we'll reset our index so that
6:43
way we can rename our ticker
6:45
from index since it'll just be named index to
6:49
ticker
6:50
and then we'll reset that back to that
6:52
multi index
6:53
and then time we'll also
6:55
make it a little bit more
6:58
human readable so let's go ahead and
7:00
you know take those steps now so we'll
7:02
do
7:05
df equals dfs dot droplevel so we're
7:09
dropping that integer index we'll reset
7:11
the index so now our ticker will be just
7:14
named as index we'll do rename
7:17
columns
7:18
equal
7:20
index
7:22
ticker
7:24
perfect
7:25
now what we'll do is we'll
7:28
only grab the u.s dollar pairings we'll
7:31
use we'll do some boolean indexing for
7:32
that we'll do df ticker
7:35
access the string methods contains usd
7:39
so this checks to see
7:41
if the ticker contains usd
7:44
and it'll return a series of true and
7:47
false values so that's not really
7:49
helpful to us but we can use boolean
7:52
indexing to pass that to our data frame
7:55
so now what that will do is only the
7:57
rows where the ticker contains usd will
8:00
be returned
8:01
the falses will essentially be dropped
8:04
so we'll only have a data frame with u.s
8:06
dollar pairings
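A small illustration of the boolean indexing just described, on made-up tickers and prices. The lowercase `"usd"` pattern is an assumption about how the tickers are spelled.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ticker": ["btcusd", "ethbtc", "ethusd"],
        "close": [9105.0, 0.025, 226.0],
    }
)

# str.contains returns one True/False value per row ...
mask = df["ticker"].str.contains("usd")

# ... and passing that series back to the frame keeps only the True rows.
usd_only = df[mask]

print(mask.tolist())
print(usd_only)
```

Note that this is a substring match, so it keeps any ticker with `usd` anywhere in its name, not only those that end in it.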
8:08
and now we'll change the time
8:10
field to date even though it's a date
8:12
time but it's shorter and easier
8:14
to type we'll do pd i'm sorry df date equals
8:17
pd.to
8:19
datetime
8:21
and we'll pass it the time column
8:24
and we need to let it know it's in
8:26
milliseconds
8:28
we're done with that
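A short sketch of the conversion, assuming the `time` column holds Unix timestamps in milliseconds as stated above. The two sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"time": [1593561600000, 1593561660000]})

# unit="ms" tells pandas the integers count milliseconds since 1970-01-01.
df["date"] = pd.to_datetime(df["time"], unit="ms")

print(df)
```

Without the `unit` argument pandas would read the same integers as nanoseconds and place every row in early 1970.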
8:29
now we need to sort the values and the
8:31
reason we sort the values is that we're
8:34
going to
8:35
uh select we want to be able to slice
8:37
the time index we'll do sort
8:39
values by
8:41
date ticker
8:43
we've got that we no longer need that
8:46
time column so we'll drop that so df dot
8:48
drop
8:49
columns equal time
8:53
we'll set the index df equals df dot
8:55
set index
8:57
date
8:58
time no
9:00
date ticker
9:03
and then we will
9:04
slice you know we don't want to have the
9:06
whole data frame
9:08
just because it'll take a while to
9:09
process you can select however much data
9:11
you want i'm just going to select uh you
9:14
know a year so we'll do df 2020 07 01
9:20
to uh
9:22
20
9:23
20
9:25
uh zero i'm sorry 2021 07 01
9:29
that's good enough okay
9:31
perfect and then we'll run that hit
9:33
enter see if i made any mistakes
9:38
and it looks good
9:40
so we can see we've got our level zero
9:43
set to the date
9:44
and we've got our level
9:46
one set to the ticker and then the open
9:49
close high low volume
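The reshaping steps from this chapter, pulled together as one sketch on made-up data. Two things differ from what is spoken in the video and are assumptions of this sketch: a `.copy()` after the filter to avoid pandas' chained-assignment warning, and explicit `pd.Timestamp` bounds with `.loc` for the date slice instead of plain date strings.

```python
import pandas as pd

def frame(times, price):
    """Build a small made-up price frame for one ticker."""
    n = len(times)
    return pd.DataFrame(
        {
            "time": times,
            "open": [price] * n,
            "close": [price] * n,
            "high": [price] * n,
            "low": [price] * n,
            "volume": [1.0] * n,
        }
    )

# Same shape as the concatenated frame: level 0 = ticker, level 1 = integer.
dfs = pd.concat(
    {
        "btcusd": frame([1593561540000, 1593561600000, 1593561660000], 9100.0),
        "ethbtc": frame([1593561600000], 0.025),
        "ethusd": frame([1593561600000, 1593561660000], 226.0),
    }
)

# Drop the integer level, then turn the ticker level into a named column.
df = dfs.droplevel(1).reset_index().rename(columns={"index": "ticker"})

# Keep only the US dollar pairs.
df = df[df["ticker"].str.contains("usd")].copy()

# Millisecond timestamps to datetimes, then sort so the index can be sliced.
df["date"] = pd.to_datetime(df["time"], unit="ms")
df = df.sort_values(by=["date", "ticker"])
df = df.drop(columns="time")
df = df.set_index(["date", "ticker"])

# Keep one year of data (both bounds inclusive).
df = df.loc[pd.Timestamp("2020-07-01"):pd.Timestamp("2021-07-01")]

print(df)
```

The row stamped 2020-06-30 23:59 and the `ethbtc` pair both drop out, leaving a frame indexed by date first and ticker second.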
9:51
awesome okay so now remember
9:55
um
9:56
you know we don't have
9:59
a you know a minute bar for every single
10:01
ticker because sometimes there wouldn't
10:03
be a trade on those tickers so we need
10:05
to resample this into one minute bars
10:08
and fill the gaps so that's the step
10:10
we're going to take next
10:13
so the first thing that we'll do is
10:14
we'll create a new data frame so we'll
10:16
just call it bar bars
10:19
bars one m equal df
10:21
and that way i'm not accidentally
10:24
overwriting what we've done previously
10:26
and then what we'll do is we want to
10:30
change the index now just to a
10:33
date time index for resampling we'll do
10:36
bars one m
10:37
for bars one minute bars one m equals
10:40
bars one m dot reset
10:42
index
10:44
we'll set the index back to date so
10:46
that's the only index we'll group by
10:49
ticker
10:51
and then we'll resample
10:54
to one minute and that's one m i n not
10:57
one month one t also works now since
11:00
we're really just resampling one minute
11:03
bars to one minute bars we can
11:05
essentially say last or sum or whatever
11:08
so i'll just do last and then we don't
11:10
want that
11:11
uh index uh we just created so we'll do
11:14
drop
11:15
level zero
11:17
perfect so that's step number one
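A sketch of the group-then-resample step on made-up data with one missing minute. It departs from the video in one respect: instead of dropping the ticker index level, it keeps the (ticker, date) index and drops a leftover `ticker` column if one exists, on the assumption that pandas versions differ in whether the grouping column survives `groupby(...).resample(...)`.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2020-07-01 00:00"), "btcusd"),
        (pd.Timestamp("2020-07-01 00:00"), "ethusd"),
        (pd.Timestamp("2020-07-01 00:01"), "ethusd"),
        (pd.Timestamp("2020-07-01 00:02"), "btcusd"),  # btcusd has no 00:01 bar
    ],
    names=["date", "ticker"],
)
df = pd.DataFrame(
    {
        "open": [9100.0, 225.0, 226.0, 9110.0],
        "close": [9105.0, 226.0, 226.5, 9112.0],
        "high": [9110.0, 227.0, 228.0, 9115.0],
        "low": [9095.0, 224.0, 225.0, 9108.0],
        "volume": [1.5, 10.0, 3.0, 0.7],
    },
    index=index,
)

# resample needs a plain datetime index, so move ticker out of the index,
# group by it, and resample each ticker to one-minute bars.
bars1m = (
    df.reset_index()
    .set_index("date")
    .groupby("ticker")
    .resample("1min")
    .last()
)

# Drop the grouping column if this pandas version kept it in the result.
bars1m = bars1m.drop(columns="ticker", errors="ignore")

gap = ("btcusd", pd.Timestamp("2020-07-01 00:01"))
print(bars1m)
```

The missing btcusd minute now exists as a row of NaN values, which is what the fill step in the next chapter works on.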
11:20
so the next thing we need to do
11:22
is we need to fill
11:24
the columns right
11:26
but we don't want to fill everything we
11:28
only want to fill the price data because
11:30
we don't want to accidentally add volume
11:32
data where you know no trades existed
11:36
so let's do that now we'll do bars
11:38
1m.loc
11:40
and then we'll pass all of the rows and
11:42
the columns that we want will be bars
11:44
one m dot columns
11:46
and we'll
11:48
do some indexing here to negative one
11:50
which will drop
11:52
the um which won't add the
11:54
volume column so this is you know our
11:56
first column which is zero
11:58
we do negative one we'll go to volume
12:01
and it's the start so before the colon
12:03
is start
12:04
and then stop which is exclusive meaning it
12:07
doesn't include
12:10
this column so that's how i know you
12:12
know negative 1 doesn't include there's
12:14
also a step but we're not going to worry
12:15
about that at the moment
12:18
and then we'll
12:19
do equals
12:21
bars 1m
12:23
bars one m dot columns again we're just
12:25
grabbing the same columns negative one
12:28
and we'll just forward fill ffill
12:31
perfect
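A sketch of the price forward fill on a made-up single-ticker frame. For simplicity the frame here has only the five numeric columns, so `columns[:-1]` selects open, close, high and low and leaves volume alone.

```python
import numpy as np
import pandas as pd

bars1m = pd.DataFrame(
    {
        "open": [9100.0, np.nan, 9110.0],
        "close": [9105.0, np.nan, 9112.0],
        "high": [9110.0, np.nan, 9115.0],
        "low": [9095.0, np.nan, 9108.0],
        "volume": [1.5, np.nan, 0.7],
    },
    index=pd.DatetimeIndex(
        ["2020-07-01 00:00", "2020-07-01 00:01", "2020-07-01 00:02"],
        name="date",
    ),
)

# Everything up to, but not including, the last column (volume).
price_cols = bars1m.columns[:-1]

# Carry the last known prices forward into the empty minute.
bars1m.loc[:, price_cols] = bars1m[price_cols].ffill()

gap = pd.Timestamp("2020-07-01 00:01")
print(bars1m)
```

The slice relies on volume being the last column, so it is worth checking the column order before reusing it on other data.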
12:34
messed up there okay
12:43
there we go now we need to do the same
12:45
thing for volume because we set
12:48
you know open close high low we forward
12:51
filled that
12:52
and now we need to forward fill the
12:54
volume only we'll fill the volume with
12:56
zero so we'll do bars one m.loc
12:58
same thing all rows but this time we
13:01
don't need to pass it you know essentially
13:03
a list of our columns we'll just enter
13:06
volume we know the column directly
13:09
we'll do bars 1m
13:11
volume
13:12
dot fillna and then the value equals
13:15
zero
13:16
awesome
13:17
perfect and now because we you know
13:21
change the index let's get that back to
13:22
how we want it we'll do bars one m equal
13:25
bars one m dot reset
13:27
index
13:28
and then set index and we'll set it back
13:30
to the date
13:32
and ticker and then bars1m to output that
13:36
and see if that worked
13:42
and it looks like it did we'll just do a
13:44
quick spot check we've got the date
13:46
and the ticker and open close high low
13:49
volume
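A sketch of the last two steps on a made-up frame whose prices are already forward filled: missing volume becomes zero, and the index goes back to date then ticker.

```python
import numpy as np
import pandas as pd

bars1m = pd.DataFrame(
    {
        "ticker": ["btcusd", "btcusd", "btcusd"],
        "open": [9100.0, 9100.0, 9110.0],
        "close": [9105.0, 9105.0, 9112.0],
        "high": [9110.0, 9110.0, 9115.0],
        "low": [9095.0, 9095.0, 9108.0],
        "volume": [1.5, np.nan, 0.7],  # no trades in the middle minute
    },
    index=pd.DatetimeIndex(
        ["2020-07-01 00:00", "2020-07-01 00:01", "2020-07-01 00:02"],
        name="date",
    ),
)

# A minute without trades has zero volume, not the previous minute's volume.
bars1m.loc[:, "volume"] = bars1m["volume"].fillna(value=0)

# Restore the (date, ticker) multi index used earlier.
bars1m = bars1m.reset_index().set_index(["date", "ticker"])

gap = (pd.Timestamp("2020-07-01 00:01"), "btcusd")
print(bars1m)
```

Filling volume with zero while carrying prices forward keeps the gap minutes flat in price and empty in activity.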
13:51
awesome and now we have it we have
13:53
multiple assets and one minute bars that
13:56
we can use to analyze on our hunt for a
13:58
profitable crypto trading strategy so if
14:00
you like this video i'd love it if you
14:02
hit the thumbs up button to let the
14:04
google algorithm know that this is a
14:06
video worth sharing and i hope that you
14:08
join me on our next adventure as we
14:10
search for that elusive profitable
14:13
crypto trading strategy thank you and
14:15
i'll see you in the next one
