Learn how to backtest a high-frequency price shear mean reversion algorithmic trading system on multiple assets with multiple timeframes using Python and Pandas.
Subscribe for more tutorials like this: https://bit.ly/3lLybeP
Follow along: https://analyzingalpha.com/blog/crypto-price-shear-algo-trading-strategy
00:00 Introduction
00:17 What Is a Price Shear?
01:04 Project Overview
01:50 Open Jupyter Notebook
02:15 Get Imports
03:03 Import Crypto Universe
04:55 Read CSV Files Within a Zip file
07:28 Create Multilevel DateTimeIndex with USD Pairs
12:41 Create Indicators
13:41 Average True Range (ATR) Indicator
15:20 Exponential Moving Average (EMA) Indicators
16:36 Resample Time Series in Pandas
22:10 Create 5-min Bar Aggregates
29:00 Create Daily Bar Aggregates
30:46 Merge Dataframes
0:00
I almost did it again. I almost forgot to press record. Hi, I'm Leo, and I'm a trader
0:05
of the algorithmic variety. Today we're going to create a crypto trading strategy using Python and pandas. More
0:12
specifically, we're going to create a crypto price shear mean reversion system.
0:17
You might say, "Leo, what in the world is a price shear?" Well, a price shear is just fancy trader lingo for an excessive move
0:23
to the upside or downside. Now let's think about this using Bitcoin as an example. If we're trading Bitcoin
0:30
on the five-minute bar and there's a large five-minute bar relative to the historical bars, algos and traders are
0:36
going to come in and sell to try to capture some of that profit. The same thing happens to the downside: if there's a
0:41
large move down, again relatively speaking, algos are going to end up covering their shorts, and
0:48
longs are going to come in and try to get a better value. Actually, I just said value; I should
0:54
have said price. You can only analyze Bitcoin using a relative pricing model because it doesn't create cash flows. But now I
1:01
digress. You might say, "Leo, why should I stick around for the next 45 minutes or so?", which is roughly how long I think this
1:07
video is going to take. Well, there are a few reasons. The first is that you'll be able to follow along with me every step of the way, and second, we're going
1:14
to do a few things that I haven't seen done before. We're going to take multiple assets
1:20
on multiple time frames and apply multiple indicators, creating a signal, so
1:25
that by the end of this video, hopefully, just hopefully, we'll be able to buy that private island. Who knows, we'll see.
1:31
The other reason is that if you have any questions, I've got about four subscribers, so feel free to
1:37
leave that question in the comments below and I'm pretty sure I'll be able to get back to you. So, without dragging
1:44
our feet any longer, why don't we write some code? We'll start by opening Jupyter Notebook. If you're not
1:50
familiar with what Jupyter Notebook is, I'll add a link in the description below so you can read a little more about
1:55
it and how to get Jupyter installed, but essentially Jupyter Notebook is just a web interface where we can add Python
2:01
and Markdown code. In fact, you can see that I've already added in Markdown the title and the seven steps that we're
2:08
going to take to prototype our crypto trading strategy. But with that out of the way, let's go ahead and get our
2:14
imports. We'll grab datetime, NumPy, and pandas. In
2:19
fact, step one is by far going to be our easiest step. We'll do import datetime as dt,
2:26
import numpy as np, and import pandas as pd.
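Typed out, step one is just these three imports:

```python
import datetime as dt

import numpy as np
import pandas as pd
```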
2:33
And if you're not familiar with pandas, you can think of it like an ultra-powerful Excel. In fact,
2:39
pandas was developed by Wes McKinney at AQR, which is one of the largest, if
2:44
not the largest, quantitative hedge funds. The reason I'm bringing this up is that pandas was designed to do
2:50
exactly what we're going to do: analyze massive amounts of financial data in an easy way. With that being said, let's grab
2:58
our universe. The universe is just the assets that
3:03
we're going to analyze. Our universe could be stocks, it could be bonds, it could be stocks and bonds, but in our
3:09
case our universe is crypto, and our universe can be found in this download link. I'll put this link from Kaggle in
3:16
the description below, so go ahead and download that so you can follow along.
3:21
Now, even though this step will have only a few lines of code, there's a lot going on,
3:27
so what I'm going to do is explain what we're going to do and how to import using zipfile, then we'll go
3:33
through the code, and then I'll explain it one more time, because this is pretty dense, but my goal is to make sure that
3:38
you can follow along and understand everything so it's crystal clear.
3:43
So what we're going to do is grab that archive.zip that we just downloaded, open it in
3:50
Python, loop through every CSV file
3:56
that exists, create a DataFrame out of every single CSV file, and then
4:01
push them all together into one large DataFrame. But first let's start by creating a ZipFile object, so
4:08
we'll do from zipfile import ZipFile, then zf equals
4:15
ZipFile. Now, obviously your directory is going to be different than mine; if it's not,
4:22
give me a call, because we might be related. Mine is Downloads/archive.zip,
4:27
and then we'll select the columns that we want. So we'll do cols, we'll create a list: time,
4:32
open, high,
4:38
low, close, and volume, and hit Ctrl+Enter. Hopefully no
4:44
mistakes, but I promise you I will make plenty. The perils of live coding on YouTube.
4:51
Okay, all right, so now that we have our zip file, we're going to create our DataFrame. We'll
4:57
do dfs equals pd.concat, which again just smushes one DataFrame onto another.
5:03
We're going to pass concat these DataFrames. We'll do text_file
5:10
.filename.split, which is the key portion.
5:16
Now, we're going to split on the period, because we don't want the .csv part of the filename; we want
5:22
the stuff that is before it, right? The name — that's the key. And now the value
5:29
is the actual DataFrame of the data for
5:35
that asset itself, so we'll do pd.read_csv, which creates a DataFrame from a
5:40
CSV file. We can't access the CSV file directly; we have to
5:46
use zf.open on the zip file to grab that CSV file. We'll do text_file.file
5:53
name, right, and then we'll select the columns with usecols=cols. Okay,
5:59
now for more of the magic. So now we do for text_file — that's where we're getting
6:06
text_file from — in zf, which is our ZipFile object's info
6:11
list, and then we only want to go through
6:17
files that end in .csv, so: if text_file.filename.ends
6:23
with — no, not "which" — endswith('.csv').
6:30
Okay, and then we'll list it out. I'll hit Enter and cross my fingers I didn't make any mistakes.
6:36
Okay, so again, we're looping through all of the files in our zip file, and if it's a CSV,
6:43
we're creating a DataFrame out of it and using concat to merge them all together.
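Putting that all together, here's a sketch of the read-the-zip step. The Kaggle archive isn't available here, so this builds a tiny in-memory stand-in (the two CSV names and their contents are made up for illustration); in the video, zf is simply ZipFile pointed at the downloaded archive.zip.

```python
import io
from zipfile import ZipFile

import pandas as pd

cols = ['time', 'open', 'high', 'low', 'close', 'volume']

# Stand-in for the Kaggle download; in the video this is
# zf = ZipFile('Downloads/archive.zip') instead.
buf = io.BytesIO()
with ZipFile(buf, 'w') as z:
    z.writestr('btcusd.csv',
               'time,open,high,low,close,volume\n'
               '1598918400000,29000,29100,28900,29050,10\n')
    z.writestr('ethusd.csv',
               'time,open,high,low,close,volume\n'
               '1598918400000,730,735,728,732,50\n')
zf = ZipFile(buf)

# One DataFrame per CSV, keyed by the file's base name (the ticker);
# passing a dict to pd.concat puts that key on the outer index level.
dfs = pd.concat(
    {
        text_file.filename.split('.')[0]:
            pd.read_csv(zf.open(text_file.filename), usecols=cols)
        for text_file in zf.infolist()
        if text_file.filename.endswith('.csv')
    }
)
```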
6:49
Now, one thing I want to mention: if you're new to programming, I highly recommend you open up the terminal,
6:56
open Python, and run these commands in a more granular fashion. So what you could do is
7:02
run zf.infolist() on the archive file so you can see exactly what
7:08
that outputs, right? Because, like I said, there's a lot going on. And so far we're one for one, but again, I
7:15
promise you I'll make mistakes. Trying to do YouTube and explain
7:21
it all at once is an interesting dynamic. But anyways, progressing. So we returned our data
7:27
frame, but the DataFrame isn't in the format that we want. Now, whenever we're
7:32
doing this stuff, I highly recommend you format the data in the manner that most
7:39
represents how the data is returned from a broker. So, for instance, whenever we
7:45
connect to Interactive Brokers or Alpaca or some other brokerage API — which I will create videos on in the future, because
7:51
there's all sorts of fun caveats — we want the data to be in the format
7:59
that most represents how we're going to get the data. Okay, so what I mean by that
8:04
is that for every minute, right, Alpaca or Interactive Brokers or whatever
8:10
broker we choose — oh sorry, speaking of Interactive Brokers,
8:15
let me know — but anyways,
8:22
what we want to do is get our data as one-minute bars, and then for that minute
8:29
we're going to get all of the asset prices and values.
8:34
So let's go ahead and do that. Again, what we want is the first level of the index, right here, to be
8:39
the time, and then the second level will be the ticker. Then we'll have open, high, low, close, volume. Although the time —
8:46
I don't know about you, but I can't read that. I believe that's Unix epoch time. We'll make that into a more readable
8:53
format. So let's do that now. Okay, so we'll do
8:58
df. We're going to create a new DataFrame, because I'll probably mess this up and I don't want to accidentally
9:05
overwrite the DataFrame that we created in the previous step. So we'll do df equals dfs
9:10
.droplevel(1), because we don't want that integer
9:17
index right there, then reset the index. And since there's only one index, the
9:23
name of our index column will be "index". Then we want to rename this
9:29
to "ticker", so rename(columns=
9:35
{'index': 'ticker'}). Perfect. So the next thing: we
9:41
don't want to trade non-USD pairs, so let's use some boolean filtering to only
9:47
grab the USD pairings. We do this with df, and I'll just show you the
9:53
boolean logic first: ticker, access the string methods, .str.contains('usd').
10:00
So what this line of code will return is a Series which contains True
10:07
or False for every row, depending on whether or not the ticker contains "usd". That's
10:12
great, but we don't just want a True/False Series. What we can do is use boolean indexing,
10:20
where we pass a bunch of True and False values, so now we only return the rows
10:25
where the boolean index was True. In other words, this will create a DataFrame that only contains the tickers
10:32
that contain "usd". Super awesome. Okay, so now that's done,
10:38
let's fix the time. We'll do df['date'] equals pd.
10:43
to_datetime, we pass it df['time'], and since the time
10:49
is in milliseconds, we'll do unit='ms'. Awesome, so that's the
10:55
third step. Okay, now let's sort the values: df equals df.sort_values(by=
11:03
a list of the date and the ticker). We'll then drop
11:08
the time, because remember we created a date column with the proper formatting and time is no longer needed. So we'll do
11:15
df equals df.drop(columns='time'),
11:21
and now let's set our index. We're again going to use a multi-index; the first level, so
11:26
level zero, is the datetime, and the second level, or I should say
11:33
level one, is the ticker. So df equals df.set_index with a
11:39
list of date and ticker. And now, because I don't want these steps
11:45
to take forever, and because our level-zero index
11:50
is a datetime, we can now slice using time. So we'll do
11:57
df equals df sliced — what do we want to do? We'll do six months, I think that'll be fine.
12:02
Okay: 2020-12-31.
12:08
I'll output the df and hit Enter, and keep my fingers crossed that this works.
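Here's the whole cleanup as one runnable sketch, using a small made-up stand-in for dfs (the real one comes from the Kaggle archive):

```python
import pandas as pd

# Made-up stand-in for the concatenated frame from the previous step:
# one USD pair, one non-USD pair.
dfs = pd.concat({
    'btcusd': pd.DataFrame({'time': [1598918400000, 1598918460000],
                            'open': [100.0, 101.0], 'high': [102.0, 103.0],
                            'low': [99.0, 100.0], 'close': [101.0, 102.0],
                            'volume': [10.0, 12.0]}),
    'btceur': pd.DataFrame({'time': [1598918400000],
                            'open': [90.0], 'high': [91.0],
                            'low': [89.0], 'close': [90.5],
                            'volume': [5.0]}),
})

df = dfs.droplevel(1)                        # drop the integer index level
df = df.reset_index().rename(columns={'index': 'ticker'})
df = df[df['ticker'].str.contains('usd')]    # boolean indexing: USD pairs only
df['date'] = pd.to_datetime(df['time'], unit='ms')
df = df.sort_values(by=['date', 'ticker']).drop(columns='time')
df = df.set_index(['date', 'ticker'])
df = df.loc[:'2020-12-31']                   # datetime level 0 lets us slice by time
```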
12:20
And it worked. It looks like we've got the exact format that we want: we have the multi-index with date and ticker
12:27
as level 0 and level 1, then the open, high, low, close, volume. Awesome. Okay, now let's move
12:34
on to creating our indicators. The easiest way to determine the
12:41
indicators that we need is to think about the signal that we're going to create. If you recall, at the beginning of
12:48
the video I said we're going to create a crypto price shear strategy,
12:53
and a price shear is simply an extreme move in one direction or another.
12:58
Now, an extreme move is relative, and there are multiple ways to measure it. We could use
13:04
a more quantitative method where we analyze the standard deviation of log returns, or we could use a more
13:10
traditional method and use the average true range. I'm going to opt for the latter, because I found an average true
13:16
range implementation — I believe it was on Stack Overflow — so we'll copy and paste that and make our lives a little easier.
13:22
And that brings me to a good point: the great thing about
13:27
Python is that there's so much code out there, especially in the financial
13:34
space. So, okay, let's cover the function here.
13:39
First off, we create a function called atr, right, average true range. It takes
13:44
in a DataFrame, because we need multiple columns — we need high, low, and close — and I've set n, the number of
13:52
rows that we need for the calculation, to 12, because I'm going to be using the
13:58
EMA 12, since that's a popular day-trading EMA. With that being said, the next thing
14:05
we need to move on to is the actual true range calculation. Now, if you're not familiar with what ATR
14:12
is, it's simply the maximum move, right — you can see that right here — the
14:18
high minus the low — and it also considers overnight movement,
14:23
so it's the absolute value of the high minus the prior close, or the absolute value of the
14:30
low minus the prior close. What does that mean? We're basically trying to find the largest move,
14:35
including the overnight and/or extended-hours move. It's different than the ADR, the average daily range,
14:43
which just measures standard-hours movement. Okay, and then we simply find the maximum
14:50
of those, right, and once we have a Series with
14:56
that data, we pass it to the Welles Wilder EMA — it's very similar to a
15:02
standard EMA — and that returns our values. I'm going to actually remove this. Okay, so
15:08
this is how we return the average true range.
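Since the function was pasted from Stack Overflow and isn't shown line by line in the transcript, here's a reconstruction of an ATR with Wilder smoothing (an EMA with alpha = 1/n); the demo bar values are made up:

```python
import pandas as pd

def atr(df: pd.DataFrame, n: int = 12) -> pd.Series:
    """Average true range: the largest of high-low, |high - prior close|,
    and |low - prior close|, smoothed with Welles Wilder's EMA."""
    high, low, close = df['high'], df['low'], df['close']
    prior_close = close.shift()
    true_range = pd.concat([
        high - low,                     # intrabar range
        (high - prior_close).abs(),     # gap up from prior close
        (low - prior_close).abs(),      # gap down from prior close
    ], axis=1).max(axis=1)
    # Wilder's smoothing; min_periods=n keeps the warm-up rows as NaN.
    return true_range.ewm(alpha=1 / n, min_periods=n, adjust=False).mean()

# Made-up demo bars.
bars = pd.DataFrame({'high': [10.0, 11, 12, 11, 13],
                     'low': [9.0, 10, 10, 9, 11],
                     'close': [9.5, 10.5, 11, 10, 12]})
bars['atr_3'] = atr(bars, n=3)
```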
15:14
We're also going to create a few more indicators, but these are pretty simple. We'll create an EMA 12 and an EMA 26; again,
15:21
these are popular day-trading EMAs. It's one of those things where, I don't know if it's a
15:27
self-fulfilling prophecy — everyone is looking at the 200 simple, the 50 simple, and the 12 and 26
15:34
EMAs — but I'm digressing once again. We'll do ema_12 equals a lambda
15:40
expression: for x, x.ewm with span equals 12,
15:47
min_periods equals 12, adjust equals False — ignore_
15:53
na, we don't want to do that either — and then .mean(). So we're
15:59
returning a function that we can apply, and that will return the EMA values. You'll see what I
16:07
mean in a second. So then the same for 26, 26, 26. I'll hit Enter.
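As typed, the two lambda factories look like this:

```python
import pandas as pd

# Functions we can apply per ticker group later; min_periods means we
# only get values once the EMA has fully warmed up.
ema_12 = lambda x: x.ewm(span=12, min_periods=12, adjust=False, ignore_na=False).mean()
ema_26 = lambda x: x.ewm(span=26, min_periods=26, adjust=False, ignore_na=False).mean()

close = pd.Series(range(30), dtype=float)  # made-up close series
e12 = ema_12(close)
e26 = ema_26(close)
```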
16:13
Or, I'd hit Enter if I'd run it up here — oh, I didn't. Okay.
16:21
Okay, perfect. It looks like those two are correct, so now that we have our
16:26
indicators — assuming they are correct — let's move on to resampling our time frame.
16:35
When we resample, we take one time frame and convert it to another. In our strategy we're going to use the
16:42
five-minute and daily bars. So stop for a minute and think about which time frame we're going to resample
16:49
first — and this is a trick question. We're actually going to resample our
16:57
minutely data that we got from Kaggle into the minute time frame. So why would we do
17:03
that? Well, the key is that there's a lot of data that we received from Kaggle, but unfortunately
17:10
some of the timestamps are missing data, right?
17:17
If we connect to our broker and no trades existed, we're not going to
17:22
get any data — there are bandwidth reasons behind that. But the whole point is,
17:27
if there are no trades for that period, either our data provider, such as Polygon,
17:33
or a broker won't send us the data, and that's not good for back-
17:38
testing; in fact, it can create errors. So what we want to do is take our minutely data and resample
17:44
it into minute data, so that we fill all of the missing rows, and then, since
17:50
we fixed our lowest time frame, or highest-frequency time
17:56
frame, all of those corrections will flow through to the higher time frames. Okay.
18:02
We'll create bars_1m equals df, again so as not to mess up our prior work.
18:09
We'll do bars_1m equals bars_1m.reset_
18:14
index, and the reason I'm doing this is that right now we have a multi-index; we actually want
18:20
only the date in level 0 so we can resample. So we'll do reset_index,
18:26
we'll set the index back to date, and we'll group by
18:31
ticker, and then we'll resample '1min'. Now, this is important: don't do
18:37
'1M', because that'll be one month. You can do '1min' or '1T'. And then, since it doesn't really
18:44
matter, we'll do .last() — it could be .mean(); again, we're resampling one-minute bars to one-minute bars, so last
18:51
is perfectly fine. And then we'll droplevel, because we don't
18:56
need that index anymore.
19:02
So now we've created a new DataFrame with a bunch of rows where we're missing data, where there were likely no trades in the smaller-
19:08
cap cryptos. Now what we need to do is fill the values. The close price we
19:15
can forward fill — there are some caveats, but long story short, we can just forward fill that data.
19:21
What we don't want to do is forward fill the volume, right? Because with volume
19:28
we're looking for the sum, and we don't want to accidentally create all this volume when it actually
19:33
didn't exist. So we've got two fills to do: first the prices, which we'll
19:39
forward fill, and second the volume, where we'll fill any missing values with zero.
19:46
Hopefully that makes sense. We'll do bars_1m.loc, we'll select all rows, and then we'll
19:51
use bars_1m.columns to get the list of the columns. But instead of all of the columns, we'll take all
19:58
but the last one, so we can use negative one, right, to get —
20:03
let's see, I'll show you what I mean here. So this is our zero element; we do negative one, it comes back
20:11
to volume, and since it's exclusive and not inclusive, it does not include
20:18
that column. So we've got that;
20:23
we'll do equals bars_1m
20:28
[bars_1m.columns[:-1]], and then we'll do .ffill() for forward
20:35
fill. That takes care of everything but volume, and we'll just
20:41
do the same thing for volume: bars_1m.loc, select all the rows, then we
20:46
can just type 'volume' here, equals bars_1m
20:51
['volume'], and this time, instead of forward filling, we're doing fillna with value equals
20:58
zero. Awesome. Now let's set our index back to how we
21:03
had it previously, because remember we adjusted that. So we'll do
21:08
bars_1m equals bars_1m.reset_index
21:15
— and now the index that we had, which was just the date, is reset — then we'll do
21:20
set_index, and we'll pass it a list, which again is date and ticker. Then we'll output
21:28
bars_1m and hit Enter, and hopefully I didn't make any errors.
21:41
And it looks like it worked.
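The whole gap-filling step, sketched on a made-up two-row frame with a missing minute. One deviation from the video worth noting: this forward-fills within each ticker group, so a fill can never leak from one asset into another; and since current pandas versions differ on whether the grouping column appears in the resample output, the OHLCV columns are reselected explicitly.

```python
import pandas as pd

# Made-up stand-in: one ticker with a missing minute at 00:01.
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2020-09-01 00:00'), 'btcusd'),
     (pd.Timestamp('2020-09-01 00:02'), 'btcusd')],
    names=['date', 'ticker'])
df = pd.DataFrame({'open': [1.0, 3.0], 'high': [1.0, 3.0], 'low': [1.0, 3.0],
                   'close': [1.0, 3.0], 'volume': [10.0, 30.0]}, index=idx)

# Upsample each ticker onto a complete 1-minute grid ('1min', not '1M',
# which would be monthly); empty minutes show up as NaN rows.
bars_1m = (df.reset_index('ticker')
             .groupby('ticker')
             .resample('1min')
             .last())
bars_1m = bars_1m[['open', 'high', 'low', 'close', 'volume']]
bars_1m = bars_1m.swaplevel().sort_index()   # back to (date, ticker)

# Forward-fill prices per ticker, but fill missing volume with zero --
# we don't want to invent volume that never traded.
price_cols = ['open', 'high', 'low', 'close']
bars_1m[price_cols] = bars_1m.groupby('ticker')[price_cols].ffill()
bars_1m['volume'] = bars_1m['volume'].fillna(0)
```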
21:48
Perfect. Now that we've got the one-minute bar DataFrame, let's move on to
21:55
the five-minute, right? We fixed the lowest level, so we can create the
22:00
five-minute and the daily now. How do we do that? Well, we use something
22:06
called an aggregate function, right? When we have the one-minute bars, we need to
22:12
kind of smush those minute bars into one five-minute bar, and we need to
22:17
tell pandas how we're going to handle the data. Right, what's the
22:23
close? Well, the close is obviously the last close value. This is pretty
22:28
common, so we'll create a dictionary to let pandas know how to handle everything.
22:34
So the open will be the first value, right,
22:40
and then the high is the max, the low
22:46
is the min, the close is the last,
22:52
and the volume is the sum. Perfect.
22:59
Okay, now what we do — we scroll up here —
23:04
is use groupby and pd.Grouper to change the frequency. So
23:10
whenever we're doing this, we want to create these five-minute bars, but we need to aggregate the bars and also
23:17
group by the ticker. We don't want Ethereum's bars to
23:22
somehow get mixed with Bitcoin's bars. So let's go ahead and do that.
23:29
We'll do bars_5m equals bars_1m, and we'll group by,
23:34
and we'll do pd.Grouper, but we'll take level zero, and we're going to change the
23:39
frequency — again, not '5M' but '5min' — and then we're going to set the label
23:48
to the right label. Now, before I move on, I want to explain what that means. Whenever we're working with time data, we
23:55
can use the left or right labels. Think about that five-minute bar: the five-minute bar begins at 00:00:00
24:02
and ends at 00:05:00.
24:07
We actually want to use the right edge for the label, and the reason for that is
24:15
it helps us prevent a cardinal sin called look-ahead bias. Think about this for a minute: if we
24:21
receive a daily bar today, which is September 17th, we actually don't get that bar until
24:27
tomorrow, so we don't want to accidentally merge the
24:32
bars incorrectly. By setting the label to the right-hand side,
24:38
essentially that label is when we receive that bar. We receive the daily
24:43
bar not at the beginning of the day but at the end of the day, or technically tomorrow. So that's what we're
24:49
doing: we're avoiding some of this look-ahead bias by labeling everything when we get the bar. Hopefully
24:56
that makes sense. Okay, so we've got our first Grouper — we've changed the frequency to five-minute —
25:04
and a Grouper with level equal to one. Then we pass it to an aggregate function — essentially we give it that
25:11
dictionary — so it lets pandas know how we're going to handle all of those fields.
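The aggregation dictionary and the double Grouper, sketched on ten made-up minute bars for one ticker:

```python
import pandas as pd

# How each column collapses into a 5-minute bar.
ohlcv = {'open': 'first', 'high': 'max', 'low': 'min',
         'close': 'last', 'volume': 'sum'}

# Made-up 1-minute bars.
idx = pd.MultiIndex.from_product(
    [pd.date_range('2020-09-01', periods=10, freq='1min'), ['btcusd']],
    names=['date', 'ticker'])
bars_1m = pd.DataFrame({'open': range(10), 'high': range(1, 11),
                        'low': range(10), 'close': range(10),
                        'volume': [1.0] * 10}, index=idx)

# Group by the new 5-minute frequency on the date level *and* by ticker,
# labelling each bar with its right edge so a bar is stamped with the time
# we'd actually receive it (avoiding look-ahead bias).
bars_5m = (bars_1m
           .groupby([pd.Grouper(level=0, freq='5min', label='right'),
                     pd.Grouper(level=1)])
           .agg(ohlcv))
```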
25:16
And that's it. Okay, perfect. Now what we want to do is
25:22
create columns for our indicators. We'll do bars_5m ema_12 — we'll do our EMA 12, then
25:30
our EMA 26. We'll do bars_5m group by ticker,
25:36
take close, and apply ema_12. I'll copy and paste this so we can do the
25:42
ema_26.
25:49
So that's really nice: remember, we created those lambda functions, ema_12 and ema_26, previously, and now we
25:56
can apply them. Now, one of the things to understand is that when we group by ticker, we're sort of looping through
26:03
each ticker, taking that close series and applying the function, so it all works properly.
26:09
I want to mention this because it's really important when working with a multi-index that we group by before
26:14
shifting or doing any type of
26:20
operation. If we're not grouping, we can accidentally mix the data across tickers, so it's really important to
26:26
understand that we need the groupby. Okay, all right, so the next thing is we're going to apply that atr
26:33
function that we created previously to create our ATR indicator. Now, the way it works, we kind of change
26:39
the index a bit, so the way we're going to create this ATR is actually
26:45
with a merge. We'll do bars_5m equals bars_5m
26:50
.merge — we'll pass our bars_5m to the atr function —
26:56
and then we'll merge on our index, so on equals a list with
27:01
date.
27:06
Now, if we think about it, we're going to have missing values: we have an EMA 26, so we're not going to get any values for
27:12
those first 26 bars, right? We need to — it's called warming up — warm up our indicator. dropna will drop
27:20
any data that's missing, but we don't want to drop data that's simply missing because
27:27
there could be cryptos — and I'm sure there are — that don't have the entire history, right?
27:32
There might be new cryptos. So we only want to drop rows that have missing values in the EMA
27:40
26, which is our largest EMA. So bars_5m equals bars_5m.
27:47
dropna, and this is where we pass subset equals ema_26; right, the 26 will obviously
27:55
also remove the warm-up values for the ema_12. And since we're using multiple time
28:01
frames, we want to change the names of these columns. So we'll do bars_5m.columns, and we'll use a list
28:07
comprehension: c plus underscore 5m
28:15
for c in bars_5m.columns.
28:22
Okay, so what this is doing: for every column in the columns, change the name to
28:28
the column name plus _5m.
28:33
Okay, and then we'll output bars_5m and keep our fingers crossed to make
28:41
sure that it's working.
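A sketch of the indicator columns, the warm-up dropna, and the renaming. One deviation from the video: this uses .transform rather than .apply so the result keeps the original (date, ticker) index and assigns cleanly; the two tickers and their closes are made up.

```python
import pandas as pd

ema_26 = lambda x: x.ewm(span=26, min_periods=26, adjust=False).mean()

# Made-up 5-minute closes for two tickers.
idx = pd.MultiIndex.from_product(
    [pd.date_range('2020-09-01', periods=30, freq='5min'), ['btcusd', 'ethusd']],
    names=['date', 'ticker'])
bars_5m = pd.DataFrame({'close': [float(i) for i in range(60)]}, index=idx)

# Group by ticker before applying, so one asset's series never bleeds
# into another's indicator values.
bars_5m['ema_26'] = bars_5m.groupby('ticker')['close'].transform(ema_26)

# Drop only the warm-up rows (NaN in the longest EMA), not every NaN.
bars_5m = bars_5m.dropna(subset=['ema_26'])

# Suffix the columns so they won't collide with the daily frame later.
bars_5m.columns = [c + '_5m' for c in bars_5m.columns]
```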
28:49
And it looks like it did. So now, with our five-minute DataFrame done,
28:54
let's move on to our daily DataFrame, and this should be easy, because we're essentially doing the exact same thing.
29:00
So I'm going to go ahead and copy-paste.
29:09
We'll change it to one day — we need to change the frequency to one day.
29:15
We've got our EMA 12.
29:22
Now you can see why I created the indicators first, right? It makes it easy: we can apply the same
29:28
indicator over multiple time frames. We don't need the average true range; we
29:34
do want to drop NA values,
29:40
and we definitely need to rename the columns:
29:47
_1d. One more time, _1d, if I got that. So bars_1d:
29:53
we're taking the one-minute time frame, we're grouping by, again, the first level, which is the date, changing that
29:59
frequency to one day, we're also grouping by the ticker, and for each column we tell pandas how to
30:06
resample it using the dictionary.
30:14
And we've got our EMA 12 on one day, EMA 26,
30:19
got our dropna — so we're dropping the EMA NAs one more time — and renaming the
30:24
columns. That looks good, so I'm going to hit Enter, and hopefully we'll go two for two.
30:33
And we did. Perfect. So now that we have our five-minute DataFrame and our daily DataFrame,
30:41
it's time to actually merge the DataFrames. Now, if you think about it,
30:47
our five-minute and daily time frames no longer align, and the reason for that
30:53
is that the 26 EMA on the daily takes a lot longer to warm up than the five-minute EMA, right?
31:00
Makes sense. So let's print out the starting and ending values of
31:05
both of our DataFrames. We'll do print bars_1d.index[0]
31:12
[0], which will give us the starting date. We'll copy and paste here for the
31:17
five-minute, and then we do the same again with negative
31:26
one. Enter — and we can see that, again, it makes sense, right? It took 26
31:32
days for our daily EMA to warm up; we dropped those rows, because remember we started on the
31:38
first of July, and our ending dates no longer align either.
31:44
Let's go ahead and get those to align. We'll start by resetting the index:
31:50
bars_5m equals bars_5m.reset_index
31:56
and bars_1d equals bars_1d.reset_index. Hit Enter.
32:02
And now we'll align the start and end times using boolean indexing. We'll
32:08
do bars_5m equals bars_5m.loc, bars_5m date — and we can see up above
32:16
that we only want the five-minute rows that are greater
32:22
than or equal to the daily start, right? Because right now it starts on the first, and we want it to be the seventh at
32:28
00:00:00. So: greater than or equal to bars_1d date
32:35
.min(). There we go; that'll align
32:42
the starts. Now it's time to align the end times. We'll do bars_1d equals bars_1d
32:49
.loc, bars_1d date — we're just doing the same thing here —
32:58
and that'll be less than or equal to the bars_5m date .max().
33:05
Okay, so let's look at that. The daily data
33:10
ends on the first, while our five-minute data ends on the 31st, so obviously we want
33:19
only the daily data that is less than or equal, because right now it runs past our five-minute data.
33:26
But we also need one final line, because that moves the daily back to the 31st at 00:00,
33:32
while we still have five-minute bars later on the 31st. So now we need to make the five-minute also
33:39
align to our newly adjusted daily. We'll do bars_5m equals
33:45
bars_5m.loc, bars_5m date
33:51
less than or equal to bars_1d date .max().
33:59
Then, in a new cell, we'll just make sure that these
34:07
align. Again, since we reset the index, we can't slice the index the same way; we'll copy and paste and
34:12
do print bars_1d date .iloc[0],
34:20
same for the five-minute,
34:26
and again we'll use negative one, negative one, and Enter.
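The three alignment masks, sketched on two made-up date-only stand-in frames:

```python
import pandas as pd

# Made-up date ranges: the daily frame starts later (EMA warm-up) and,
# before clipping, also runs past the five-minute frame.
bars_5m = pd.DataFrame({'date': pd.date_range('2020-07-01', '2020-08-31', freq='D')})
bars_1d = pd.DataFrame({'date': pd.date_range('2020-07-27', '2020-09-05', freq='D')})

# 1) Clip the 5-minute start to the daily start,
# 2) clip the daily end to the 5-minute end,
# 3) re-clip the 5-minute end to the newly adjusted daily end.
bars_5m = bars_5m.loc[bars_5m['date'] >= bars_1d['date'].min()]
bars_1d = bars_1d.loc[bars_1d['date'] <= bars_5m['date'].max()]
bars_5m = bars_5m.loc[bars_5m['date'] <= bars_1d['date'].max()]
```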
34:32
And it does indeed align. So now that our start and end
34:37
dates align, it's time to do our merge. Okay, not too hard. We'll do bars — we're
34:44
only going to have one DataFrame, because we're going to take the five-minute and the daily DataFrames
34:50
and mush them together: bars equals bars_5m.merge. We're going to use the five-minute on the
34:57
left-hand side, and we're going to align the daily data to
35:02
the bars_5m index, which will create a bunch of empty values, because there are obviously a lot more rows in the
35:08
five-minute data than the daily data. We'll do bars_5m.merge and merge it with the 1d
35:15
DataFrame. We want it merging on — I'll pass it the date and the ticker,
35:21
right — and how equals 'left', which again means any values that
35:27
don't exist in the five-minute but exist in the daily will be dropped. But that doesn't matter, because
35:34
the five-minute includes all of the daily times. That makes sense.
35:40
Okay, so now, with that said, we need to fill all of the missing values, right,
35:45
and we've sort of done this before. So we'll do bars, bars_1d columns,
35:51
equals bars.groupby ticker,
35:57
bars_1d columns,
36:04
.transform(lambda x: x.ffill()).
36:10
Perfect, and then we'll just set that index: bars equals bars.set_
36:16
index with the date and the ticker, and then we'll print out our new DataFrame.
36:23
Right — and, uh oh, I made an error. Let's see:
36:31
ffill. Okay, so I forgot the other f in ffill.
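The merge and the per-ticker forward fill of the daily columns, on a made-up six-bar frame with a single daily row:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins: six 5-minute bars and one matching daily bar.
dates = pd.date_range('2020-09-01', periods=6, freq='5min')
bars_5m = pd.DataFrame({'date': dates, 'ticker': 'btcusd',
                        'close_5m': np.arange(6.0)})
bars_1d = pd.DataFrame({'date': [dates[0]], 'ticker': ['btcusd'],
                        'close_1d': [100.0]})

# Left merge: the 5-minute rows drive the result; daily values only land
# on matching timestamps and are NaN everywhere else.
bars = bars_5m.merge(bars_1d, on=['date', 'ticker'], how='left')

# Forward-fill the daily columns per ticker so a fill can't cross assets.
daily_cols = bars_1d.columns.drop(['date', 'ticker'])
bars[daily_cols] = bars.groupby('ticker')[daily_cols].transform(lambda x: x.ffill())

bars = bars.set_index(['date', 'ticker'])
```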
36:38
perfect so now we have our new data frame and now if you're really paying attention you'll notice that we forward
36:45
fill volume we're not going to use the daily volume so that's okay but if you're you know just
36:51
something to think about if you know you're ever merging multiple time frames again always really think
36:57
through what you're doing especially when you're feeling or you are doing any uh shifting or
37:04
anything like that with multiple assets now that we have the data frame with multiple time frames it's time to create
37:11
our signal awesome stuff now there's multiple ways to determine
37:17
what a price share is i mean we can model it multiple ways the way i'm going to model
37:23
it is i'm going to say that whenever price moves you know two and a half times the
37:29
average true range away from the atr meaning it's getting extended from the atr i'm going to consider that the price
37:35
share and i'm assuming that the price is going to revert back to the atr which is
37:40
you know that the we're assuming the mean so let's see this in python we'll do
37:47
bars bear or share bear
37:52
uh what's the tongue twister equals we'll do np.abs for the absolute value
37:58
we take the close price so bars close of five minute minus
38:03
the ema right so ema 12 on the five minute so that gets the distance between the closed price
38:11
and the ema now what we want is we want to see if it's two and a half times the
38:17
atr so the bars atr five minute
38:24
times 2.5 so that gives us you know so what are we saying here that gives us
38:29
the close is more than two and a half times the atr from the moving average now that's just a shear in
38:36
general right what we need to do is we need to make this
38:42
we need to identify that it's a bear shear and the way we do that is we do bars
38:48
close five minute is less than bars
38:53
ema 12 five minute right so if it's under the ema we know it's a bear shear and if
38:59
it's above we know it's a bull shear i'll hit enter that seemed to work perfect
39:04
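the shear condition described above can be sketched like this — column names such as `close_5m` are my placeholders for the notebook's 5-minute columns:

```python
import numpy as np
import pandas as pd

# Toy 5-minute bars.
bars = pd.DataFrame({
    "close_5m":  [100.0, 90.0, 101.0],
    "ema_12_5m": [100.0, 100.0, 100.0],
    "atr_5m":    [2.0, 2.0, 2.0],
})

# A shear: close more than 2.5 ATRs away from the 12-period EMA.
shear = np.abs(bars["close_5m"] - bars["ema_12_5m"]) > bars["atr_5m"] * 2.5
# A bear shear: the extreme move happened BELOW the EMA.
bars["shear_bear"] = shear & (bars["close_5m"] < bars["ema_12_5m"])
print(bars["shear_bear"].tolist())  # [False, True, False]
```

only the middle row qualifies: it's 10 points from the EMA (more than 2.5 × 2.0) and below it.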
now what we want to do is we want to identify when we have a position so the way this strategy is going to work is
39:10
whenever a bear shear occurs you know the first instance of a bear shear so even if there's two bear shears in a
39:16
row we only trade the first one whenever a bear shear occurs we then buy
39:21
on you know on the next bar that makes sense we buy because it's a downward move and then when it moves to
39:28
the next bar we sell hopefully that makes sense so let's identify when we're going to have a position so the way we do this is
39:34
we do bars position equals np.nan okay so we just create a column and we
39:41
set it all to nan values now we identify when a bear shear occurs
39:48
and if a bear shear occurs and we didn't have a bear shear on the
39:54
previous bar that's when we enter the position right so it's the first instance of a bear shear
40:01
the bars position equals np dot where
40:06
we'll do bars shear bear
40:12
okay so shear bear for the true condition so we know that a bear shear you know occurred
40:19
and dot groupby because we need to always
40:26
remember we've got to group by the ticker
40:31
then shear bear dot shift which gives us the prior value is equal to false right so
40:40
if it's the first instance of a bear shear because the previous bar did not have a bear shear we return a one
40:46
otherwise it's a zero and now we have you know ones and zeros
40:53
in our data frame right so if it's a bear shear it's a one if it's not we return a zero so now
41:02
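a sketch of that first-instance entry, together with the one-bar shift against look-ahead bias that comes next — again the column names are assumptions on my part:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2021-01-01", periods=4, freq="5min"), ["BTCUSD"]],
    names=["date", "ticker"],
)
bars = pd.DataFrame({"shear_bear": [False, True, True, False]}, index=idx)

# Enter only on the FIRST bar of a run of bear shears: the current
# bar is a shear and the previous bar (same ticker) was not.
prev = bars.groupby("ticker")["shear_bear"].shift()
bars["position"] = np.where(bars["shear_bear"] & (prev == False), 1, 0)

# Shift one bar per ticker so the trade happens on the NEXT bar —
# we can't act on a signal in the same bar it forms.
bars["position"] = bars.groupby("ticker")["position"].shift()
print(bars["position"].tolist())
```

note that both the `shift` for the prior value and the final position shift are done inside `groupby("ticker")`, so one asset's last bar never bleeds into the next asset's first bar.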
what we want to do is we actually want to shift our position by one bar and the reason we do that is
41:09
it again look-ahead bias we cannot make a decision
41:15
to buy or sell on the same bar that we receive the signal it has to be on the next bar because we didn't have the
41:21
information yet so we'll do bars position
41:27
equals bars groupby because we have to keep everything together once again group by
41:33
ticker then position
41:40
dot shift all right and then i'll
41:45
type bars and we'll hit enter hopefully we didn't make any mistakes and that looks good so now we have
41:53
a data frame with either a bear shear or not so we do see one true here
42:01
but we can't see if there's any position so let's identify to see if we can find a position so we'll look at btc and
42:08
to do this btc equals bars dot swaplevel
42:13
i'm going to swap the index level because i want to be able to easily access the btc so
42:19
swaplevel so we swap zero and one right then
42:24
xs for cross section btc usd now what i'll do is i'll say btc
42:32
where btc shear bear
42:39
is equal to true right so that's again that boolean logic and i'll just hit enter
42:47
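the swaplevel-plus-cross-section trick might look like this on a toy frame:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2021-01-01", periods=2, freq="5min"),
     ["BTCUSD", "ETHUSD"]],
    names=["date", "ticker"],
)
bars = pd.DataFrame({"shear_bear": [True, False, False, True]}, index=idx)

# Put ticker first, then take a cross-section to isolate BTCUSD rows.
btc = bars.swaplevel(0, 1).xs("BTCUSD")
# Boolean filter: keep only the bars where a bear shear fired.
btc_shears = btc[btc["shear_bear"]]
print(btc_shears)
```

`xs` drops the ticker level entirely, leaving a clean date-indexed frame for one asset.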
okay so here is where all of the bear shears occurred um
42:52
but again you can't take your position until the next bar right so the bear shear is true the position is zero that
42:59
is correct but i want to make sure that it's working so i'm going to take this date because i know that's when a bear
43:05
shear occurred and move this
43:11
and i want to see values around that time so we'll do btc put in our time that i just copied
43:19
and then we'll do head 20.
43:25
okay so now we should have um actually i'll move it that's fine
43:32
let's see we have shear bear true true okay so it doesn't look
43:38
like it's working there but there might be multiple bear shears so here's our first instance and then we have the position
43:44
okay so i need to go back a little bit all right so let's see 20
43:50
we were 21 20 and i'll just do 45
44:00
i'm going to make sure i get that string right all right so it does look like it's working so we have the signal on this
44:07
bar we take the position and it looks like there was a fairly large drop so we probably
44:13
didn't get any mean reversion on this one right but it does look like our position worked we bought and then we sold on the next
44:18
bar so awesome it looks like it's it's working okay
44:24
so now that we have um you know our signal and our position
44:29
it's time to examine our results but before we dive in i want to take a
44:35
brief moment to talk about log prices and log returns so log prices have a ton
44:41
of various benefits in quantitative finance
44:46
but for our purposes we're able to simply add log returns to get the total return right
44:53
now if you think about standard returns that you're used to most likely you know you can't just add percentage changes
44:59
you'll get an incorrect value actually if you're interested in log prices and the
45:04
different benefits and drawbacks in quant finance and algorithmic trading let me know in the comments and maybe
45:10
i'll create a video on that but again like i said the primary benefit is that we're able to add
45:16
log returns together so if you think about what we're doing we're going to get the return on each bar
45:23
then we're going to sum those returns you know into daily returns which we can
45:30
use just like we did prior when we resampled the volume into those aggregate bars so let's go ahead and see
45:37
exactly what i mean we'll do fees equals
45:42
0.0 obviously that's not true but we'll set it to zero for right now now we're
45:48
going to convert our five minute closes into log five minute closes we'll do bars
45:56
close five minute log is equal to bars
46:01
close five minute apply and we'll do np.log so this
46:07
converts our current close prices into log prices now just like i said we could add them to
46:14
get the returns we can also difference them or subtract them to get the change in returns right
46:20
per bar so we'll do bars we'll do bar return log
46:26
is equal to bars dot groupby again we always have to remember to group by
46:32
and by now you could probably tell me that but it's really important and then we'll
46:37
do close five minute log and take the difference so basically what
46:42
we're doing is we've got our log prices and then we're subtracting it from the prior values which gets us the change
46:49
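the log-price-then-difference step, plus the dropna that follows, could be sketched like this (column names assumed):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2021-01-01", periods=3, freq="5min"), ["BTCUSD"]],
    names=["date", "ticker"],
)
bars = pd.DataFrame({"close_5m": [100.0, 110.0, 121.0]}, index=idx)

# Log prices, then a per-ticker difference gives per-bar log returns;
# unlike simple returns, these can later be SUMMED across bars.
bars["close_5m_log"] = np.log(bars["close_5m"])
bars["bar_return_log"] = bars.groupby("ticker")["close_5m_log"].diff()
# The first row per ticker has no prior value, so drop its NaN return.
bars = bars.dropna(subset=["bar_return_log"])
print(bars["bar_return_log"].tolist())
```

each bar here is a +10% move, so both log returns come out as log(1.1) — identical per bar, and additive across bars.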
and now the next thing we're going to do is we're going to drop that first row because obviously
46:55
in order to get the change we need a prior value and the first row did not have a prior value so we'll have a nan
47:00
in it right we'll do bars equals bars dot dropna
47:07
subset equals bar return log
47:13
just like before we obviously don't want to drop everything with a nan in it we just want to drop
47:18
where the bar return log is nan which is really just the first row now the next thing we're going to do is we're going to sum up
47:26
um by date we're going to group by date we're going to sum the positions right
47:32
and the reason for that is we don't want to accidentally assume we have like a thousand x margin so
47:37
let's say we have a signal on multiple you know on the same bar for multiple assets well
47:43
we don't want to take the return um you know just simply add those
47:48
returns right because we only have so much capital so what we'll do to fix that is we'll simply say okay we'll take
47:54
the number of positions if there's four of them well then we'll then divide um
48:00
that return by four hopefully that makes sense i'm sure when we go through it it will be much more lucid we'll do bars
48:07
equals bars dot join bars dot groupby
48:13
date we want to group by the date and take the position so we're grouping by date taking the position and
48:19
we're summing it remember when we don't have a position it's zero when we do have a position it's one
48:25
then we'll join on equals date and we'll just set the rsuffix
48:31
rsuffix
48:40
equals count okay hopefully that makes sense
48:45
and now we'll do bars r which is lowercase r for log return
48:52
equals bars bar return log
48:57
times bars
49:03
position divided by bars position
49:09
count right which we just created here that's why i did that rsuffix so let's walk through that again so
49:16
we have a bar return if there is a bar position it will be one right
49:23
and then we divide that return
49:28
um by the position count so you know if we have one return
49:34
and we've got four positions each gets 0.25 hopefully that makes sense it's an approximation because these are log
49:39
returns but anyways now that we've got our log return let's
49:45
add in our fees so bars fees equals
49:51
np dot where bars position is not equal to zero
50:00
then i hit enter and then this applies the fee to the bars where we took a position
50:11
now what we want to do is we want to see if what we
50:17
have is what we expected so let's see for any position
50:23
um where it's one and the position count is greater than zero so we'll see what i
50:28
mean here so let's just analyze our data frames we'll do bars position
50:34
is equal to one
50:42
and then check the return there so it looks like we're good so far so good
50:49
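the join-and-scale step described above might look like this sketch — dividing a log return by the simultaneous-position count is only an approximation of equal-weight capital allocation, which the video also glosses over:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2021-01-01 00:00"), "BTCUSD"),
     (pd.Timestamp("2021-01-01 00:00"), "ETHUSD")],
    names=["date", "ticker"],
)
bars = pd.DataFrame({"bar_return_log": [0.02, 0.04],
                     "position": [1, 1]}, index=idx)

# Count simultaneous positions per bar timestamp, then split capital
# equally so we never implicitly assume extra margin.
bars = bars.join(bars.groupby("date")["position"].sum(),
                 on="date", rsuffix="_count")
bars["r"] = bars["bar_return_log"] * bars["position"] / bars["position_count"]
print(bars[["position_count", "r"]])
```

with two open positions on the same bar, each return is halved — the `rsuffix="_count"` is what turns the joined-in sum into the `position_count` column.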
all right so now what we want to do is a new cell
50:56
and we want to group the bars into daily frequencies right so
51:03
whenever we're graphing five minute data we just basically want to
51:08
analyze returns by day okay so we'll create a new data frame performance
51:14
bars dot groupby and we've already sort of seen this code before we'll do
51:19
a list pd dot grouper level zero frequency
51:25
equals one day so we're resampling this into the daily frequency
51:30
now we'll do pd dot grouper because we need to make sure the ticker is grouped too
51:36
and then we'll do the aggregate and what we'll do is we'll say
51:41
the close five minute is the last
51:48
the r our log return and this is why we use log prices is the sum that's where the magic
51:55
happens the log returns get summed um
52:01
and the fees also perfect
52:08
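the daily aggregation dictated here could look like this sketch, using 12-hour toy bars so the daily grouping is visible:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2021-01-01", periods=3, freq="12h"), ["BTCUSD"]],
    names=["date", "ticker"],
)
bars = pd.DataFrame({"close_5m": [100.0, 101.0, 102.0],
                     "r":        [0.01, 0.02, 0.03],
                     "fees":     [0.0, 0.0, 0.0]}, index=idx)

# Resample intraday rows to daily, per ticker: last close of the day,
# and SUM the log returns — their additivity is why we use logs.
performance = bars.groupby(
    [pd.Grouper(level=0, freq="1D"), pd.Grouper(level="ticker")]
).agg({"close_5m": "last", "r": "sum", "fees": "sum"})
print(performance)
```

the two bars on january 1st collapse into one daily row whose return is their sum.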
we've got that so now that we have that let's create a new column that will be
52:13
the return plus the fee even though we know right now it's zero so r fees
52:22
the return plus the fee equals performance r plus the performance
52:29
fees so basically we're just adding the fee and now what we want to do is we want
52:34
the total return so performance total return
52:40
is equal to the performance r fees
52:46
dot cumulative sum so you know again another reason why
52:52
log prices are awesome but now what we'll do is we'll hit enter hopefully i didn't make any mistakes at this point i
52:58
feel like i need more coffee oh
53:03
i just made another mistake typing performance all right so awesome so now we've got
53:09
the returns by asset by day does that make sense
53:16
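those two columns can be sketched in a couple of lines — the cumulative sum of log returns is itself the compounded total return:

```python
import pandas as pd

performance = pd.DataFrame({"r":    [0.01, 0.02, -0.005],
                            "fees": [0.0, 0.0, 0.0]})

# Net log return per day, then a running sum — with log returns the
# cumulative sum IS the compounded total return.
performance["r_fees"] = performance["r"] + performance["fees"]
performance["total_return"] = performance["r_fees"].cumsum()
print(performance["total_return"].tolist())
```

with simple percentage returns you would need a cumulative product of (1 + r) instead; the additivity here is the whole point of working in logs.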
okay cool so now what we want to do
53:21
is we want to create a benchmark and analyze it versus
53:27
the benchmark here okay so what we'll do is i'm going to create an if else
53:32
statement if there's only one asset then it'll just compare it versus
53:38
that asset if there's more than one asset we'll compare it versus the benchmark which will which we'll say is
53:44
btc this will allow you uh whenever you're going into this notebook if you want to just try it out
53:50
on one asset to play around uh that'll work so we'll do if length performance
53:58
index levels one dot unique
54:03
right so basically we're saying if there's you know more than one ticker this
54:08
basically gets the unique tickers is greater than one then the benchmark equals performance
54:16
swaplevel zero comma one and we did this before but basically
54:21
we're just grabbing bitcoin with a cross section and then that's the benchmark
54:27
and then the performance is equal to performance dot group by
54:32
date dot last and drop the columns you don't need
54:40
close okay so let's walk through that again so the
54:47
performance groupby date dot last because we're using the cumulative sum over here all right so
54:53
now we'll do else again this is one of the uh the dangers and pitfalls of doing
54:59
live coding on youtube but hey that's what it is okay um
55:04
performance equals performance uh droplevel one
55:10
so now we're going to drop that because we only have one asset right so that um
55:15
we don't need the ticker column because it's just btc and the benchmark is simply performance dot copy
55:23
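the if/else described above might be sketched like this — i'm using `xs` and `droplevel` as plausible stand-ins for whatever the notebook actually does:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2021-01-01", periods=2, freq="1D"),
     ["BTCUSD", "ETHUSD"]],
    names=["date", "ticker"],
)
performance = pd.DataFrame(
    {"total_return": [0.01, 0.02, 0.03, 0.04]}, index=idx)

if len(performance.index.levels[1].unique()) > 1:
    # Multiple assets: BTC becomes the benchmark, and the strategy is
    # collapsed to one row per day (last = running cumulative value).
    benchmark = performance.xs("BTCUSD", level="ticker").copy()
    performance = performance.groupby("date").last()
else:
    # Single asset: drop the ticker level and benchmark against itself.
    performance = performance.droplevel("ticker")
    benchmark = performance.copy()
print(performance)
```

taking `last` per day works here precisely because `total_return` is already a running cumulative value.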
awesome all right so now that we have you know our if else determining whether or not we have multiple assets and we've got
55:29
our benchmarks set up let's analyze our benchmark return so the benchmark
55:35
r is equal to benchmark
55:40
bar return log benchmark
55:45
total return equals benchmark r
55:51
dot cumsum and benchmark equals benchmark dot drop
55:59
columns we don't need and then we'll output our benchmark
56:04
benchmark for any errors nope we got it okay
56:10
there's our benchmark so we see that you know what uh bitcoin is returning this is a
56:15
total cumulative return that's really what we're interested in so the first return should be r that
56:23
makes sense and the total return is
56:28
uh you know right there that should be the same right and then this return will be different
56:37
okay now let's see what we got let's see we probably
56:42
graph it right let's graph our strategy's return versus bitcoin's return oh my goodness
56:49
we're almost there all right so instead of using matplotlib i'll use
56:54
plotly but in most instances when i'm testing this stuff out i'll just use matplotlib but plotly is prettier out of
57:02
the box so we'll do import plotly.graph objects
57:08
object as go from plotly offline import iplot
57:17
init notebook mode init notebook
57:22
mode connected equals true
57:29
and then fig equals go dot figure right so now what we need to do is we
57:36
need to add the traces so we'll do fig dot add trace
57:41
go dot scatter where the x is equal to the benchmark dot index
57:46
right and the y is equal to the benchmark total return again that's the
57:53
accumulated return right so that'll graph it appropriately for us and not just you know some type of daily return
57:58
and then we'll name this benchmark
58:07
and now what we'll do is we'll do the same thing for our performance we'll do performance
58:12
dot index y is equal to performance and the name we'll call it strategy
58:20
all right and then let's make the figure a little bit bigger fig dot update
58:26
layout title equals a dictionary we'll do text
58:32
equals strategy versus bitcoin do font
58:38
font equals dict size equals 24
58:46
and the legend
58:56
in the future remind me to um do this ahead of time and i'll just copy
59:02
and paste do you think that's a good idea let me know in the comments again any improvement this is like i
59:08
said this is my first video so any improvement um that you want to see just let me know
59:14
and uh i will make it x axis
59:19
into tick font so we can see a little better tick font equals dictionary size
59:25
20 color equals four three
59:30
four i'll do fig dot update y axes
59:37
tick tick font okay not my only typo
59:44
size 20 color equals and i probably should just copy and paste that again
59:49
one more cup of coffee would have been key fig dot show all right i'm going to hit enter and now
59:56
we're going to see how our strategy performed against the benchmark hopefully
1:00:01
after all this work we can buy an island or correct our mistake oh i typed connect
1:00:07
hey
1:00:14
connected is what it should have been all right
1:00:20
look at that return but remember these are logarithmic returns and not
1:00:26
arithmetic returns which means we're compounding at an astronomical rate okay let's just see
1:00:32
you know let's see you know what our returns are right so we'll use
1:00:37
our simple returns we'll do round benchmark dot iloc negative one
1:00:45
and we'll take the close so we're taking the last price right divided by
1:00:52
the first price benchmark and that's not the log prices we'll do
1:00:58
iloc zero close five minute so this takes our first
1:01:03
five minute bar price for bitcoin divided by
1:01:08
so our last five minute divided by our first five minute we subtract one
1:01:13
and we'll round it to two decimal places and that is 1.62 so that's a
1:01:19
serious return now if you're like how do i convert from log prices to simple you know to um you know
1:01:28
standard prices and vice versa um it's pretty easy so we'll do the same
1:01:34
thing we'll do round np.exp benchmark dot iloc negative one so
1:01:40
that's our last price but this time we're going to take our total return right remember that was the cumulative value
1:01:46
and then we're going to subtract benchmark
1:01:52
dot iloc zero r
1:01:59
minus one comma two and hit enter i should get the same
1:02:04
thing perfect so we converted exponential uh we converted
1:02:11
logarithmic by using the exp function which just simply converts it back to
1:02:17
what we're used to all right so now what we want to do is let's see what the return is for
1:02:23
our strategy so we'll do round np.exp same thing right so we can you know
1:02:28
compare right you know i wanted to show you how to convert to you know using a simple return what
1:02:35
our returns were for a benchmark now we're going to convert our performance of our strategy
1:02:42
using those logarithmic returns back into something that we can compare with our
1:02:47
benchmark so we'll do performance dot iloc minus one
1:02:53
total return minus one and
1:03:01
comma two this is our strategy return okay obviously
1:03:07
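the log-to-simple conversion being used here boils down to one line of math:

```python
import numpy as np

# A cumulative LOG return converts back to a simple return with exp:
# simple = exp(log_total) - 1.
log_total = np.log(2.62)              # price multiplied 2.62x overall
simple = round(np.exp(log_total) - 1, 2)
print(simple)  # 1.62
```

a 2.62x price multiple is a 162% simple return — the same figure quoted for the benchmark above.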
we are now quad billionaires or whatever so what's the catch right um so
1:03:14
obviously our strategy is awesome and maybe i made some programming mistakes you know i definitely wouldn't doubt it
1:03:20
um doing it live but where's the issue well here's where the issue is let's go
1:03:27
back to that fees cell and instead we'll use um
1:03:32
an actual fee that might be appropriate we'll do fees
1:03:38
equals np.log 0.0018 so we're taking
1:03:43
you know that zero zero one eight or you know point eighteen percent
1:03:49
um which is a standard fee uh you know even though interactive brokers just came out with crypto trading i'm not 100 percent sure
1:03:55
what that fee is yet but uh point is let's see how we look with a real fee
1:04:00
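a note on expressing that fee in log space: the dictation sounds like `np.log(0.0018)`, which would be a large negative number; a plausible intent (an assumption on my part, not necessarily the notebook's exact code) is the log of one minus the fee rate:

```python
import numpy as np

fee_rate = 0.0018                  # 0.18% per trade (example rate)
# In log-return space a proportional fee enters as log(1 - rate),
# a small negative drag that can be summed with the log returns.
fee = np.log(1 - fee_rate)
print(fee)
```

because it lives in log space, this drag can simply be added per trade alongside the per-bar log returns before the cumulative sum.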
i'm going to run all of these cells one
1:04:06
more time and see hopefully with the fee
1:04:12
added in we still can buy that island again we're not considering slippage or
1:04:17
anything else too right so but it's really important whenever you're modeling this stuff to
1:04:23
you know include this type of stuff and even then you still want to live trade or i'm sorry
1:04:30
not live trade paper trade before you live trade oh what did i do here
1:04:36
oh yep all right let me just run the whole thing again restart and run all that'll fix it i
1:04:41
already modified the data frame so through the magic of the internet and
1:04:47
editing i'm going to fast forward to see what our results are i'll see you in a
1:04:52
second and that's where we see what really happens when we add fees and
1:04:58
slippage in the mix and actually in this case just fees but uh it's funny because i was just reading reddit
1:05:05
last night on the algo trading board and someone just said that they created
1:05:10
this strategy it was supposed to print them money they're going to be able to buy an island and then it just burns
1:05:17
cash well that's what happens when you don't model fees and slippage in you have to be really careful uh
1:05:24
whenever you're you're prototyping this stuff but uh you know i'd always say never
1:05:30
actually take an algorithm from back testing into live production always
1:05:35
always always always paper trade before you run something
1:05:41
live but i don't know about you but i'm exhausted that was a ton of info and i'm
1:05:46
pretty sure we've been over that 45 minute mark i guess next time i'm gonna need some more coffee so i'd like to
1:05:52
thank you for joining me today and you know i hope you found value in this video i think we did some pretty cool
1:05:58
stuff and i'd also like to ask you a favor if you could hit that like button it lets the google algorithm know
1:06:04
that this video was valuable to you and the second thing i'd like to ask is that you know when creating this video i was
1:06:10
trying to think what would i want to learn when first embarking on my data science slash algorithmic trading journey
1:06:16
years ago and you know instead of me guessing why don't you let me know what you'd like to learn about in the
1:06:22
comments below again i'd like to thank you and i can't wait to see you in the next one
