Hacking MySQL for Big Data | Software Architecture Conference
0 views
Aug 6, 2025
MySQL is a powerful, but a very complex beast. In this talk, we will go through the things that make MySQL & its counterparts capable of performing well on bigger data sets: you will walk away knowing how to push your MySQL instances to the maximum when dealing with millions or even billions of rows, we will also discover some fascinating things related to MySQL as a whole and discuss very specific bugs related to MySQL. 🔗 Conference Website: https://softwarearchitecture.live 📺 CSharp TV - Dev Streaming Destination http://csharp.tv 🌎 C# Corner - Community of Software and Data Developers https://www.c-sharpcorner.com #CSharpTV #CSharpCorner #CSharp #SoftwareArchitectureConf
View Video Transcript
0:03
uh we are talking about hacking MySQL
0:05
uh we are talking about hacking MySQL
0:05
uh we are talking about hacking MySQL for big data and how to use your
0:07
for big data and how to use your
0:07
for big data and how to use your database to basically work with bigger
0:11
database to basically work with bigger
0:11
database to basically work with bigger data sets right so what we already
0:13
data sets right so what we already
0:13
data sets right so what we already covered in the beginning was uh what is
0:17
covered in the beginning was uh what is
0:17
covered in the beginning was uh what is Big Data and why you should choose MySQL
0:19
Big Data and why you should choose MySQL
0:19
Big Data and why you should choose MySQL when working with big data what we will
0:21
when working with big data what we will
0:22
when working with big data what we will also cover is why would you choose my S
0:24
also cover is why would you choose my S
0:24
also cover is why would you choose my S for big data and things you should know
0:28
for big data and things you should know
0:28
for big data and things you should know right so as I said I'm a security and
0:30
right so as I said I'm a security and
0:30
right so as I said I'm a security and database guy I run a datab bre search
0:32
database guy I run a datab bre search
0:32
database guy I run a datab bre search engine at database di.com I run a
0:35
engine at database di.com I run a
0:35
engine at database di.com I run a YouTube
0:37
YouTube
0:37
YouTube channel uh and I also wrote a book uh
0:40
channel uh and I also wrote a book uh
0:40
channel uh and I also wrote a book uh for aess titled hacking MySQL I'm also a
0:44
for aess titled hacking MySQL I'm also a
0:44
for aess titled hacking MySQL I'm also a speaker at multiple conferences um
0:46
speaker at multiple conferences um
0:46
speaker at multiple conferences um including this one right so uh we
0:49
including this one right so uh we
0:49
including this one right so uh we already covered a little bit of big data
0:51
already covered a little bit of big data
0:51
already covered a little bit of big data but as I said developers each developer
0:55
but as I said developers each developer
0:55
but as I said developers each developer understands Big Data differently right
0:57
understands Big Data differently right
0:57
understands Big Data differently right so for some big data is 100 Mill rows
1:00
so for some big data is 100 Mill rows
1:00
so for some big data is 100 Mill rows for some uh it is 1 billion rows and
1:03
for some uh it is 1 billion rows and
1:03
for some uh it is 1 billion rows and above but for our use case where're as I
1:06
above but for our use case where're as I
1:06
above but for our use case where're as I said we're talking about anything with
1:09
said we're talking about anything with
1:09
said we're talking about anything with more or equal to 100 million rows and
1:12
more or equal to 100 million rows and
1:12
more or equal to 100 million rows and you can you should also consider your
1:14
you can you should also consider your
1:14
you can you should also consider your use case as I said
1:16
use case as I said
1:16
use case as I said above as I said
1:18
above as I said
1:18
above as I said earlier now why MySQL we already I
1:22
earlier now why MySQL we already I
1:23
earlier now why MySQL we already I started talking about this and then I
1:26
started talking about this and then I
1:26
started talking about this and then I had to change to Google Drive so I'll
1:29
had to change to Google Drive so I'll
1:29
had to change to Google Drive so I'll repeat it again MySQL is not only MySQL
1:31
repeat it again MySQL is not only MySQL
1:31
repeat it again MySQL is not only MySQL MySQL has three flavors primarily it's
1:34
MySQL has three flavors primarily it's
1:34
MySQL has three flavors primarily it's MySQL server perona server and also
1:36
MySQL server perona server and also
1:36
MySQL server perona server and also Maria DB each of those database
1:39
Maria DB each of those database
1:39
Maria DB each of those database Management Systems have their own unique
1:41
Management Systems have their own unique
1:41
Management Systems have their own unique sets of cap uh unique sets and
1:44
sets of cap uh unique sets and
1:45
sets of cap uh unique sets and um capabilities including storage
1:47
um capabilities including storage
1:47
um capabilities including storage engines for example extra DB will Al
1:50
engines for example extra DB will Al
1:51
engines for example extra DB will Al only be available on per Corner server
1:52
only be available on per Corner server
1:52
only be available on per Corner server generally INB will be available on all
1:55
generally INB will be available on all
1:55
generally INB will be available on all three and TB will be available on perona
1:58
three and TB will be available on perona
1:58
three and TB will be available on perona server and marad B most likely
2:00
server and marad B most likely
2:00
server and marad B most likely myql as I said also supports acid which
2:03
myql as I said also supports acid which
2:03
myql as I said also supports acid which means that basically if your electricity
2:05
means that basically if your electricity
2:05
means that basically if your electricity goes out your data doesn't get corrupt
2:08
goes out your data doesn't get corrupt
2:08
goes out your data doesn't get corrupt and basically usage of MySQL or no usage
2:12
and basically usage of MySQL or no usage
2:12
and basically usage of MySQL or no usage of my SQL it depends on your use case
2:15
of my SQL it depends on your use case
2:15
of my SQL it depends on your use case right so here's we we got stuck so uh in
2:19
right so here's we we got stuck so uh in
2:19
right so here's we we got stuck so uh in essence my SQL has a couple of Parts it
2:21
essence my SQL has a couple of Parts it
2:21
essence my SQL has a couple of Parts it um consists of the file system storage
2:25
um consists of the file system storage
2:25
um consists of the file system storage engines it consists of a parser the
2:28
engines it consists of a parser the
2:28
engines it consists of a parser the optimizer and the SQL interface it has
2:31
optimizer and the SQL interface it has
2:31
optimizer and the SQL interface it has hashes and buffers which can either be
2:33
hashes and buffers which can either be
2:33
hashes and buffers which can either be Global or engine specific and it also
2:37
Global or engine specific and it also
2:37
Global or engine specific and it also has a connection pool and it also has
2:39
has a connection pool and it also has
2:39
has a connection pool and it also has connectors which means the ways you
2:41
connectors which means the ways you
2:41
connectors which means the ways you connect to your database you can connect
2:43
connect to your database you can connect
2:43
connect to your database you can connect through C through python through PHP or
2:45
through C through python through PHP or
2:46
through C through python through PHP or through other measures the reason why it
2:48
through other measures the reason why it
2:48
through other measures the reason why it is so important is because the structure
2:52
is so important is because the structure
2:52
is so important is because the structure of the database it's often the first
2:55
of the database it's often the first
2:55
of the database it's often the first thing that you will get uh you will get
2:59
thing that you will get uh you will get
2:59
thing that you will get uh you will get stuck with when dealing with bigger data
3:02
stuck with when dealing with bigger data
3:02
stuck with when dealing with bigger data sets the thing you have to understand is
3:04
sets the thing you have to understand is
3:04
sets the thing you have to understand is that for example mysql's primary storage
3:07
that for example mysql's primary storage
3:07
that for example mysql's primary storage engine that is in a DB it stores all of
3:10
engine that is in a DB it stores all of
3:10
engine that is in a DB it stores all of its data in one file that is IB data 1
3:13
its data in one file that is IB data 1
3:13
its data in one file that is IB data 1 IB data one stores data indexes and all
3:16
IB data one stores data indexes and all
3:16
IB data one stores data indexes and all of the other metadata newer versions of
3:19
of the other metadata newer versions of
3:19
of the other metadata newer versions of myql mind you split the file the files
3:24
myql mind you split the file the files
3:24
myql mind you split the file the files that are in your database and the data
3:27
that are in your database and the data
3:27
that are in your database and the data that is in the file the ibata one file
3:30
that is in the file the ibata one file
3:30
that is in the file the ibata one file because in Old the versions of my
3:32
because in Old the versions of my
3:32
because in Old the versions of my SQL it um wasn't possible to basically
3:37
SQL it um wasn't possible to basically
3:37
SQL it um wasn't possible to basically deduct the uh if you would drop your
3:41
deduct the uh if you would drop your
3:41
deduct the uh if you would drop your table in your database your data would
3:44
table in your database your data would
3:44
table in your database your data would still reside in the IB data one file and
3:46
still reside in the IB data one file and
3:46
still reside in the IB data one file and the only way uh for you to get rid of
3:49
the only way uh for you to get rid of
3:49
the only way uh for you to get rid of the IB data one file would be to delete
3:51
the IB data one file would be to delete
3:51
the IB data one file would be to delete it but if you deleted it all of your
3:53
it but if you deleted it all of your
3:53
it but if you deleted it all of your data would be gone so newer versions of
3:55
data would be gone so newer versions of
3:55
data would be gone so newer versions of my SQL overcome this problem but anyway
4:00
my SQL overcome this problem but anyway
4:00
my SQL overcome this problem but anyway there's a couple of more things that you
4:02
there's a couple of more things that you
4:02
there's a couple of more things that you should understand when dealing with
4:04
should understand when dealing with
4:04
should understand when dealing with bigger data sets and the primary thing
4:06
bigger data sets and the primary thing
4:06
bigger data sets and the primary thing is your use case what I show here uh for
4:09
is your use case what I show here uh for
4:09
is your use case what I show here uh for those who are not
4:11
those who are not
4:11
those who are not really not really uh aware of this it's
4:15
really not really uh aware of this it's
4:15
really not really uh aware of this it's my data bre search engine available at
4:17
my data bre search engine available at
4:17
my data bre search engine available at breach directory.com uh and I'm using
4:20
breach directory.com uh and I'm using
4:20
breach directory.com uh and I'm using Maria DB to facilitate searches through
4:23
Maria DB to facilitate searches through
4:23
Maria DB to facilitate searches through basically through data using the search
4:25
basically through data using the search
4:25
basically through data using the search engine and if you input your email
4:27
engine and if you input your email
4:27
engine and if you input your email username IP address or a domain
4:30
username IP address or a domain
4:30
username IP address or a domain you can basically get a message whether
4:33
you can basically get a message whether
4:33
you can basically get a message whether your data is at risk or not or whether
4:36
your data is at risk or not or whether
4:36
your data is at risk or not or whether you should change your passwords or not
4:39
you should change your passwords or not
4:39
you should change your passwords or not basically basically it tells you whether
4:41
basically basically it tells you whether
4:41
basically basically it tells you whether have you been hacked or not so I highly
4:43
have you been hacked or not so I highly
4:43
have you been hacked or not so I highly recommend you using this I have um a lot
4:47
recommend you using this I have um a lot
4:47
recommend you using this I have um a lot of wellknown people universities
4:50
of wellknown people universities
4:50
of wellknown people universities companies and so on using this so it is
4:52
companies and so on using this so it is
4:52
companies and so on using this so it is a reputable service and I'm very glad I
4:54
a reputable service and I'm very glad I
4:54
a reputable service and I'm very glad I made it right so when uh dealing with
4:58
made it right so when uh dealing with
4:59
made it right so when uh dealing with bigger data sets most likely you will
5:02
bigger data sets most likely you will
5:02
bigger data sets most likely you will need to run a couple of queries through
5:05
need to run a couple of queries through
5:05
need to run a couple of queries through your your database right so uh the fact
5:09
your your database right so uh the fact
5:09
your your database right so uh the fact that you need to run queries through
5:11
that you need to run queries through
5:11
that you need to run queries through bigger data sets most likely comes
5:13
bigger data sets most likely comes
5:13
bigger data sets most likely comes through a through a need uh to Loop
5:16
through a through a need uh to Loop
5:16
through a through a need uh to Loop through something and then run some
5:19
through something and then run some
5:19
through something and then run some kinds of queries and then return results
5:21
kinds of queries and then return results
5:21
kinds of queries and then return results to a user right so what you see now is
5:25
to a user right so what you see now is
5:25
to a user right so what you see now is essentially a loop that Loops through
5:27
essentially a loop that Loops through
5:27
essentially a loop that Loops through all of the tables in the database
5:30
all of the tables in the database
5:30
all of the tables in the database puts the table name into a variable name
5:32
puts the table name into a variable name
5:32
puts the table name into a variable name and then basically uses a prepared
5:36
and then basically uses a prepared
5:36
and then basically uses a prepared statement to Loop through all of the
5:38
statement to Loop through all of the
5:38
statement to Loop through all of the available tables where a specific search
5:42
available tables where a specific search
5:42
available tables where a specific search type is equal to a search query provided
5:44
type is equal to a search query provided
5:44
type is equal to a search query provided by the user the important thing here is
5:47
by the user the important thing here is
5:47
by the user the important thing here is that the search query and the rest of
5:50
that the search query and the rest of
5:50
that the search query and the rest of the query are separated so there's no
5:53
the query are separated so there's no
5:53
the query are separated so there's no SQL injection which is a primary
5:55
SQL injection which is a primary
5:55
SQL injection which is a primary objective of prepared
5:57
objective of prepared
5:57
objective of prepared statements and then we we grab it and
6:01
statements and then we we grab it and
6:01
statements and then we we grab it and put everything into a results query as
6:03
put everything into a results query as
6:03
put everything into a results query as you can see here if it's visible and
6:05
you can see here if it's visible and
6:05
you can see here if it's visible and then we do whatever else we want to
6:09
then we do whatever else we want to
6:09
then we do whatever else we want to do the most likely scenario will be
6:12
do the most likely scenario will be
6:12
do the most likely scenario will be something like this but for this
6:14
something like this but for this
6:14
something like this but for this scenario to be true you would have to
6:16
scenario to be true you would have to
6:16
scenario to be true you would have to have the first thing data your data will
6:19
have the first thing data your data will
6:19
have the first thing data your data will be of some class so if you're dealing
6:22
be of some class so if you're dealing
6:22
be of some class so if you're dealing with data breaches it will be most
6:24
with data breaches it will be most
6:24
with data breaches it will be most likely more sensitive data if you're
6:26
likely more sensitive data if you're
6:26
likely more sensitive data if you're dealing with a I don't know banana
6:28
dealing with a I don't know banana
6:28
dealing with a I don't know banana Factory most likely you're storing types
6:31
Factory most likely you're storing types
6:31
Factory most likely you're storing types of bananas or something there right
6:33
of bananas or something there right
6:33
of bananas or something there right regardless of the data class you will
6:35
regardless of the data class you will
6:35
regardless of the data class you will have some sort of data and you will have
6:37
have some sort of data and you will have
6:37
have some sort of data and you will have a lot of the data right next thing
6:39
a lot of the data right next thing
6:39
a lot of the data right next thing you're going to have is a database
6:41
you're going to have is a database
6:41
you're going to have is a database management
6:42
management
6:42
management system and uh another thing is a VPS
6:47
system and uh another thing is a VPS
6:47
system and uh another thing is a VPS that you store the data uh data in and
6:51
that you store the data uh data in and
6:51
that you store the data uh data in and you're most likely going to upgrade a
6:53
you're most likely going to upgrade a
6:53
you're most likely going to upgrade a dedicated server when the time comes uh
6:56
dedicated server when the time comes uh
6:56
dedicated server when the time comes uh because your data will be bigger than
6:59
because your data will be bigger than
6:59
because your data will be bigger than usual most likely and VPS won't be able
7:02
usual most likely and VPS won't be able
7:02
usual most likely and VPS won't be able to deal with it anymore that works when
7:05
to deal with it anymore that works when
7:05
to deal with it anymore that works when you um have more than five six billion
7:09
you um have more than five six billion
7:09
you um have more than five six billion records like something around that range
7:12
records like something around that range
7:12
records like something around that range if you're dealing with such big data
7:15
if you're dealing with such big data
7:15
if you're dealing with such big data sets you're inevitably going to need a
7:17
sets you're inevitably going to need a
7:17
sets you're inevitably going to need a backup plan as well
7:19
backup plan as well
7:19
backup plan as well so a backup plan to big back up big data
7:23
so a backup plan to big back up big data
7:23
so a backup plan to big back up big data sets is also a very important option and
7:26
sets is also a very important option and
7:26
sets is also a very important option and we're going to talk about it in a second
7:28
we're going to talk about it in a second
7:28
we're going to talk about it in a second and you're going to also o need luck
7:30
and you're going to also o need luck
7:30
and you're going to also o need luck depending on what kind of services you
7:31
depending on what kind of services you
7:31
depending on what kind of services you run you you are going to meet some
7:34
run you you are going to meet some
7:34
run you you are going to meet some people that are probably not having your
7:37
people that are probably not having your
7:37
people that are probably not having your best interests in mind or just having
7:40
best interests in mind or just having
7:40
best interests in mind or just having issues with the hosting provider that
7:42
issues with the hosting provider that
7:42
issues with the hosting provider that you need to know how to fix and stuff
7:44
you need to know how to fix and stuff
7:44
you need to know how to fix and stuff like that
7:46
like that
7:46
like that so here we come to your server the
7:50
so here we come to your server the
7:50
so here we come to your server the slides right here a little bit off since
7:53
slides right here a little bit off since
7:53
slides right here a little bit off since it's a Google slide but it's okay here
7:55
it's a Google slide but it's okay here
7:55
it's a Google slide but it's okay here we come to your server right so for your
7:57
we come to your server right so for your
7:57
we come to your server right so for your server you would need to optim
8:00
server you would need to optim
8:00
server you would need to optim a couple of parameters to optimize Maria
8:02
a couple of parameters to optimize Maria
8:02
a couple of parameters to optimize Maria DB MySQL or pron server these parameters
8:06
DB MySQL or pron server these parameters
8:06
DB MySQL or pron server these parameters would most likely be in know the be
8:07
would most likely be in know the be
8:08
would most likely be in know the be buffer pool size in the be log file size
8:10
buffer pool size in the be log file size
8:10
buffer pool size in the be log file size flushing log at transaction commit and a
8:13
flushing log at transaction commit and a
8:13
flushing log at transaction commit and a couple of other parameters if necessary
8:16
couple of other parameters if necessary
8:16
couple of other parameters if necessary these can be found at the documentation
8:18
these can be found at the documentation
8:18
these can be found at the documentation what these parameters do is the buffer
8:21
what these parameters do is the buffer
8:21
what these parameters do is the buffer pool size found
8:22
pool size found
8:22
pool size found here it makes your insert queries faster
8:26
here it makes your insert queries faster
8:26
here it makes your insert queries faster when you insert a lot of big data right
8:28
when you insert a lot of big data right
8:28
when you insert a lot of big data right because it makes MySQL use be able to
8:32
because it makes MySQL use be able to
8:32
because it makes MySQL use be able to use more operating memory when inserting
8:34
use more operating memory when inserting
8:34
use more operating memory when inserting data which in turn makes your insert
8:37
data which in turn makes your insert
8:37
data which in turn makes your insert queries faster now you don't need to
8:39
queries faster now you don't need to
8:39
queries faster now you don't need to have 64 GB of RAM on a server to make
8:42
have 64 GB of RAM on a server to make
8:43
have 64 GB of RAM on a server to make your database and application work but
8:46
your database and application work but
8:46
your database and application work but it's recommended that you set this
8:48
it's recommended that you set this
8:48
it's recommended that you set this parameter to at least 60% of available
8:51
parameter to at least 60% of available
8:51
parameter to at least 60% of available memory on your server next we have the
8:53
memory on your server next we have the
8:53
memory on your server next we have the loog files the loog files are necessary
8:56
loog files the loog files are necessary
8:56
loog files the loog files are necessary to restore your database from a crash
8:58
to restore your database from a crash
8:58
to restore your database from a crash when storing a database from a crash the
9:01
when storing a database from a crash the
9:01
when storing a database from a crash the database reads through log
9:03
database reads through log
9:03
database reads through log files and uh basically the the the
9:08
files and uh basically the the the
9:08
files and uh basically the the the bigger they
9:09
bigger they
9:10
bigger they are the the faster your database will
9:14
are the the faster your database will
9:14
are the the faster your database will get restored from a crash so it's
9:16
get restored from a crash so it's
9:16
get restored from a crash so it's essentially recommended to put a quarter
9:19
essentially recommended to put a quarter
9:19
essentially recommended to put a quarter of the Buffa pool value that you put
9:21
of the Buffa pool value that you put
9:21
of the Buffa pool value that you put here into the log file size value the
9:24
here into the log file size value the
9:24
here into the log file size value the flush log a transaction commit variable
9:26
flush log a transaction commit variable
9:26
flush log a transaction commit variable controls the balance between ASD and
9:29
controls the balance between ASD and
9:29
controls the balance between ASD and faster query insert speeds right ASD it
9:32
faster query insert speeds right ASD it
9:32
faster query insert speeds right ASD it means that basically your database
9:34
means that basically your database
9:34
means that basically your database remains integral your data remains
9:37
remains integral your data remains
9:37
remains integral your data remains integral uh despite any errors um power
9:41
integral uh despite any errors um power
9:42
integral uh despite any errors um power outages or whatever but it can be turned
9:44
outages or whatever but it can be turned
9:44
outages or whatever but it can be turned off it can be turned off by setting the
9:47
off it can be turned off by setting the
9:47
off it can be turned off by setting the the value of this variable by zero or
9:50
the value of this variable by zero or
9:50
the value of this variable by zero or two instead of the default value one
9:53
two instead of the default value one
9:53
two instead of the default value one which makes your database AET compliant
9:55
which makes your database AET compliant
9:55
which makes your database AET compliant these three parameters are most likely
9:57
these three parameters are most likely
9:58
these three parameters are most likely everything you will need to optim when
9:59
everything you will need to optim when
9:59
everything you will need to optim when working with INB this are the most
10:01
working with INB this are the most
10:01
working with INB this are the most important ones some of the other
10:03
important ones some of the other
10:04
important ones some of the other parameters exist but these can be found
10:06
parameters exist but these can be found
10:06
parameters exist but these can be found in documentation because there are so
10:07
in documentation because there are so
10:07
in documentation because there are so many I will not be able to cover all of
10:11
many I will not be able to cover all of
10:11
many I will not be able to cover all of them aside from parameters you will also
10:14
them aside from parameters you will also
10:14
them aside from parameters you will also need to deal with indexes in your
10:17
need to deal with indexes in your
10:17
need to deal with indexes in your database right indexes are a very
10:20
database right indexes are a very
10:20
database right indexes are a very important thing uh in databases because
10:23
important thing uh in databases because
10:23
important thing uh in databases because indexes make select queries faster which
10:26
indexes make select queries faster which
10:26
indexes make select queries faster which means that if the primary use case of
10:28
means that if the primary use case of
10:28
means that if the primary use case of your application
10:30
your application
10:30
your application is a read use case which means that if
10:33
is a read use case which means that if
10:33
is a read use case which means that if if the primary use case is a search
10:35
if the primary use case is a search
10:35
if the primary use case is a search engine or something like that you will
10:37
engine or something like that you will
10:37
engine or something like that you will probably going to need to make use of
10:39
probably going to need to make use of
10:39
probably going to need to make use of indexing or partitioning or both we're
10:42
indexing or partitioning or both we're
10:42
indexing or partitioning or both we're going to cover this in a second right so
10:45
going to cover this in a second right so
10:45
going to cover this in a second right so how do you decide when and what to Index
10:48
how do you decide when and what to Index
10:48
how do you decide when and what to Index this is the most important question
10:50
this is the most important question
10:50
this is the most important question probably that all of you are going to
10:51
probably that all of you are going to
10:51
probably that all of you are going to have and the answers to this are
10:56
have and the answers to this are
10:56
have and the answers to this are very very simple you basically basically
11:00
very very simple you basically basically
11:00
very very simple you basically basically um index select operations on bigger
11:02
um index select operations on bigger
11:03
um index select operations on bigger data sets you index columns that you run
11:05
data sets you index columns that you run
11:05
data sets you index columns that you run select operations on because columns are
11:08
select operations on because columns are
11:08
select operations on because columns are the things that are going to be used by
11:10
the things that are going to be used by
11:10
the things that are going to be used by a query and that answers the second
11:14
a query and that answers the second
11:14
a query and that answers the second question as well you need to in on you
11:16
question as well you need to in on you
11:16
question as well you need to in on you need to index necessary columns by the
11:19
need to index necessary columns by the
11:19
need to index necessary columns by the time you are running select operations
11:21
time you are running select operations
11:21
time you are running select operations on bigger data sets bigger data sets
11:24
on bigger data sets bigger data sets
11:24
on bigger data sets bigger data sets with indexes will make select queries
11:27
with indexes will make select queries
11:27
with indexes will make select queries faster at the expense of making
11:29
faster at the expense of making
11:30
faster at the expense of making um updates deletes and inserts slower
11:33
um updates deletes and inserts slower
11:33
um updates deletes and inserts slower because data has to be inserted updated
11:35
because data has to be inserted updated
11:35
because data has to be inserted updated or deleted together with the index as
11:38
or deleted together with the index as
11:38
or deleted together with the index as well and this is why the index um takes
11:41
well and this is why the index um takes
11:41
well and this is why the index um takes up space on the disk because it's a
11:43
up space on the disk because it's a
11:43
up space on the disk because it's a separate data structure as
11:45
separate data structure as
11:45
separate data structure as well right so uh Google Slides doesn't
11:50
well right so uh Google Slides doesn't
11:50
well right so uh Google Slides doesn't show the structure of the indexes here
11:53
show the structure of the indexes here
11:53
show the structure of the indexes here but what this slide is supposed to
11:55
but what this slide is supposed to
11:55
but what this slide is supposed to depict is basically here you have your
11:57
depict is basically here you have your
11:57
depict is basically here you have your you would have your SQL qu query and
12:00
you would have your SQL qu query and
12:00
you would have your SQL qu query and here your database would essentially
12:05
here your database would essentially
12:05
here your database would essentially consider your index what kind of the
12:07
consider your index what kind of the
12:07
consider your index what kind of the index it h it has to use and whether it
12:11
index it h it has to use and whether it
12:11
index it h it has to use and whether it it is a covering index or not covering
12:14
it is a covering index or not covering
12:14
it is a covering index or not covering index means that your database can read
12:16
index means that your database can read
12:16
index means that your database can read data straight from the index instead of
12:19
data straight from the index instead of
12:19
data straight from the index instead of reading the data from the disk instead
12:22
reading the data from the disk instead
12:22
reading the data from the disk instead of accessing the dis and here basically
12:26
of accessing the dis and here basically
12:26
of accessing the dis and here basically it denotes uh a couple of steps your
12:29
it denotes uh a couple of steps your
12:29
it denotes uh a couple of steps your database
12:30
database
12:30
database takes that makes it decide what index to
12:33
takes that makes it decide what index to
12:33
takes that makes it decide what index to use what best what index is best to use
12:36
use what best what index is best to use
12:36
use what best what index is best to use because if your query is selecting two
12:40
because if your query is selecting two
12:40
because if your query is selecting two columns or all columns after the work
12:42
columns or all columns after the work
12:42
columns or all columns after the work Clause your database has to make a
12:45
Clause your database has to make a
12:45
Clause your database has to make a couple of decisions which are depicted
12:47
couple of decisions which are depicted
12:47
couple of decisions which are depicted here to choose
12:49
here to choose
12:49
here to choose a proper index now Google Slides
12:53
a proper index now Google Slides
12:53
a proper index now Google Slides unfortunately doesn't show everything
12:55
unfortunately doesn't show everything
12:55
unfortunately doesn't show everything that is um supposed to be depicted here
12:58
that is um supposed to be depicted here
12:58
that is um supposed to be depicted here but think of this as your database
13:01
but think of this as your database
13:01
but think of this as your database deciding what kind of index it should
13:03
deciding what kind of index it should
13:03
deciding what kind of index it should use the kind of index your database
13:06
use the kind of index your database
13:07
use the kind of index your database should use is can be accessed by running
13:10
should use is can be accessed by running
13:10
should use is can be accessed by running an explain query in front of any of your
13:13
an explain query in front of any of your
13:13
an explain query in front of any of your database queries so if you run select
13:15
database queries so if you run select
13:15
database queries so if you run select queries you just put explain in front of
13:18
queries you just put explain in front of
13:18
queries you just put explain in front of the select query and your database will
13:20
the select query and your database will
13:20
the select query and your database will explain all of those steps that it takes
13:23
explain all of those steps that it takes
13:23
explain all of those steps that it takes to reach um to basically to choose the
13:26
to reach um to basically to choose the
13:26
to reach um to basically to choose the perfect index it should use if it does
13:29
perfect index it should use if it does
13:29
perfect index it should use if it does not uh choose the index that you want to
13:32
not uh choose the index that you want to
13:32
not uh choose the index that you want to use you would have to modify the query
13:35
use you would have to modify the query
13:35
use you would have to modify the query the rules here are very simple your
13:37
the rules here are very simple your
13:37
the rules here are very simple your database will choose the index based on
13:40
database will choose the index based on
13:40
database will choose the index based on what Clauses what columns are selected
13:43
what Clauses what columns are selected
13:43
what Clauses what columns are selected after the work Clause so if your query
13:46
after the work Clause so if your query
13:46
after the work Clause so if your query is
13:48
is
13:48
is Select ID from users where email equals
13:53
Select ID from users where email equals
13:53
Select ID from users where email equals something the index that will be
13:56
something the index that will be
13:56
something the index that will be considered for use will be on the email
13:58
considered for use will be on the email
13:58
considered for use will be on the email column so this is how you should think
14:00
column so this is how you should think
14:00
column so this is how you should think about that indexes actually deserve a
14:02
about that indexes actually deserve a
14:02
about that indexes actually deserve a book of their own so I'm not going to
14:05
book of their own so I'm not going to
14:05
book of their own so I'm not going to going to get um too much in depth about
14:07
going to get um too much in depth about
14:07
going to get um too much in depth about them but this is how they are next thing
14:11
them but this is how they are next thing
14:11
them but this is how they are next thing um that you should keep in mind is your
14:13
um that you should keep in mind is your
14:13
um that you should keep in mind is your backups and your backup internals um the
14:16
backups and your backup internals um the
14:16
backups and your backup internals um the way you probably backup data by now is
14:18
way you probably backup data by now is
14:18
way you probably backup data by now is by taking logical backups like this what
14:20
by taking logical backups like this what
14:20
by taking logical backups like this what this depicts is probably letters are
14:23
this depicts is probably letters are
14:23
this depicts is probably letters are very small but what this depicts is
14:25
very small but what this depicts is
14:25
very small but what this depicts is creation of
14:26
creation of
14:26
creation of table which with the structure of table
14:29
table which with the structure of table
14:29
table which with the structure of table table as it is and then inserting the
14:31
table as it is and then inserting the
14:31
table as it is and then inserting the data into the table using insert
14:33
data into the table using insert
14:33
data into the table using insert statements as you can probably a little
14:35
statements as you can probably a little
14:35
statements as you can probably a little bit see right
14:38
bit see right
14:38
bit see right here so um what this does is basically
14:43
here so um what this does is basically
14:43
here so um what this does is basically this is called a logical backup and why
14:45
this is called a logical backup and why
14:45
this is called a logical backup and why is this called a logical backup is
14:47
is this called a logical backup is
14:47
is this called a logical backup is because your database backs up logical
14:50
because your database backs up logical
14:50
because your database backs up logical SQL statements that is SQL statements
14:52
SQL statements that is SQL statements
14:52
SQL statements that is SQL statements that recreate the data which is what is
14:55
that recreate the data which is what is
14:55
that recreate the data which is what is being done here but if you have a lot of
14:58
being done here but if you have a lot of
14:58
being done here but if you have a lot of data you would would
14:59
data you would would
15:00
data you would would probably uh think or see that these
15:03
probably uh think or see that these
15:03
probably uh think or see that these types of queries are a little bit slow
15:06
types of queries are a little bit slow
15:06
types of queries are a little bit slow and these types of queries are a little
15:07
and these types of queries are a little
15:07
and these types of queries are a little bit slow because each insert query that
15:10
bit slow because each insert query that
15:10
bit slow because each insert query that you see here this this is one insert
15:13
you see here this this is one insert
15:13
you see here this this is one insert query because it's in bulk but it still
15:16
query because it's in bulk but it still
15:16
query because it's in bulk but it still be rather slow because you have um a lot
15:19
be rather slow because you have um a lot
15:19
be rather slow because you have um a lot of data to insert as you can see you
15:21
of data to insert as you can see you
15:21
of data to insert as you can see you have multiple rows and each of those
15:23
have multiple rows and each of those
15:23
have multiple rows and each of those rows that are inserted each insert
15:25
rows that are inserted each insert
15:25
rows that are inserted each insert statement comes with a little bit of
15:27
statement comes with a little bit of
15:27
statement comes with a little bit of overhead so this is what each insert
15:29
overhead so this is what each insert
15:30
overhead so this is what each insert statement does so it has to start it has
15:32
statement does so it has to start it has
15:32
statement does so it has to start it has to check for permissions it has to open
15:34
to check for permissions it has to open
15:34
to check for permissions it has to open tables initialize update data end the
15:38
tables initialize update data end the
15:38
tables initialize update data end the query close tables free items and clean
15:40
query close tables free items and clean
15:40
query close tables free items and clean up and you can understand if you have uh
15:44
up and you can understand if you have uh
15:44
up and you can understand if you have uh data with let's say one billion rows
15:47
data with let's say one billion rows
15:47
data with let's say one billion rows that means that these steps that that
15:50
that means that these steps that that
15:50
that means that these steps that that you should you see here this steps that
15:53
you should you see here this steps that
15:53
you should you see here this steps that take milliseconds right now to insert
15:55
take milliseconds right now to insert
15:55
take milliseconds right now to insert one row it will be multiplied by billion
15:59
one row it will be multiplied by billion
15:59
one row it will be multiplied by billion and this is why people wait weeks for
16:01
and this is why people wait weeks for
16:01
and this is why people wait weeks for queries to complete because each query
16:04
queries to complete because each query
16:04
queries to complete because each query takes this duration right here of time
16:07
takes this duration right here of time
16:07
takes this duration right here of time to complete and if you have a million or
16:10
to complete and if you have a million or
16:10
to complete and if you have a million or 100 million or a billion rows to add it
16:14
100 million or a billion rows to add it
16:14
100 million or a billion rows to add it naturally your database naturally uh
16:17
naturally your database naturally uh
16:17
naturally your database naturally uh multiplies this duration by the rows you
16:20
multiplies this duration by the rows you
16:20
multiplies this duration by the rows you need to add this is a very simple
16:22
need to add this is a very simple
16:22
need to add this is a very simple explanation how do you overcome this
16:25
explanation how do you overcome this
16:25
explanation how do you overcome this people overcome this essentially by
16:27
people overcome this essentially by
16:27
people overcome this essentially by importing or using a backup without
16:29
importing or using a backup without
16:29
importing or using a backup without overhead and backups without overhead it
16:32
overhead and backups without overhead it
16:32
overhead and backups without overhead it means that essentially they backup raw
16:35
means that essentially they backup raw
16:35
means that essentially they backup raw data so instead of as you can see
16:37
data so instead of as you can see
16:37
data so instead of as you can see backing up statements that recreate data
16:40
backing up statements that recreate data
16:40
backing up statements that recreate data as you can see create table insert into
16:42
as you can see create table insert into
16:42
as you can see create table insert into blah blah blah they just back up raw
16:44
blah blah blah they just back up raw
16:44
blah blah blah they just back up raw data sets which means that uh they have
16:48
data sets which means that uh they have
16:48
data sets which means that uh they have a data from one column second column
16:51
a data from one column second column
16:51
a data from one column second column third column fourth column and whatever
16:54
third column fourth column and whatever
16:54
third column fourth column and whatever and it's usually separated by some kind
16:56
and it's usually separated by some kind
16:56
and it's usually separated by some kind of U some kind of uh
16:59
of U some kind of uh
16:59
of U some kind of uh uh character as you can see here
17:02
uh character as you can see here
17:02
uh character as you can see here sometimes it's a DOT sometimes it's a
17:03
sometimes it's a DOT sometimes it's a
17:03
sometimes it's a DOT sometimes it's a comma sometimes it's a different
17:05
comma sometimes it's a different
17:05
comma sometimes it's a different character as you can see here and all
17:07
character as you can see here and all
17:07
character as you can see here and all kinds of data can be separated like this
17:09
kinds of data can be separated like this
17:09
kinds of data can be separated like this the reason why you back up data like
17:11
the reason why you back up data like
17:11
the reason why you back up data like this is because importing raw data it's
17:14
this is because importing raw data it's
17:14
this is because importing raw data it's usually comes with thousands of times
17:16
usually comes with thousands of times
17:16
usually comes with thousands of times less overhead than
17:17
less overhead than
17:17
less overhead than importing uh
17:19
importing uh
17:19
importing uh logical logical backups like these now
17:23
logical logical backups like these now
17:23
logical logical backups like these now how do you come up with data like this
17:26
how do you come up with data like this
17:26
how do you come up with data like this and how do you import data like this so
17:28
and how do you import data like this so
17:28
and how do you import data like this so this is where it comes uh for you to I
17:31
this is where it comes uh for you to I
17:31
this is where it comes uh for you to I need to take backups like this so
17:34
need to take backups like this so
17:34
need to take backups like this so essentially you would have a loop that
17:37
essentially you would have a loop that
17:37
essentially you would have a loop that runs queries like this one that you see
17:40
runs queries like this one that you see
17:40
runs queries like this one that you see on the screen it means select the data
17:43
on the screen it means select the data
17:43
on the screen it means select the data you need from some table into an out
17:46
you need from some table into an out
17:46
you need from some table into an out file called something file name and then
17:50
file called something file name and then
17:50
file called something file name and then a bunch of options what this does
17:52
a bunch of options what this does
17:52
a bunch of options what this does basically it tells the database that it
17:54
basically it tells the database that it
17:54
basically it tells the database that it should select all data from some table
17:58
should select all data from some table
17:58
should select all data from some table where
17:59
where
17:59
where basically you specify whatever Clauses
18:02
basically you specify whatever Clauses
18:02
basically you specify whatever Clauses you need into an a file called the file
18:07
you need into an a file called the file
18:07
you need into an a file called the file that you specify here and the fields
18:09
that you specify here and the fields
18:09
that you specify here and the fields terminated by something makes it so your
18:12
terminated by something makes it so your
18:12
terminated by something makes it so your fields are terminated by some characters
18:14
fields are terminated by some characters
18:14
fields are terminated by some characters you can see here like this character
18:17
you can see here like this character
18:17
you can see here like this character this this this right now
18:20
this this this right now
18:20
this this this right now so there's a bunch of other options that
18:24
so there's a bunch of other options that
18:24
so there's a bunch of other options that you can use you can ignore lines you can
18:26
you can use you can ignore lines you can
18:26
you can use you can ignore lines you can specify lines starting by something or
18:28
specify lines starting by something or
18:28
specify lines starting by something or you can
18:29
you can
18:29
you can basically only select um specific
18:33
basically only select um specific
18:33
basically only select um specific columns these kinds of backups can be
18:35
columns these kinds of backups can be
18:35
columns these kinds of backups can be restored like
18:37
restored like
18:37
restored like this uh by specifying load data in file
18:41
this uh by specifying load data in file
18:41
this uh by specifying load data in file and then your file location basically
18:44
and then your file location basically
18:44
and then your file location basically into table that your table name and this
18:47
into table that your table name and this
18:47
into table that your table name and this basically everything you need to do
18:49
basically everything you need to do
18:49
basically everything you need to do these kinds of queries these two queries
18:51
these kinds of queries these two queries
18:51
these kinds of queries these two queries that you saw right here one of them will
18:54
that you saw right here one of them will
18:54
that you saw right here one of them will actually select data from your table
18:57
actually select data from your table
18:57
actually select data from your table into a file and the second one will
19:01
into a file and the second one will
19:01
into a file and the second one will actually load the data inside of a file
19:05
actually load the data inside of a file
19:05
actually load the data inside of a file into a database without any
19:07
into a database without any
19:08
into a database without any overhead because they will not come with
19:10
overhead because they will not come with
19:10
overhead because they will not come with logical statements that the database has
19:12
logical statements that the database has
19:12
logical statements that the database has to repeat it will only come with raw
19:16
to repeat it will only come with raw
19:16
to repeat it will only come with raw data like this not logical statements
19:19
data like this not logical statements
19:19
data like this not logical statements like these but raw data like like this
19:22
like these but raw data like like this
19:22
like these but raw data like like this and this means by the time your database
19:26
and this means by the time your database
19:26
and this means by the time your database completes inserts like thousand queries
19:29
completes inserts like thousand queries
19:29
completes inserts like thousand queries uh thousand rows with queries like this
19:32
uh thousand rows with queries like this
19:32
uh thousand rows with queries like this you would insert let's say 10 million
19:35
you would insert let's say 10 million
19:35
you would insert let's say 10 million queries like this and you can obviously
19:39
queries like this and you can obviously
19:39
queries like this and you can obviously see by the time you're um about to use
19:42
see by the time you're um about to use
19:42
see by the time you're um about to use such queries load the data in file it
19:45
such queries load the data in file it
19:45
such queries load the data in file it will be sometimes even thousands of
19:47
will be sometimes even thousands of
19:47
will be sometimes even thousands of times faster than usual insert
19:50
times faster than usual insert
19:50
times faster than usual insert statements because again it doesn't have
19:52
statements because again it doesn't have
19:52
statements because again it doesn't have any overhead you I suggest you to refer
19:55
any overhead you I suggest you to refer
19:55
any overhead you I suggest you to refer to the documentation if you're going to
19:57
to the documentation if you're going to
19:57
to the documentation if you're going to use this because it probably seems a
20:00
use this because it probably seems a
20:00
use this because it probably seems a little bit scary by the by the way it's
20:02
little bit scary by the by the way it's
20:02
little bit scary by the by the way it's written here but these are the pretty
20:04
written here but these are the pretty
20:04
written here but these are the pretty much all of the available options that
20:06
much all of the available options that
20:06
much all of the available options that you can use and aside from backups you
20:09
you can use and aside from backups you
20:09
you can use and aside from backups you also need
20:11
also need
20:11
also need to uh you also need to adhere to uh some
20:15
to uh you also need to adhere to uh some
20:15
to uh you also need to adhere to uh some sort of um bug in the future right it's
20:19
sort of um bug in the future right it's
20:19
sort of um bug in the future right it's likely that you will come across bug or
20:22
likely that you will come across bug or
20:22
likely that you will come across bug or two uh because myql is such a database
20:25
two uh because myql is such a database
20:25
two uh because myql is such a database management system where some in some
20:27
management system where some in some
20:27
management system where some in some areas some is broken so for example here
20:31
areas some is broken so for example here
20:31
areas some is broken so for example here you can see two queries the text is a
20:34
you can see two queries the text is a
20:34
you can see two queries the text is a little bit broken because it's a Google
20:36
little bit broken because it's a Google
20:36
little bit broken because it's a Google Slides but you can see the main idea
20:39
Slides but you can see the main idea
20:39
Slides but you can see the main idea anyway so the the first query is like
20:42
anyway so the the first query is like
20:42
anyway so the the first query is like this select all from table where match
20:45
this select all from table where match
20:45
this select all from table where match column against demo where demo is the
20:47
column against demo where demo is the
20:47
column against demo where demo is the users input and the second query is
20:50
users input and the second query is
20:50
users input and the second query is select all from table where match column
20:53
select all from table where match column
20:53
select all from table where match column against something that has an ETA sign
20:56
against something that has an ETA sign
20:56
against something that has an ETA sign and Boolean mode and in this case the
20:59
and Boolean mode and in this case the
20:59
and Boolean mode and in this case the ETA sign would crash the database if
21:02
ETA sign would crash the database if
21:02
ETA sign would crash the database if your database is running myql
21:04
your database is running myql
21:04
your database is running myql 5.7 the reason why this happens is
21:08
5.7 the reason why this happens is
21:08
5.7 the reason why this happens is because your your query decides to take
21:11
because your your query decides to take
21:11
because your your query decides to take a different path when executing why this
21:14
a different path when executing why this
21:14
a different path when executing why this happens why does it take a different
21:16
happens why does it take a different
21:16
happens why does it take a different path no one can explain this is a bug
21:18
path no one can explain this is a bug
21:18
path no one can explain this is a bug within my SQL this is actually a real
21:21
within my SQL this is actually a real
21:21
within my SQL this is actually a real bug that I've uncovered with the
21:22
bug that I've uncovered with the
21:22
bug that I've uncovered with the assistant of Charles Bell which is which
21:25
assistant of Charles Bell which is which
21:25
assistant of Charles Bell which is which has led the mys replication team in the
21:27
has led the mys replication team in the
21:27
has led the mys replication team in the past with this is how it works you are
21:29
past with this is how it works you are
21:30
past with this is how it works you are likely to encounter such buxs if you
21:32
likely to encounter such buxs if you
21:32
likely to encounter such buxs if you work with bigger data sets or if you dig
21:35
work with bigger data sets or if you dig
21:35
work with bigger data sets or if you dig deeper into MySQL so this is kind of
21:39
deeper into MySQL so this is kind of
21:39
deeper into MySQL so this is kind of everything I needed to share I hope that
21:42
everything I needed to share I hope that
21:42
everything I needed to share I hope that this talk has been insightful even
21:44
this talk has been insightful even
21:44
this talk has been insightful even though I had to repeat myself and kind
21:46
though I had to repeat myself and kind
21:46
though I had to repeat myself and kind of redo my talk a little bit but I hope
21:49
of redo my talk a little bit but I hope
21:49
of redo my talk a little bit but I hope that you have enjoyed and I'm now AV
21:56
[Music]
#Engineering & Technology
#Hacking & Cracking