What is MapReduce?
Nov 18, 2024
In this video we are going to discuss what MapReduce is. MapReduce is one of the main components of the Hadoop ecosystem. In MapReduce, we have a huge data set, and that data set is divided into multiple smaller parts which are assigned to worker nodes, so that the processing can be done in parallel, in a faster way. On the worker nodes, the mapper methods do the work: each mapper method works on the chunk, or part, of the data set assigned to it. Its input is in the form of key-value pairs, where the key is a reference to the input file and the value is the data set itself. The output of these mapper methods becomes the input to the reducer method. The reducer executes some functions; these are custom functions, written according to the business need, that operate on the output of the mapper methods and perform some aggregation or other processing on the data. The reducer then produces the final output, and that is also in the form of key-value pairs. That is the basic concept of MapReduce. So let us go for further discussion with some diagrams and examples of what MapReduce is.
Here we have the input; then we have the map tasks, where the map methods work; and then the reduce tasks, where the reduce methods work. The final output is obtained in aggregated form. That is the basic theme behind MapReduce. MapReduce is one of the main components of the Hadoop ecosystem; in our Hadoop ecosystem video we discussed that there are many components under the Hadoop ecosystem, and MapReduce was among them. MapReduce is designed to process a large amount of data in parallel by dividing the work into smaller, independent tasks. So the large amount of data is not processed all at once: it is divided into smaller pieces, those pieces are assigned to the worker nodes, and the tasks are executed in parallel for faster processing. The whole job is taken from the user, divided into smaller tasks, and assigned to the worker nodes. MapReduce programs take their input as a list and also produce their output as a list.
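To make that list-in, list-out idea concrete, here is a minimal sketch in plain Java; this is not Hadoop API code, just an illustration of the two phases using the word list from the example later in this video. The map step emits a (word, 1) pair for every word, and the reduce step groups the pairs by key and sums the values.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ListInListOut {
    public static void main(String[] args) {
        // Input: a list of records (lines of text).
        List<String> input = Arrays.asList("Deer Bear River", "Car Car River", "Deer Car Bear");

        // Map phase: emit a (word, 1) pair for every word in every line.
        List<SimpleEntry<String, Integer>> pairs = input.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .map(word -> new SimpleEntry<>(word, 1))
                .collect(Collectors.toList());

        // Reduce phase: group the pairs by key and sum the values per key.
        Map<String, Integer> output = pairs.stream()
                .collect(Collectors.groupingBy(SimpleEntry::getKey,
                        Collectors.summingInt(SimpleEntry::getValue)));

        System.out.println(output); // {Bear=2, Car=3, Deer=2, River=2} (order may vary)
    }
}
```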
So let us go into some further criteria. The map, or mapper, takes a set of keys and values, that is, key-value pairs, as input, so this particular data will be in the form of key-value pairs. Now a question might come to mind: what is the key and what is the value? The key is nothing but a reference to the data set, that is, a reference to a file, and the value is nothing but the data set itself. The data may be in structured or unstructured form, and the framework converts it into keys and values. Structured means, say, databases and database tables, where the data can be represented in the form of rows and columns; unstructured means text files, PDFs, images, and videos, and those are known as unstructured data. The framework turns this into keys and values: the keys are the references to the input files and the values are the data sets. The user can create custom business logic based on their need for the data processing, so what kind of processing will be done can be customized; depending upon the business need, the respective operations and processing will be carried out on the data set. The task is applied to every input value.
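As a concrete illustration, here is a sketch of a custom mapper written against the Hadoop MapReduce Java API, assuming the word count scenario shown later in this video; the class and field names are my own choices, not something fixed by the video.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key: the byte offset of the line in the file (the "reference" to the data).
// Input value: one line of the data set. Output: a (word, 1) pair per word.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the incoming line into words and emit each word with a count of 1.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```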
Now we are going to the reducer tasks. The reducer takes the key-value pairs created by the mapper as input: the mapper takes the input, the mapper's output becomes the input to the reducer, and the reducer produces the respective output accordingly. The key-value pairs are sorted by their key elements before reaching the reducers, and in the reducer we perform sorting, aggregation, or summarization types of jobs. That means here we go for some aggregation type of job: we can go for, say, summation, counting, or maximum and minimum calculations, and so on.
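Here is the matching reducer sketch, again using the Hadoop Java API under the same word count assumption; the aggregation performed is a summation, one of the operations mentioned above.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: a word plus the list of counts the mappers emitted for it
// (grouped and sorted by key during shuffle and sort).
// Output: the word and its total count.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Aggregation step: sum all the 1s emitted for this key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}
```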
How does a MapReduce task work? Now let us go for the macro view of the system. The given inputs are processed by the user-defined methods: all the different pieces of business logic work in the mapper section. The mapper generates intermediate data, and the reducers take it as input; as I told you earlier, the output of the mapper becomes the input to the respective reducer. That data is processed by the user-defined functions in the reducer section, so the data processing done at the reducer section depends upon the business logic, that is, upon the user-defined function or operation. The final output is stored in HDFS, the Hadoop Distributed File System, where the final result will be kept. Now let us go through a proper diagram.
Here we have the HDFS splits: multiple splits of the Hadoop Distributed File System are there, and the input arrives at the respective mappers in the form of key-value pairs. Now the mapping takes place, dealing with multiple keys and their respective values, from (k1, v1) up to (kn, vn); in this way the mappers do their work. Their outputs then come to the shuffle and sort phase, which aggregates the values by key, so values with the same key get grouped together. The output of this shuffle and sort operation is passed to the reducers, where the reduce methods work: (k1, intermediate values) up to (kn, intermediate values) arrive at the respective reducers, and then (k1, final value) up to (kn, final value) are obtained. So initially we have the pairs (k1, v1) through (kn, vn); then comes the shuffle and sort; then the reduce methods work on it; and finally we get (k1, final value) through (kn, final value) as output. That is the pictorial representation of how the MapReduce task works, as shown in this diagram.
So let us go for another, more elaborate example for better understanding. Here we have this example: "Deer Bear River, Car Car River, Deer Car Bear"; this is the set of values we have. Now they will be split, so the splitting is done into these three lines. Then the mapper is there, and the mapping takes place: I find Deer with count 1, Bear with count 1, and River with count 1 from the first split; Car with count 1, Car with count 1, and River with count 1 from the second; and Deer, Car, and Bear each with count 1 from the third. In this way the mapping takes place. Then we have the shuffling and sorting, and here you can find that only the Bear keys are grouped together, only the Car keys, only the Deer keys, and only the River keys. Then we go for the reducing; the aggregation method here is actually a count. Bear has occurred two times, Car has occurred three times, Deer has occurred two times, and River has occurred two times. So here is my reducer, which is doing the reducing, and here you have the final result: Bear 2, Car 3, Deer 2, and River 2. Here you can see how the overall MapReduce word count process gets executed in multiple different phases. This example is a very interesting one, and it clears our doubts and makes the concept clear as well; the pictorial representation of how MapReduce works we have shown in this way. So in this video we have got the idea of what MapReduce is. Thanks for watching this video.
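Putting the two sketches above together, here is a sketch of a driver class that wires the hypothetical WordCountMapper and WordCountReducer into a Hadoop job, reading its input from HDFS and writing the final result back to HDFS; the class name and the use of command-line arguments for the paths are my own assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);     // map phase
        job.setReducerClass(WordCountReducer.class);   // reduce phase
        job.setOutputKeyClass(Text.class);             // final keys: words
        job.setOutputValueClass(IntWritable.class);    // final values: counts

        // Input and output locations on HDFS, passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this would typically be packaged into a jar and submitted with `hadoop jar`, after which the aggregated result can be inspected with `hdfs dfs -cat` on the files in the output directory.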
#Data Management
#Programming

