Data Flow in MapReduce Framework
Nov 18, 2024
In this video we discuss the data flow in the MapReduce framework. In the previous videos we discussed how the mapper works and how the reducer works, so please watch those videos for a better understanding of this concept.
0:17
Here you can see a diagram. There is an input, consisting of multiple records, and the input split divides those records into multiple logical splits that a mapper can handle. Each split then passes through a record reader, which converts the records into key-value pairs, because the mapper can only read and handle key-value pairs. That output is then made available to the mapper.
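To make the record reader's job concrete, here is a small standalone Python sketch (Hadoop's real record readers are Java classes; the function name and example data here are illustrative). It mimics what a line-oriented record reader does: each line of a split becomes a key-value pair, with the byte offset as the key and the line text as the value.

```python
# Conceptual sketch of a record reader (modeled on line-based input in
# MapReduce): it turns the raw text of a split into key-value pairs,
# where the key is the byte offset of each line and the value is the
# line's text. This is a standalone simulation, not Hadoop code.

def record_reader(split_text):
    """Yield (byte_offset, line) pairs from a split's raw text."""
    offset = 0
    for line in split_text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line)

split = "hello world\nhello mapreduce\n"
records = list(record_reader(split))
# records[0] is (0, "hello world"); records[1] is (12, "hello mapreduce")
```

The mapper never sees raw bytes; it only ever sees these key-value pairs.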
0:49
The mapper handles these key-value pairs; the developer puts logic into the mapper according to the business requirement. The mapper then produces an intermediate output, and this intermediate output is not stored on HDFS but on the local hard disk.
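As a sketch of the business logic a developer might put in a mapper, here is the classic word-count mapper in standalone Python (Hadoop's native mapper API is Java; this only illustrates the mapper's key-value contract):

```python
# Word-count mapper sketch: the mapper receives one key-value pair per
# record and emits intermediate key-value pairs. Here the key (a byte
# offset) is ignored and (word, 1) is emitted for every word in the line.

def mapper(key, value):
    """key: byte offset of the record; value: one line of text."""
    for word in value.split():
        yield word, 1

intermediate = list(mapper(0, "the quick brown fox the"))
# e.g. ("the", 1) appears twice in the intermediate output
```

These intermediate pairs are exactly what gets written to the local disk before shuffling.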
All of these outputs from the mappers then undergo shuffling and sorting, and the shuffled, sorted data is made available to the respective reducers. The sorting is done on the keys. The reducer then performs the reduce task and produces the output, part-0, on HDFS, the Hadoop Distributed File System.
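The shuffle-and-sort step described above can be sketched in a few lines of standalone Python (a simulation of the concept, not the framework's actual implementation):

```python
# Sketch of shuffle-and-sort: intermediate (key, value) pairs from all
# mappers are grouped by key, and the keys are sorted, so each reducer
# receives a key together with the full list of its values.

from collections import defaultdict

def shuffle_and_sort(mapper_outputs):
    """mapper_outputs: iterable of (key, value) pairs from every mapper."""
    groups = defaultdict(list)
    for key, value in mapper_outputs:
        groups[key].append(value)
    # Sorting is done on the keys, as in the MapReduce framework.
    return sorted(groups.items())

pairs = [("the", 1), ("quick", 1), ("the", 1), ("fox", 1)]
result = shuffle_and_sort(pairs)
# result: [('fox', [1]), ('quick', [1]), ('the', [1, 1])]
```

Note how the two values for "the" end up grouped under a single key, which is what the reducer will consume.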
Data flow in the MapReduce framework: the data to be processed through MapReduce is stored in HDFS, and the data is stored in different blocks in a distributed format, which means the data is stored across multiple commodity machines.
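To picture the block storage, here is a conceptual simulation in Python (the node names and tiny block size are purely illustrative; real HDFS uses a 128 MB default block size and also handles replication and rack awareness, which are omitted here):

```python
# Conceptual simulation of how HDFS splits a file into fixed-size blocks
# and spreads them across commodity machines.

def split_into_blocks(data, block_size):
    """Cut the file's bytes into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes):
    """Assign block i to a node round-robin (a toy placement policy)."""
    return {i: nodes[i % len(nodes)] for i in range(len(blocks))}

data = b"x" * 10
blocks = split_into_blocks(data, 4)              # 3 blocks: 4 + 4 + 2 bytes
placement = place_blocks(blocks, ["node1", "node2"])
# placement: {0: 'node1', 1: 'node2', 2: 'node1'}
```

Because the blocks already live on different machines, mappers can be scheduled close to the data they process.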
Steps of the data flow: in each mapper, one split is processed at a time. Developers can put their own business logic into these mappers, and the mappers run in parallel, processing on all of the machines at once.
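The parallel execution of mappers can be sketched as follows. In real Hadoop the framework schedules one mapper task per split across the machines of the cluster; in this standalone Python illustration a thread pool stands in for that scheduling:

```python
# Sketch of mappers running in parallel, one worker per split. A thread
# pool here plays the role that cluster-wide task scheduling plays in
# Hadoop; the mapper itself is the word-count logic from before.

from concurrent.futures import ThreadPoolExecutor

def mapper(line):
    """Word-count mapper: one split's line in, (word, 1) pairs out."""
    return [(word, 1) for word in line.split()]

splits = ["hello world", "hello mapreduce"]
with ThreadPoolExecutor(max_workers=len(splits)) as pool:
    outputs = list(pool.map(mapper, splits))
# outputs[0] comes from the first split, outputs[1] from the second
```

Each split is processed independently, which is what makes the map phase scale across machines.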
The output of each mapper is stored on the local disk, not on HDFS, and these outputs are shuffled to the reducers as part of the reduce task. When all of the mappers have completed their tasks, the outputs are sorted and then merged.
The reducer takes that data, performs the reduce task, and the output is stored on HDFS in the form of part-0.
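Completing the word-count example, here is a standalone Python sketch of the reduce task (in real Hadoop the output file is named along the lines of part-r-00000; "part-0" is the shorthand used in this video):

```python
# Sketch of the reduce task: the reducer receives each key with its
# grouped values (as produced by shuffle-and-sort), sums them, and the
# resulting lines are what would be written to a part file on HDFS.

def reducer(key, values):
    """Word-count reduce logic: total the counts for one word."""
    return key, sum(values)

grouped = [("fox", [1]), ("the", [1, 1])]
part_0 = [reducer(k, vs) for k, vs in grouped]
# part_0: [('fox', 1), ('the', 2)]
```

Each reducer writes its own part file, so a job with several reducers produces several part files side by side on HDFS.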
That is our mapper and reducer, and we have discussed in detail how the data flow takes place. Thanks for watching this video.
#Data Management
#Programming