What is MapReduce? Hadoop
Oct 24, 2024
0:00
In this video, we are going to discuss what MapReduce is.
0:04
MapReduce is one of the most important components of the Hadoop ecosystem.
0:10
Whenever we have a large set of data, a huge data set, then that huge
0:16
data set will be divided into smaller pieces and processing will be done on them in parallel
0:22
in MapReduce. And there will be multiple worker nodes to which these small pieces of the
0:29
data set will be assigned, the processing will be done in parallel, and the result will be
0:35
obtained in the form of a list. MapReduce takes a list as input and produces a list as output,
0:42
and that is the main purpose of MapReduce. So, let us go for some more discussion on
0:48
MapReduce. So, here we have the input, and here we have the map tasks; then
0:58
we have the reduce tasks, where the reduce methods will be working.
1:02
Here the map methods were working; there the reduce methods will be working, and the
1:08
final output will be obtained in aggregated form. That is the basic theme behind MapReduce. MapReduce is one of the main components of the Hadoop ecosystem. In our Hadoop ecosystem video, we discussed that
1:21
there are so many components under this Hadoop ecosystem, and there we also had this
1:26
MapReduce ecosystem component. MapReduce is designed to process a large amount of data in
1:32
parallel by dividing the work into smaller, independent tasks. So that the
1:39
large amount of data will not be processed all at once, it will be divided into smaller pieces,
1:45
and those pieces will be assigned to the worker nodes, and those tasks will be executed
1:50
in parallel for faster processing. The whole job is taken from the user,
1:57
divided into smaller tasks, and assigned to the worker nodes. MapReduce programs take
2:04
their input as a list and convert the output to a list as well. So it will take
2:09
the input as a list, and it will also produce the output in the form of a list.
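To make the list-in, list-out idea concrete, here is a small hand-worked word-count example (the input lines are made up for illustration; they are not from the video):

    Input list (one line per element):
        ["deer bear river", "car car river", "deer car bear"]
    Mapper output (intermediate key-value pairs):
        [(deer,1), (bear,1), (river,1), (car,1), (car,1), (river,1), (deer,1), (car,1), (bear,1)]
    Reducer output (one aggregated pair per key):
        [(bear,2), (car,3), (deer,2), (river,2)]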
2:16
So, let us go over some further criteria. The map, or mapper, task takes a set of keys and values, that is, key-value pairs, as input. So this particular data will be in the form of key-value pairs. Now the question might come to mind: what is the key, and what is the value?
2:36
So, the key is actually nothing but a reference to our data set, and the values are nothing but
2:41
the data sets themselves. So, the key can be treated as a reference to a data set or a reference to a file,
2:47
and the value is nothing but our data set. The data may be in a structured or unstructured form, and the framework can turn it into keys
2:57
and values. So the data set may be structured, that means in the form of, say, a database
3:03
and database tables, where the data can be divided or represented in the form of rows
3:09
and columns; and unstructured means, say, text files, and in HDFS we have
3:15
images, we have videos, and they will be known as unstructured data. The framework
3:19
can make it into keys and values; the keys are the references to the input files, and the values are the
3:28
data sets. The user can create custom business logic based on the needs of the data processing.
3:35
So what kind of processing will be done can be customized depending upon the business need; the respective operations, the respective processing, will be carried out on the data set. The task is applied to every input value.
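As a concrete sketch of such a mapper, here is the classic word-count mapper written against the standard Hadoop MapReduce Java API; the class and variable names are illustrative assumptions, not from the video:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Input key   = byte offset of the line in the input file (the "reference"),
    // input value = one line of the data set; output = (word, 1) pairs.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // The custom business logic lives here: split each line into words
            // and emit an intermediate key-value pair for every word.
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }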
3:54
Now we are going to the reduce task. The reducer takes the key-value pairs created by the mapper as input.
4:03
So the mapper takes the input, the mapper's output will be the input to the reducer, and the reducer
4:10
will produce the respective output accordingly. In the case of the reducer, the key-value pairs are sorted by their key elements.
4:18
And in the reducer, we perform sorting, aggregation, or summarization types of jobs.
4:25
That means here we are going for some aggregation type of job: we can go for, say, summation, counting, or maximum and minimum calculations, and so on.
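Continuing the word-count sketch above, here is a minimal reducer that sums the counts for each key, again assuming the standard Hadoop Java API; names are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Input = the mapper's intermediate (word, 1) pairs, already sorted and
    // grouped by key by the framework; output = (word, total count).
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Aggregation step: sum all the counts that arrived for this key.
            int sum = 0;
            for (IntWritable count : values) {
                sum += count.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

A maximum or minimum job would look the same, with the sum replaced by a running max or min over the values.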
4:31
How do MapReduce tasks work?
4:37
So, now let us go for the macro view of the system. The given inputs are processed by user-defined methods.
4:45
All the different business logic works in the mapper section. So, business logic will be working in the mapper section; the mapper generates intermediate data, and the reducers take it as input.
4:58
As I told you earlier, the output of the mapper will work as the input to the reducer.
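To see how these pieces are wired together at the macro level, here is a sketch of the driver that submits the whole job, using the standard Hadoop Java API; the class names and command-line argument layout are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Takes the whole job from the user; Hadoop divides it into map and
    // reduce tasks that it assigns to the worker nodes.
    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }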