MapReduce and Design Patterns - What is MapReduce?
Oct 18, 2024
0:00
In this video we are going to discuss what MapReduce is
0:04
MapReduce is one of the main components of the Hadoop ecosystem. In MapReduce we have the mapper methods and the reducer methods
0:14
We have a set of data, a huge set of data,
0:17
and this set of data will be split into smaller tasks, and those tasks will be assigned
0:23
to multiple mappers, multiple worker nodes, to work on them in parallel,
0:29
and then the output of the mapper will serve as input to the reducer, and the reducer
0:36
will work on it. Depending upon the customization, the business logic, or the custom
0:41
function, the reducer will decide what operation it is going to perform; it might be some
0:47
aggregation operation, and then the reducer will produce the output. The mapper takes
0:53
the data set in the form of key-value pairs, and the reducer also produces its output
0:59
in the form of key-value pairs. Here the keys are references to the data set files,
1:05
and the values are the data sets themselves. So that is the basic
1:10
concept of MapReduce. So let us go for more discussion, with some diagrams, on this
1:16
topic, that is, what is MapReduce. So here we have the input, and here we have
1:22
the map tasks, and then the reduce tasks, where the reducers
1:29
will be working: here it works as the map methods, and here the reduce methods work,
1:35
and the final output will be obtained in aggregated form. That is the basic theme behind
1:40
MapReduce. So MapReduce is one of the main components of the Hadoop ecosystem; in our
1:46
Hadoop ecosystem video we have discussed that there are so many components under this
1:50
Hadoop ecosystem, and there we also had this MapReduce component. MapReduce is designed to
1:57
process a large amount of data in parallel by dividing the work into smaller pieces and
2:05
independent tasks. So the large amount of data will not be processed all at a time. It will be divided into smaller pieces, those pieces will be assigned to the worker nodes, and those tasks will be executed in parallel for faster processing
2:21
The whole job is taken from the user, divided into smaller tasks, and those tasks are assigned to the worker nodes
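The splitting described above can be sketched in plain Python. This is a minimal illustration, not Hadoop's actual API; the function names and the use of local processes as "worker nodes" are assumptions for the example.

```python
from multiprocessing import Pool

def split_input(data, num_chunks):
    """Divide the whole job into smaller, independent pieces."""
    chunk_size = max(1, len(data) // num_chunks)
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def mapper(chunk):
    """Each worker processes its chunk independently (here it just counts items)."""
    return len(chunk)

if __name__ == "__main__":
    data = list(range(100))
    chunks = split_input(data, 4)          # the job, divided into smaller tasks
    with Pool(4) as pool:                  # 4 parallel workers stand in for worker nodes
        partials = pool.map(mapper, chunks)
    print(sum(partials))                   # → 100
```

Each chunk is processed independently, which is what allows the tasks to run in parallel.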
2:29
MapReduce programs take their input as a list and produce their output as a list as well
2:36
So it will take the input as a list, and it also produces the output in the form of a list. So, let us go for some further criteria. The map, or the mapper, task takes a set
2:49
of keys and values, we can say key-value pairs, as input. So this particular
2:56
data will be in the form of key-value pairs. Now the question might come to mind:
3:01
what is the key and what is the value? The key is actually nothing but a reference to our data
3:06
set, and the values are nothing but the data sets. So the key can be treated as a reference to a data set or a reference to a file, and the value
3:15
is nothing but a data set. The data may be in a structured or unstructured form, and the framework can turn it into keys
3:24
and values. So the data set may be structured, that means in the form of, say, a database and
3:31
database tables, where the data can be divided or represented in the form of rows
3:36
and columns, or unstructured, where we will be going for text files,
3:40
PDFs; we have images, we have videos, and they will be known as
3:44
unstructured data. The framework can turn it into keys and values: the keys
3:51
are the references to the input files, and the values are the data sets. The user can create
3:58
custom business logic based on their need for the data processing. So what
4:03
kind of processing will be done can be customized; depending upon the
4:07
business need, the respective operations will be carried out, the respective processing will be carried out on the data set. The map
4:15
task is applied on every input value. Now we are going for the reduce task. The
4:24
reducer takes the key value pair which it which is created by the mapper as input so mapper is taking the input and mapper output will be the input to the reducer and reducer will produce the respective output
4:39
accordingly. The key-value pairs are sorted by the key elements in the case of the reducer, and in the
4:46
reducer we perform sorting, aggregation, or summarization types of jobs. That means here
4:53
we are going for some aggregation type of job: we can go for a summation, we can go for,
4:57
say, counting, we can go for, say, maximum and minimum calculations, and so on. How does a MapReduce task work?
5:05
So now let us go for the macro view of the system. The given inputs are processed by
5:11
user-defined methods; all the different business logic works in the mapper section. So business
5:17
logic will be working at the mapper section, the mapper generates intermediate data,
5:22
and the reducers take it as input. As I told you earlier, the output of the mapper
5:28
will be the input to the respective reducer. The data are processed by the user-defined function
5:34
in the reducer section. So the data processing will be done at the reducer section, depending upon the business logic,
5:40
depending upon the user-defined function or operation, and the final output is stored in HDFS,
5:46
that is, the Hadoop Distributed File System, where the final result will be stored
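The macro view described here, user-defined map and reduce functions plugged into a framework that moves intermediate data between them, can be sketched as a tiny in-memory driver. This is an illustrative sketch only; the `map_reduce` driver and its signature are assumptions, not Hadoop code.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Minimal driver: user-defined map_fn and reduce_fn carry the business logic."""
    # Map phase: each record yields (key, value) pairs; the framework
    # collects this intermediate data grouped by key.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)   # mapper output becomes reducer input
    # Reduce phase: the user-defined aggregation runs once per key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}

# Example user-defined logic: sum the values seen for each key.
result = map_reduce(
    [("a", 1), ("b", 2), ("a", 3)],
    map_fn=lambda rec: [rec],                 # pass each record through as (key, value)
    reduce_fn=lambda key, values: sum(values),
)
print(result)   # {'a': 4, 'b': 2}
```

The point of the design is that only `map_fn` and `reduce_fn` change between jobs; the surrounding machinery (grouping, moving data, storing results) stays the same.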
5:52
Now, let us go through one proper diagram here. So you have here the HDFS splits
5:59
So multiple splits are there; the Hadoop Distributed File System splits will be there
6:04
So the input will be key-value pairs, and in this way the input will be fed to the respective mappers
6:10
So now here the mapping is taking place, and it will be dealing with multiple
6:14
keys and the respective values: so key 1, value 1, up to key K, value K
6:19
In this way the mapper will be working. Now their outputs will be coming to this shuffle and sort step
6:25
Here we aggregate values by the key: values having the same key will get aggregated
6:31
Then the result will be obtained. That means the output of this shuffle and sort operation will be fed to the reducer
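The shuffle-and-sort step just described can be shown in a few lines of Python: sort the mapper output by key, then group equal keys so each reducer receives a key with all its values. The function name is illustrative, not a Hadoop API.

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(pairs):
    """Sort mapper output by key, then group equal keys together,
    so each reducer receives (key, [values])."""
    pairs = sorted(pairs, key=itemgetter(0))          # sort by the key element
    return [(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))]

print(shuffle_and_sort([("car", 1), ("deer", 1), ("car", 1)]))
# [('car', [1, 1]), ('deer', [1])]
```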
6:38
So these are the reduce methods working here. We have the key 1 intermediate values, and then the key K intermediate values, coming to the respective reducers, and then final key 1 with its final value, and final key K with its
6:51
final value, will be obtained in this way. So initially we have key 1, value 1, up to
6:58
key K, value K; in this way we have them. Then here we have this shuffle and sort,
7:03
then the reduce method will be working on it. Then we finally get final key 1 with its value,
7:09
and final key K with its value; in this way the things will be obtained as output. The
7:15
pictorial representation of how the MapReduce task works we have shown in this
7:20
diagram. So let us go for another elaborate example for better understanding
7:25
So here we have one example: deer, beer, river, car, car, river, deer, car,
7:34
beer. So this is the set of values we have. Now
7:39
they will be split in this way; the splitting has been done
7:43
Now the mapper is there, so here the mapping is taking place. So I am finding deer with count one, beer with count one, river with count one, car with count
7:52
one, car with count one, car with count one; in this way the mapping is taking place
7:57
Here we have the shuffling and sorting. So here you can find we have this beer group where only the beer keys are there, only the car keys
8:04
are there, only the deer keys are there, only the river keys are there
8:09
Then here we are going for the reducing. So this beer has occurred twice
8:13
So here the aggregation method is actually count. So here car has occurred three times, deer has occurred two times, and river has
8:20
occurred two times. So this is my reducer which is doing the reducing, and here we have the final
8:26
result, that is beer 2, car 3, deer 2, and river 2
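The whole word-count example walked through above can be reproduced end to end in a short Python sketch: split the lines, map each word to a (word, 1) pair, then aggregate the counts by key. The helper name is illustrative; real Hadoop jobs would express the same logic as Mapper and Reducer classes.

```python
from collections import Counter

def word_count(lines):
    """End-to-end word count: split, map to (word, 1),
    then shuffle/reduce by summing the counts per word."""
    mapped = [(word, 1) for line in lines for word in line.split()]  # map phase
    counts = Counter()
    for word, one in mapped:          # shuffle + reduce: aggregate by key
        counts[word] += one
    return dict(counts)

lines = ["deer beer river", "car car river", "deer car beer"]
print(word_count(lines))
# {'deer': 2, 'beer': 2, 'river': 2, 'car': 3}
```

The result matches the transcript's final output: beer 2, car 3, deer 2, river 2.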
8:31
So here you can find how the overall MapReduce word count process gets executed
8:38
in multiple different phases. So this example is a very interesting one, and it also clears our doubts and clears
8:46
up our concepts. The pictorial representation of how the MapReduce task works, we have shown it
8:53
in this way. So in this video we have got the idea of what MapReduce is
8:59
Thanks for watching this video