Basics of MapReduce Algorithms | Hadoop
37K views
Oct 24, 2024
0:00
In this video we are discussing the basics of the MapReduce algorithm.
0:05
What are the main components in MapReduce, and what operations take place?
0:10
Let us discuss this further. Here we have the mapper class.
0:17
We also have the reducer class. The output of the mapper class is the input to the reducer class.
0:24
The mapper class takes the input, and ultimately from the reducer class we get
0:28
the output. These inputs and outputs are in the form of key-value pairs.
0:34
The mapper class tokenizes the input, maps it, and then shuffles and sorts it.
0:41
The reducer does the searching and reducing operations. So these are the main operations in the mapper class, and these are the two operations done in the reducer class.
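To make the flow of key-value pairs concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code: `run_job`, `parity_mapper`, and `sum_reducer` are hypothetical names chosen for illustration, and the whole job runs in memory. It assumes a mapper yields (key, value) pairs for each input record and a reducer receives one key with all of its values.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer):
    """Run a toy map -> shuffle/sort -> reduce pipeline in memory."""
    # Map: every input record may produce any number of (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle and sort: order the pairs by key so equal keys sit together.
    pairs.sort(key=itemgetter(0))
    # Reduce: hand each key and the list of its values to the reducer.
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reducer(key, values))
    return results


# Hypothetical example job: sum the even and the odd numbers separately.
def parity_mapper(number):
    yield ("even" if number % 2 == 0 else "odd", number)


def sum_reducer(key, values):
    return (key, sum(values))


print(run_job([1, 2, 3, 4], parity_mapper, sum_reducer))
# [('even', 6), ('odd', 4)]
```

The output of the map step is exactly what the reduce step consumes, and both sides are key-value pairs, which is the point made above.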
0:52
Next, the different tasks in MapReduce algorithms. Let us discuss them in more detail.
0:57
MapReduce programs have two tasks, the map task and the reduce task, as shown here.
1:05
The map task is done by the mapper class, and the reduce task is done by the reducer class. The mapper class takes an input, tokenizes it, and maps and sorts it.
1:22
You can see that the mapper class takes one input, tokenizes it, and
1:28
does the shuffling and sorting. So the mapper class takes the input, tokenizes it, and maps
1:34
and sorts it. The output of the mapper class is used as the input to the
1:39
reducer class, as I mentioned earlier, which in turn searches for the matching pairs and
1:46
reduces them according to the business logic. Whatever is required, the respective outputs
1:52
are obtained, and the respective customized function works in the reducer class
1:57
for the reducing operation. MapReduce task example: one of the most fundamental MapReduce examples is the word count problem.
2:09
The word count problem is something like the hello world problem in MapReduce. It takes a string
2:15
with different words as input and counts the number of words of each type. Let's see
2:21
how the mapper class and reducer class work for this task. A string is taken as input, and the number of occurrences of each distinct word in that string is counted; that is known as the word count problem. The mapper class tokenizes the string and makes a
2:39
sorted list of the words, and it makes the word the key and the number the value.
2:44
So the word becomes the key, and the number, meaning the frequency of occurrence, that is
2:49
the count, is the value in that case. The reducer class takes the list and counts the number of entries of each word in the input,
2:59
which is the output from the mapper. So the reducer class takes the list and counts the number of entries of each word.
3:07
It checks how many times each word has occurred. Finally, it creates another list of keys and their respective values, that is the count values,
3:17
meaning the number of times each word has occurred in the sentence.
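As a point of reference for what the job should produce, the same counts can be computed on one machine with Python's `collections.Counter`. This is not MapReduce; it is only a sketch of the expected result, and it assumes words are separated by whitespace and compared case-insensitively. The name `word_count` is chosen here for illustration.

```python
from collections import Counter


def word_count(text):
    """Return a dict mapping each word to how often it occurs in text."""
    # Assumption: lowercase the text and split on whitespace to get words.
    words = text.lower().split()
    return dict(Counter(words))


print(word_count("to be or not to be"))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

A MapReduce version has to arrive at the same word-to-count pairs, only with the work split between a mapper and a reducer.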
3:22
Here is an example for better understanding. Here you can see there is one sentence.
3:29
This is the input: "A duck is a bird".
3:33
We have taken a very simple example. In the mapper class here, we take a line from the input and tokenize it, and for each word in the line we emit (word, 1). So it creates a token for each and every word in the sentence and emits (word, 1). In
3:53
this way it produces output like (a, 1), (duck, 1), (is, 1), (a, 1), and (bird, 1). So
4:01
that is the mapper output. Next comes the sorted output. Now it has been sorted
4:06
in alphabetical order: a, a, b, d, i. So, in alphabetical order on the keys, the corresponding sorted output has been
4:14
obtained. Now remove duplicate keys. Here you can see that a has occurred
4:20
twice, so the duplicate key a has been removed here, with both of its values kept under the single key. The rest of them are unique, so
4:25
no elimination took place. Now this output goes as the input to
4:31
the reducer class. The reducer class has an algorithm something like this:
4:36
reducer(key, values): sum = 0; for each value in values, sum = sum + value; then emit (key, sum).
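The reducer algorithm just described can be written as a short Python function. This is a sketch rather than the Hadoop API: "emit" is modeled here as a plain return value, and `values` is assumed to be the list of counts collected for one key.

```python
def reducer(key, values):
    """Add up all the values that arrived for one key."""
    total = 0
    for value in values:
        total = total + value
    # "Emit" is modeled as returning the (key, sum) pair.
    return (key, total)


print(reducer("a", [1, 1]))
# ('a', 2)
```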
4:47
So in this way it takes the keys and values as input and emits the keys and sums
4:53
as output, and that is also a key-value pair. So a will have the sum 2, bird 1
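The whole example can be traced stage by stage in plain Python. This is a single-machine sketch, not Hadoop code. It assumes the mapper lowercases each word, so that "A" and "a" count as the same key, and it keeps the values of a duplicate key in a list when the keys are merged.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Tokenize the line and emit (word, 1) for each word."""
    for word in line.lower().split():  # assumption: lowercase the words
        yield (word, 1)


line = "A duck is a bird"

# Mapper output: one (word, 1) pair per token, in input order.
mapped = list(mapper(line))
# [('a', 1), ('duck', 1), ('is', 1), ('a', 1), ('bird', 1)]

# Sorted output: alphabetical order on the keys.
sorted_pairs = sorted(mapped, key=itemgetter(0))
# [('a', 1), ('a', 1), ('bird', 1), ('duck', 1), ('is', 1)]

# Remove duplicate keys: one entry per key, holding all of its values.
grouped = [(key, [value for _, value in group])
           for key, group in groupby(sorted_pairs, key=itemgetter(0))]
# [('a', [1, 1]), ('bird', [1]), ('duck', [1]), ('is', [1])]

# Reducer output: sum the values of each key.
reduced = [(key, sum(values)) for key, values in grouped]
print(reduced)
# [('a', 2), ('bird', 1), ('duck', 1), ('is', 1)]
```

Because the duplicate key keeps both of its values, the reducer can still produce the sum 2 for a.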
#Computer Education
#Computer Science
#Programming