MapReduce and Design Patterns - Binning Pattern Example
Oct 18, 2024
0:00
In this video we discuss a binning pattern example, and using this example we shall also give
0:07
you some idea of how the code is implemented and how it is run.
0:14
In this example we shall use postlinks.xml and divide the post links based on their
0:21
link type ID. The code will generate two files, one for the duplicate links and another for the linked
0:27
links from the XML file. To see how this is implemented, let us go
0:34
through a practical demonstration for easy understanding of the concept. In this example
0:40
we discuss the binning pattern, which falls under the data organization patterns.
0:47
We shall be dealing with postlinks.xml under the folder /input/postlinks.
0:54
Let us look at the current content of postlinks.xml. We have shown some of the rows here. All the rows are enclosed within the postlinks element, and each row has the respective attributes that I am showing here:
1:08
ID, creation date, post ID, related post ID and link type ID. So every row has these attributes.
1:14
Here we shall separate the links by link type ID, and it will be a map-only job. There are many rows; here we have shown only some of
1:24
them to help you understand. This is my Java program, BinningMRTask.java.
1:31
In it I have one class that extends Mapper, that is LinksMapper. Under
1:38
this we have defined one variable, an object of the class MultipleOutputs.
1:43
We have the setup method, and within setup we instantiate this MultipleOutputs variable: it is set to a new MultipleOutputs created with the current context. That is done in the setup method,
2:02
which we have overridden. Next we override the map method. In this map method
2:08
we have xmlParsed, which is a HashMap object,
2:14
and xmlToMap is a method which reads the XML input and returns a HashMap object,
2:20
and that initializes xmlParsed. Then the String linkType is set to xmlParsed
2:27
get of the link type ID. So using that ID as the key into the hash map, you get the link type.
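As a rough illustration of the parsing step just described, here is a minimal sketch of what an xmlToMap helper could look like. The video only names the method; the regex-based body, the class name XmlRowParser, the attribute name LinkTypeId and the sample row values are all assumptions made for this example. It also does not decode XML entities such as &amp;quot;.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the body below is an assumption, not the code shown in the video.
class XmlRowParser {
    // Matches name="value" attribute pairs inside a single row element.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)\\s*=\\s*\"([^\"]*)\"");

    // Turns one XML row into a map from attribute name to attribute value.
    static Map<String, String> xmlToMap(String xmlRow) {
        Map<String, String> attributes = new HashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(xmlRow);
        while (matcher.find()) {
            attributes.put(matcher.group(1), matcher.group(2));
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Made-up sample row; the attribute names are assumed, not taken from the video.
        String row = "<row Id=\"1\" CreationDate=\"2010-01-01T00:00:00.000\" "
                + "PostId=\"10\" RelatedPostId=\"20\" LinkTypeId=\"3\" />";
        Map<String, String> xmlParsed = xmlToMap(row);
        String linkType = xmlParsed.get("LinkTypeId");
        System.out.println("LinkTypeId = " + linkType);
    }
}
```

A line without attributes, such as the enclosing element, yields an empty map, so looking up the link type returns null and the record can be skipped.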
2:32
If the link type is equal to null, then we return, which means the rest of the code will not be executed.
2:37
Otherwise the link type can have the value either one or three.
2:42
So we call trim on the link type, which eliminates the blank spaces before and after, and if it equals one
2:47
then we call write on the MultipleOutputs object with bins, the value, NullWritable.get() and linked. That text, linked, is used to separate
2:55
the links whose link type ID is equal to one. Then in the else part we
3:00
do the same, but here we write the text duplicate. We have
3:05
kept that in a try-catch block. We also have the cleanup method, where we only
3:09
close the MultipleOutputs object.
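The branching inside the map method can be sketched without Hadoop as a plain function from link type to bin name. This is only an approximation of the logic described in the transcript: the class name LinkBinner and the in-memory map of lists are assumptions, standing in for the writes to the bins named output with the base names linked and duplicate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hadoop-free sketch of the decision made inside the map method (names are assumptions).
class LinkBinner {
    // Returns the bin for a link type, or null when the record is skipped.
    static String binFor(String linkType) {
        if (linkType == null) {
            return null; // no link type: the rest of the map method is not executed
        }
        if (linkType.trim().equals("1")) {
            return "linked";
        }
        return "duplicate"; // the else branch; in the video's data this is link type 3
    }

    // Groups rows by bin. Each record is {row text, link type}; the link type may be null.
    static Map<String, List<String>> bin(List<String[]> records) {
        Map<String, List<String>> bins = new LinkedHashMap<>();
        for (String[] record : records) {
            String binName = binFor(record[1]);
            if (binName != null) {
                bins.computeIfAbsent(binName, k -> new ArrayList<>()).add(record[0]);
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"row A", " 1 "});
        records.add(new String[] {"row B", "3"});
        records.add(new String[] {"row C", null});
        System.out.println(bin(records));
    }
}
```

Note that, as described, anything other than 1 falls into the duplicate bin. That is fine for data that only contains link types 1 and 3, but a stricter version might test for 3 explicitly.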
3:17
There is no reducer here; it is a map-only job. So let us discuss the main function. When the program is
3:25
executed, two arguments are
3:29
required. The usage is BinningMRTask, that is the class name, then the
3:33
input file location folder and the output file location folder. Those are
3:37
required, otherwise the program exits. One job has been defined with the name
3:42
binning the post links, so that is the job name. We have configured the job and assigned the respective mapper class that we defined earlier. There is no reducer here; it is a map-only job.
3:58
For set input paths and set output path we shall use argument 0 and argument 1, the command line arguments.
4:05
Then, for multiple outputs, we call add named output with the job and bins,
4:12
which is the named output. Then we have the output format, which is the text output format class, then the Text class, and the value will be the NullWritable class.
4:22
Next we call the MultipleOutputs set counters enabled with the job and true, so we are just enabling the counters.
4:28
Depending on the completion status, zero or two will be returned:
4:34
System.exit with either zero or two. Zero means successful completion, two means unsuccessful completion.
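The argument check and the exit-status convention of the main function can be sketched on their own, leaving out the Hadoop job setup. The class name BinningDriverSketch, the wording of the usage message and the sample output path are assumptions for illustration; only the two-argument requirement and the 0 or 2 status come from the transcript.

```java
// Hadoop-free sketch of the driver's argument check and exit status (names are assumptions).
class BinningDriverSketch {
    // Returns null when exactly two arguments are given, otherwise a usage message.
    static String validate(String[] args) {
        if (args.length != 2) {
            // Hypothetical wording; the video does not show the exact message.
            return "Usage: BinningMRTask <input folder> <output folder>";
        }
        return null;
    }

    // Maps the job's completion status to the process exit status used in the video.
    static int exitStatus(boolean jobSucceeded) {
        return jobSucceeded ? 0 : 2;
    }

    public static void main(String[] args) {
        // Sample arguments for the demonstration; the output path is an assumed placeholder.
        String[] sample = {"/input/postlinks", "/output"};
        String problem = validate(sample);
        if (problem != null) {
            System.err.println(problem);
            return;
        }
        // In the real driver this flag would be the result of waiting for the job to complete.
        boolean jobSucceeded = true;
        System.out.println("exit status = " + exitStatus(jobSucceeded));
    }
}
```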
4:40
So in this way the main function has been written. Now it is high time to create the jar file.
4:45
To create the jar file, we go to the respective package, then click
4:50
on Export, then give the jar file name and the path, and then Next
4:56
and Finish. We have already created the jar file, so we are not executing this step right now.
5:05
So we shall go to the command in the console. The command is hadoop
5:10
jar. This is the command I have already written: hadoop jar, then
5:15
the MapReduce design pattern jar file, that is the respective jar file path and
5:21
the jar file name. Then binning is the package name and BinningMRTask
5:26
is the class name. To repeat, binning is the package
5:32
name and BinningMRTask is the class name, and then input and postlinks:
5:36
this is the path where the XML file resides, and then the output path has been given. Now let me execute the command. Let me show you that the links really are separated by link type ID. Here we have two link type IDs, one
5:52
and three. So let me look at the output folder first.
6:00
Going back, and then into the output folder. Yes, here we have
6:05
the duplicate-m file, that is one file, and the linked-m file is another, so two such
6:10
files are there. A part file has also been created, but it has zero bytes,
6:15
which indicates that nothing has been written to this part file. So we shall
6:18
concentrate on these two files, duplicate and linked. Let me see their content. I expect that linked will have
6:27
all the records with link type ID 1, and duplicate will have
6:32
link type ID 3. So here is the command I am executing. The first one:
6:39
I wrote this command earlier, so I am taking it from the history. You can see
6:49
that the link type ID is equal to three throughout the duplicate output file.
6:54
Every link type ID is equal to three. And for the other one,
7:00
if I execute this one, you get link type ID equal to one. Yes, that is the content.
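The shape of the output folder seen above can be imitated locally with plain Java file I/O. This is a simulation under stated assumptions, not Hadoop itself: it relies on the Hadoop convention that a map task names its files base-m-00000, and the class name OutputFolderSketch, the temporary directory and the sample rows are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Local simulation of the job's output folder (all names here are assumptions).
class OutputFolderSketch {
    // File name a map task would use for a base name, for example linked-m-00000.
    static String mapOutputName(String baseName, int taskIndex) {
        return String.format(Locale.ROOT, "%s-m-%05d", baseName, taskIndex);
    }

    // Writes one file per bin plus an empty part file, and returns file name to size in bytes.
    static Map<String, Long> simulate(Map<String, List<String>> bins) {
        try {
            // Temporary directory; it is left in place so the files can be inspected.
            Path dir = Files.createTempDirectory("binning-output");
            Map<String, Long> sizes = new TreeMap<>();
            for (Map.Entry<String, List<String>> bin : bins.entrySet()) {
                Path file = dir.resolve(mapOutputName(bin.getKey(), 0));
                Files.write(file, bin.getValue(), StandardCharsets.UTF_8);
                sizes.put(file.getFileName().toString(), Files.size(file));
            }
            // The default output receives no records, so its part file stays empty.
            Path part = dir.resolve(mapOutputName("part", 0));
            Files.write(part, new byte[0]);
            sizes.put(part.getFileName().toString(), Files.size(part));
            return sizes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> bins = new LinkedHashMap<>();
        bins.put("linked", Arrays.asList("row with link type 1"));
        bins.put("duplicate", Arrays.asList("row with link type 3"));
        System.out.println(simulate(bins));
    }
}
```

The zero-byte part file appears because every record goes through the named output rather than the job's default output.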
7:16
So in this way we have shown you how to create such a binning pattern example, step by step and in detail.
7:23
Now we can delete the output folder so that the next MapReduce task can be executed. I hope you have enjoyed the video.
7:39
Thanks for watching
#Computer Science
#Data Management
#Education
#Java (Programming Language)
#Programming