MapReduce and Design Patterns - Partitioning Pattern Example
Oct 18, 2024
0:00
In this video we are discussing a partitioning pattern example.
0:05
This video includes a practical demonstration for
0:10
easy understanding of the implementation. In this example we shall use postlinks.xml to divide the post links based on
0:20
their link type ID. There are two kinds of post links in the dataset: the post links with the value
0:26
1 and the post links with the value 3. Let us go through a practical demonstration of the implementation of this problem to understand
0:35
it better. Here is the demonstration. In this example we are going to discuss another data organization pattern,
0:44
the partitioning pattern. We have the XML file postlinks.xml, and we shall partition its post links by
0:53
link type ID 1 and ID 3. Since we are making two partitions, we'll be creating two part files in the output, one for ID 1 and one for ID 3.
1:05
So how do we execute this? /input/postlinks is the path where the file exists, with a size of 2.35 MB.
1:13
Here is postlinks.xml. We are showing only some of the rows, but the file contains many more.
1:19
Under the postlinks tag, each row consists of an ID, then a creation date, a post ID, a related
1:26
post ID and the link type ID. Here the link type ID is one, and some of the rows will be
1:32
having values other than one. We're going for this Java program, only one Java
1:37
program, that is, the partition post links MR task. Here we have multiple inner
1:43
classes. The first inner class is our links partitioner. This inner class
1:49
extends the Partitioner class, and within it we're defining one private static final String, partition values, whose value is the string partition values. So
2:01
that is a final string we are creating here, along with the other instance
2:05
variables: the Configuration conf, the String partitions, and an ArrayList, part list,
2:11
which will hold data of type String. These are
2:16
the different instance variables, the reference variables, that will be used here.
2:19
We're going to override the method Configuration getConf, which
2:25
returns the configuration; this is getConf. We are also overriding the
2:31
method setConf, which takes a Configuration conf. In setConf, this.conf is set equal to conf, the input argument, and partitions is set equal to conf.get of
2:45
partition values, where partition values is the constant string we
2:49
declared earlier. Now these partitions will be split; these are the partition
2:53
values stored under the final string we defined. These partitions will be
2:58
split, taking the blank space as the delimiter, into this particular
3:04
list, which you can see here, the part splits. So the delimiter is a blank space.
3:10
Now we shall trim each one, because we do not want any
3:14
extra blank spaces prepended or appended. After doing the trim we are just
3:19
adding it to the part list which we declared earlier. So in this way we use
3:23
the blank space as the delimiter, split it, trim it, and then add it to
3:29
the part list. I think you are getting my point.
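The split, trim and add steps narrated for setConf could be sketched roughly as follows. This is a minimal, Hadoop-free illustration and not the program shown in the video: the class name PartitionValueParser and the method name parse are hypothetical, and skipping empty tokens is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the narrated setConf logic without Hadoop.
final class PartitionValueParser {

    // Splits a space-delimited string such as "1 3" and trims each token.
    static List<String> parse(String partitions) {
        List<String> partList = new ArrayList<>();
        if (partitions == null) {
            return partList;
        }
        for (String part : partitions.split(" ")) {
            String trimmed = part.trim();
            // Assumption: tokens left empty by repeated spaces are skipped.
            if (!trimmed.isEmpty()) {
                partList.add(trimmed);
            }
        }
        return partList;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 3")); // prints [1, 3]
    }
}
```

With the string used in the video, one blank space three, the list holds "1" at index 0 and "3" at index 1.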
3:34
Next, we are going to override getPartition. getPartition is another method
3:38
within the Partitioner class. Here we have the String p value, equal to
3:42
key.toString().trim(). So from the key, which has been passed as an IntWritable,
3:48
this p value is initialized and trimmed. After the trim, if the part
3:54
list contains p value, then return the part list's index of p value, so the respective
4:00
index of the p value will be returned. Else it will return num
4:04
partitions, that is, the int which has been passed as an input argument. So it will return
4:08
num partitions in the else case, the value that we passed as an
4:14
input argument. Next we're defining one other method, set partitions:
4:20
a static void set partitions, which takes a Job job and a String
4:23
str. Here we call job.getConfiguration().set with partition values and the string
4:28
str, so this is where the job configuration is getting set.
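The lookup described for getPartition could be sketched as below, again without any Hadoop dependency. The class name LinkTypePartitionLookup and the plain String key are assumptions made for illustration; the fallback of returning numPartitions is reproduced as narrated. In Hadoop a valid partition index lies between 0 and numPartitions minus 1, so that fallback presumably relies on every key being present in the list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the partitioner's lookup logic.
final class LinkTypePartitionLookup {

    // Returns the position of the key in partList, or numPartitions if absent.
    static int getPartition(String key, List<String> partList, int numPartitions) {
        String pValue = key.trim();
        if (partList.contains(pValue)) {
            return partList.indexOf(pValue);
        }
        // As narrated in the video; outside the range Hadoop accepts.
        return numPartitions;
    }

    public static void main(String[] args) {
        List<String> partList = Arrays.asList("1", "3");
        System.out.println(getPartition("1", partList, 2));   // prints 0
        System.out.println(getPartition(" 3 ", partList, 2)); // prints 1
    }
}
```

Link type 1 therefore goes to the first reducer and link type 3 to the second, which is what produces one part file per link type.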
4:34
Now we have one mapper class. The name of the mapper class is post links mapper,
4:39
which extends Mapper and has some member variables. Here we have
4:44
the IntWritable output key, and we are overriding the map method.
4:49
For the XML parts, we have one function, XML to map,
4:55
which takes an XML string as input and returns a HashMap object as output, and
5:00
that HashMap is assigned to XML parts, a HashMap object. Here we're going for String link type
5:09
equal to XML parts dot get of link type ID. So from that link type ID in
5:14
the XML parts, that is the HashMap, we are initializing this link type. If link type is equal to null then we return; otherwise the int link type is set to Integer.valueOf of link type, dot intValue. So converting it to an integer, we are
5:28
storing that value, and the output key is updated with this int link type. Already
5:34
we have defined this output key as an IntWritable. So now we are going to
5:38
call context.write with the key and value pair: output key contains
5:43
the key and the value is the value, and that is written temporarily as intermediate output.
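The mapper's key extraction could look roughly like this Hadoop-free sketch. The regular-expression parser stands in for the XML to map helper mentioned in the video, whose real implementation is not shown; the class name PostLinkRowSketch, the attribute spelling LinkTypeId and the sample row in main are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of how a row's link type could be extracted.
final class PostLinkRowSketch {

    // Matches name="value" attribute pairs inside a single row element.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Stand-in for the video's XML to map helper.
    static Map<String, String> xmlToMap(String xml) {
        Map<String, String> parts = new HashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(xml);
        while (matcher.find()) {
            parts.put(matcher.group(1), matcher.group(2));
        }
        return parts;
    }

    // Returns the link type, or null when the row should be skipped.
    static Integer linkTypeOf(String xmlRow) {
        String linkType = xmlToMap(xmlRow).get("LinkTypeId");
        if (linkType == null) {
            return null; // the mapper simply returns in this case
        }
        try {
            return Integer.valueOf(linkType.trim());
        } catch (NumberFormatException e) {
            return null; // mirrors the try-catch guard around the map body
        }
    }

    public static void main(String[] args) {
        // Made-up sample row, not taken from the dataset.
        String row = "<row Id=\"1\" PostId=\"10\" RelatedPostId=\"20\" LinkTypeId=\"3\" />";
        System.out.println(linkTypeOf(row)); // prints 3
    }
}
```

In the actual job the returned integer would be placed in the IntWritable output key and the whole row written as the value.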
5:47
We kept it in a try-catch block. Next we have the reducer; the name of the class is post links reducer.
5:54
In this reducer class we are just overriding the reduce method,
5:58
which receives Text values; values is an Iterable there. For each value we call context.write of value comma NullWritable.
6:05
So here we are writing the key value pair to the context.
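The reducer's behaviour, writing each value and discarding the key, could be pictured with this small Hadoop-free sketch. The class name ValueOnlyReducerSketch is hypothetical, and the returned list merely stands in for what context.write would emit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a reducer that emits values only.
final class ValueOnlyReducerSketch {

    // Collects every value for a key; the key itself is not written out.
    static List<String> reduce(int key, Iterable<String> values) {
        List<String> output = new ArrayList<>();
        for (String value : values) {
            output.add(value); // corresponds to writing the value with a null placeholder
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row-a", "row-b"); // made-up values
        System.out.println(reduce(1, rows)); // prints [row-a, row-b]
    }
}
```

Because the partitioner has already routed each link type to its own reducer, passing values through unchanged is enough to get one part file per link type.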
6:10
So now let me discuss this main function. This main function requires three command line arguments. If the length is not
6:17
equal to three, that is an error. The three arguments are the input, then a string, that
6:23
is, one blank space three, since I told you that the type ID will be
6:28
one and three, and we also require the output folder. We are defining
6:35
the job, and the job name is partition the post links by link
6:40
type ID. That is the job we are defining. We also set the jar class and then the mapper class. We already defined the partitioner class, and
6:48
we call set partitions with job comma argument one. Argument one means
6:54
what? That is the string one and three. So that is argument one,
6:59
one and three. Yes. Now go for job.setReducerClass; we're just putting the reducer
7:05
there, whatever we have defined already; we are just setting that one. And then
7:09
we're going to create two reducers. We're going to have two partitions, one for the partition with type ID one and another
7:18
one for type ID 3, so we require two reducers. Two part files will be there in
7:23
the output. Then we are going for add input path and set output path with
7:28
argument zero and argument two, the input path and output path. So argument zero is the input
7:33
and argument two will be the output path, whatever you are giving. Then we are
7:38
going for set output key class, set output value class and set output format class. Everything was defined earlier; we're just mentioning it
7:47
here, and depending upon the completion either zero or one will be returned.
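The argument handling described for the main function could be sketched as follows, without the Hadoop job setup. The class name PartitionDriverArgs is hypothetical, and deriving the reducer count from the number of partition values is an assumption; the video simply uses two reducers.

```java
// Hypothetical holder for the three command line arguments of the driver.
final class PartitionDriverArgs {
    final String inputPath;
    final String partitionValues;
    final String outputPath;
    final int numReducers;

    private PartitionDriverArgs(String inputPath, String partitionValues, String outputPath) {
        this.inputPath = inputPath;
        this.partitionValues = partitionValues;
        this.outputPath = outputPath;
        // Assumption: one reducer per listed partition value.
        this.numReducers = partitionValues.trim().split("\\s+").length;
    }

    // Rejects anything other than input, partition values and output.
    static PartitionDriverArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 arguments: <input> <partition values> <output>");
        }
        return new PartitionDriverArgs(args[0], args[1], args[2]);
    }

    public static void main(String[] args) {
        PartitionDriverArgs parsed =
            parse(new String[] {"/input/postlinks", "1 3", "/output"});
        System.out.println(parsed.numReducers); // prints 2
    }
}
```

In the real driver these values would feed the input path, the set partitions call, the number of reduce tasks and the output path.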
7:52
We shall now create the jar file. We go to the respective project and right-click, as we have shown multiple times, then choose export, select the jar file name and the respective path, and click next and
8:07
finish. In this way the jar file gets created. We have already created the jar file, so we are going to execute the command: hadoop jar, then we have
8:17
the MapReduce design pattern jar files folder, and the data organization
8:22
pattern dot jar is the jar file name. After that we're giving the package name; partition pattern
8:31
is the package name, and then the class. The input folder we're giving is post
8:38
links, and this is the string we're passing, that is, one and three, and then the output folder, as we
8:43
mentioned earlier. So these are the respective parameters we are passing. We're executing
8:50
the command. I hope that we'll be getting two part files, one for ID one, another for ID 3. So let me go to the output folder.
9:00
So here is the output folder. There you can find that we have one part
9:05
file ending with zero and another part file ending with one. So two part files are
9:08
there, one for ID 1, another for ID 3. So let me
9:12
see the content. Going back to the console again:
9:19
going back to the console, we are going to print the part file content.
9:24
Here we had a reducer, so the command will be hdfs dfs -cat, then slash,
9:34
we'll be going for, say, /output/ and the file name is part, then r, then
9:42
zeros; all zeros will be there, so five zeros, that is part-r-00000. Then press enter.
9:48
You can find that in this case, just look at the last field: the link type ID is
9:54
one for all the cases. So that is one partition we have done. We've done this data
9:59
organization, so one partition is done. See, the last field, link type ID, is
10:03
equal to one; otherwise we have the ID, creation date and other fields. We're
10:07
having, for the next one, link type ID equal to three. So in this way
10:15
you have seen how we have implemented this program. We have discussed it
10:19
line by line, step by step. We can delete the output folder as usual
10:24
for the next MR task to execute. I think you got my point. This is the command I'm issuing,
10:31
and this output folder will be deleted. Thanks for watching this video.