MapReduce and Design Patterns - Replicated Join Pattern Example
Oct 18, 2024
0:00
In this video we are discussing a replicated join pattern example.
0:05
So we shall go for the implementation of this pattern. This program takes the posts.xml and users.xml files and joins the records.
0:16
And we have to select the join type while executing the task.
0:20
So what kind of join we are going to do has to be selected.
0:24
The join types are left outer join, inner join, etc.
0:29
Let us go through one practical demonstration for an easy understanding of this concept.
0:35
In this example we are going to implement one replicated join example, which falls under the
0:40
join pattern. Here we have two XML files: one is posts.xml, which will be under the
0:46
folder /input/post, and the other is users.xml, which will be under the folder /input/user.
0:53
You can find here that the post and user subfolders are there under the input folder in the
0:58
name node. Let me show you that: if you go to the post folder,
1:03
that is /input/post, you can find posts.xml there. Now we'll go
1:09
to /input/user, and we get users.xml there. Let me
1:16
show you the file contents. At first we're discussing
1:22
posts.xml. Within the posts tag we have the row tags; there are multiple rows
1:28
with multiple IDs and multiple attributes: ID, post type ID, and the
1:33
others, view count, then body, then last edit date.
1:41
So many attributes are there, so here I am just marking and highlighting
1:46
them for your understanding. We have the tags, then answer count, comment
1:51
count, favorite count, community owned date and so on. These are the multiple
1:56
attributes. In users.xml, under each row we have multiple attributes, and all these rows are within the users tag.
2:09
users.xml has so many different rows, but I have shown only some of the attributes: upvotes, downvotes, account ID and so on.
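Because every field of a record sits in an XML attribute of a row element, one record can be turned into a key-value map with a small helper. The following is only a minimal sketch under stated assumptions, not the code shown in the video: the class name RowAttributeSketch, the regular expression, and the sample row (including the attribute spellings Id, PostTypeId, OwnerUserId and ViewCount) are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the parsing approach are assumptions.
class RowAttributeSketch {
    // Matches name="value" pairs inside a single <row ... /> element.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Turns one row line into a map from attribute name to attribute value.
    static Map<String, String> xmlToMap(String rowLine) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(rowLine);
        while (matcher.find()) {
            attributes.put(matcher.group(1), matcher.group(2));
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Made-up sample row, not taken from the real posts.xml.
        String row = "<row Id=\"7\" PostTypeId=\"1\" OwnerUserId=\"42\" ViewCount=\"10\" />";
        Map<String, String> parsed = xmlToMap(row);
        System.out.println(parsed.get("Id") + " " + parsed.get("OwnerUserId"));
    }
}
```

This regular expression does not decode XML entities such as &amp;quot; inside attribute values, so a full XML parser may be the safer choice for real data.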
2:18
So in this way you are getting the content of these files. Now let me go to Eclipse. We have only one class here, that is the replicated
2:27
join MR task. It is map-only, so no reducer will be there. Within this class we have defined
2:34
one inner class, that is the replicated join comment user mapper, which extends Mapper. Within
2:40
this mapper we have defined one object, the user map, of HashMap type,
2:45
which takes String and String as key and value. We have the output value of type
2:50
Text and the join type of type String. We're overriding the setup method.
2:57
In the setup method we are checking whether
3:01
context.getCacheFiles() is not equal to null and context.getCacheFiles().length is greater than zero.
3:08
Then we define one File object, that is the user file, so the rest of the code will be there: user file is equal to new File("./users"), and then a BufferedReader object has been defined. So after
3:22
initializing that File object, that is the user file, we go for the BufferedReader object
3:27
also. Then, before going into the while loop, we have the line variable. In the while, line
3:33
is equal to bufferedReader.readLine(), so we'll be reading line by line. XML parsed is there:
3:38
it is one Map object, XML parsed, which will be initialized with this xmlToMap method. So let me go to the xmlToMap method. This method will take
3:49
one XML string as input and return one HashMap object as output. So here we have this xmlToMap
3:55
method. This method takes XML as input and gives a HashMap object as output, and that will
4:01
initialize our XML parsed. From XML parsed we call get with ID, so the ID
4:08
will be taken into this user ID. If user ID is equal to null, then continue; we shall not go for the rest of the code.
4:15
Otherwise, userMap.put(userId, line), so we're just putting this
4:21
user ID and the line content into this user map. We have kept it in a
4:26
try-catch block, and you can see we have this BufferedReader;
4:33
we have closed it. And here you can see that the join type
4:38
has been initialized with context.getConfiguration().get("join.type"). So after closing this BufferedReader, we are just going for
4:48
this join type: join type is equal to context.getConfiguration().get("join.type").
4:53
Regarding this join.type, we'll be discussing that one later. So here we'll be selecting
4:58
only two types of join: one is inner and the other is left outer. Now this is my map method.
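The setup steps described above (read the users file line by line, pull out the ID, skip lines without one, and store the whole line in a hash map keyed by that ID) can be sketched in plain Java without any Hadoop classes. This is an illustration under assumptions rather than the mapper from the video: the class name UserLookupSketch, the extractId helper, and the sample data are hypothetical, and a StringReader stands in for the cached users file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the lookup-building part of a mapper's setup method.
class UserLookupSketch {
    // Matches the Id attribute only (the word boundary excludes e.g. AccountId).
    private static final Pattern ID = Pattern.compile("\\bId=\"([^\"]*)\"");

    // Returns the Id attribute of a row line, or null when the line has none.
    static String extractId(String line) {
        Matcher matcher = ID.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Reads every line and keeps the full line text, keyed by user ID.
    static Map<String, String> loadUsers(Reader source) {
        Map<String, String> userMap = new HashMap<>();
        // try-with-resources closes the reader once reading is done.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String userId = extractId(line);
                if (userId == null) {
                    continue; // e.g. the opening and closing users tags
                }
                userMap.put(userId, line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return userMap;
    }

    public static void main(String[] args) {
        // Made-up sample content, not the real users.xml.
        String users = "<users>\n"
                + "  <row Id=\"42\" DisplayName=\"alice\" AccountId=\"9\" />\n"
                + "  <row Id=\"43\" DisplayName=\"bob\" AccountId=\"10\" />\n"
                + "</users>\n";
        Map<String, String> userMap = loadUsers(new StringReader(users));
        System.out.println(userMap.size());
    }
}
```

Holding the whole users file in memory like this is what makes the join map-only, and it is only practical when the smaller data set fits in each mapper's heap.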
5:04
Within this map method we have this XML parsed; that is a Map object. Again it is getting initialized using this xmlToMap,
5:11
which we used earlier also. I used xmlToMap earlier also, yes,
5:17
at line number 39, the xmlToMap. We have this String owner ID, so
5:26
from XML parsed, with get of owner user ID, we're initializing owner ID.
5:31
If owner ID is null then we shall return; the rest of map will not get executed. So user
5:37
info is equal to userMap.get(ownerId), so now user
5:41
info will be initialized against this particular owner ID. If user info
5:47
is not equal to null, the output value will be set with the user info, and we'll be
5:52
writing onto the context the key-value pair, that is value and output value, so that will be
5:56
written onto the context. Else, if joinType.equalsIgnoreCase of left outer, if it
6:04
is left outer, then we'll be writing this one as value comma new Text of null. So if we are getting
6:10
the match then we'll be writing the value and output value; otherwise we'll be writing
6:14
value and null, because this is a left outer join. So now let me discuss our main
6:20
function. Here we require four
6:25
arguments. The usage message shows the class name, then the post input folder, user file location and output folder, and then we have the fourth argument, which should be either inner or left outer. So if the argument length is not equal to
6:41
four then we shall exit. Now from the fourth argument, that is args[3], we'll be initializing
6:47
the join type. So the join type should be equal to either inner or left outer. If it is not,
6:54
then an error message will be printed, join type not set to inner or left outer, followed by System.exit(2), so here the program will get terminated. Now we shall
7:04
define one job instance. Here the name of
7:09
the job is replicated join to posts and users, so this is the job name. We are
7:15
defining one job instance; on getConfiguration, join.type comma join type has been set, and the
7:23
jar class has been set to the replicated join MR task dot class. This join type we have taken from the fourth argument, as I
7:32
discussed earlier also. Again I'm repeating, it should be either inner or left
7:36
outer. The mapper class will be set, but before that, just see the join
7:42
type we used earlier. Here we used the join.type; see the line number. In
7:48
line number 54 we have used this join type.
7:57
Here we have used this join type, and now it has been initialized
8:03
by the input argument, that is argument number four. Now we're just setting the
8:08
mapper class. The mapper class name has been set, and there will be no
8:13
reducer, so the reducer count is zero. The input path has been set with args[0], that
8:19
is the first argument, and the output path has been set with the third argument, that
8:23
is args[2], the output path. The output key class
8:27
will be Text type and the output value class will also be Text type, Text.class. And the job
8:34
cache file added will be a new URI of args[1] plus hash users, so we are doing this
8:41
concatenation, that is the second argument plus hash users. Depending upon the job completion
8:47
status, 0 or 3 will be returned. So now let me create the expected jar file.
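The argument checks at the start of the main function can be sketched as a small, Hadoop-free method. This is an assumption-laden illustration, not the driver from the video: the class name JoinArgsSketch, the literal spelling "leftouter", and the exit code 1 for a wrong argument count are hypothetical; only the exit code 2 for an invalid join type comes from the walkthrough.

```java
// Hypothetical stand-in for the validation part of the driver's main function.
class JoinArgsSketch {
    static final int OK = 0;
    static final int WRONG_ARG_COUNT = 1; // assumed value; the video only says the program exits
    static final int BAD_JOIN_TYPE = 2;   // mirrors the System.exit(2) described in the video

    // Expects: <post input folder> <user file location> <output folder> <join type>
    static int validate(String[] args) {
        if (args.length != 4) {
            return WRONG_ARG_COUNT;
        }
        String joinType = args[3];
        // "leftouter" is an assumed spelling of the spoken "left outer".
        boolean known = joinType.equalsIgnoreCase("inner")
                || joinType.equalsIgnoreCase("leftouter");
        return known ? OK : BAD_JOIN_TYPE;
    }

    public static void main(String[] args) {
        int status = validate(args);
        if (status == WRONG_ARG_COUNT) {
            System.err.println("Usage: <post input folder> <user file location> "
                    + "<output folder> [inner|leftouter]");
        } else if (status == BAD_JOIN_TYPE) {
            System.err.println("Join type not set to inner or leftouter");
        }
        System.exit(status);
    }
}
```

In the real driver, the validated join type would then be stored in the job configuration under join.type so that the mapper's setup method can read it back.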
8:53
So you go to the package, right-click, then Export, and you give the path and the jar file name, as we did for the other
9:02
cases, and then the jar file is created. I have created the jar file already, so I'm skipping this particular step;
9:10
we are supposed to go for the export there. So now going to the terminal. Here is the command we have:
9:18
hadoop jar, then we'll go for the
9:23
jar file name along with the path first, and join pattern dot jar is the jar file name.
9:29
Replicate join is the package name, and then replicate join MR task will be the class name.
9:41
Then we have the folder, that is /input/post, and then /input/user, where we are also mentioning users.xml, then the output folder, and here we require the joining as inner.
9:57
Let me execute the command. Instead of inner
10:02
we can also go for left outer, because two options were there in our hand. So let me go
10:07
for the execution of this command. You know that in the case of a left outer join,
10:12
all the records, all the rows, will be coming from posts.xml, and
10:17
records will be coming from users.xml whenever there is a match;
10:22
otherwise that will be printed as null; the null string is there. So all the
10:27
records will be coming from the left-hand side, that is the first one,
10:32
whichever you are mentioning, and the next one is users.xml. So left outer join
10:37
means all the records will be coming from the left, along with those records
10:41
from the right which have a match, whereas inner join keeps only the matched records. So let me go for the content printing
10:46
of this part file so that you can see: hdfs dfs -cat /output/part
10:54
star. Yes, now let me press Enter. So this is the content; you can
11:02
see the long content is there. All records will be coming from posts, and those
11:09
records will be coming from users for the joining wherever the ID matches; that is the
11:14
left outer join. So I've discussed each and every step in detail; each and
11:21
every piece of Java code we have explained. I think now you're comfortable working with
11:25
this example. Let me show you some rows and how the things are coming.
11:39
Let me mark some of the rows there. Let us consider one; so many rows are there.
11:47
All the rows are coming from posts. So let us go for marking one row.
11:54
Say this one row, the last one, I'm just marking. The ID is 256058 and the other details are there.
12:04
That is another row which has a post involved, so that's why the content is a long row.
12:09
In this way you can go through the content, whatever has been produced. You can see all the IDs are there, whatever is there: the score, the view count, the body and so on; everything is there.
12:22
That is a complete row, with comment count and so on. This is another row: reputation, creation date, display name, last access date, views, upvotes, downvotes, account ID;
12:39
everything is there. So I think now you are comfortable working with this example.
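The difference between the two join types in the output can be reproduced in miniature with plain Java collections. The sketch below is an illustration under assumptions, not the mapper from the video: the class name ReplicatedJoinSketch, the representation of a post as an (owner ID, row text) pair, the tab separator, the literal "leftouter", and the literal string "null" written for unmatched posts are all choices made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free imitation of the map-side join logic.
class ReplicatedJoinSketch {
    // Each post is {ownerId, postRow}; userMap maps user ID to the user's row text.
    static List<String> join(List<String[]> posts, Map<String, String> userMap, String joinType) {
        List<String> output = new ArrayList<>();
        for (String[] post : posts) {
            String ownerId = post[0];
            String postRow = post[1];
            if (ownerId == null) {
                continue; // posts without an owner are skipped entirely
            }
            String userInfo = userMap.get(ownerId);
            if (userInfo != null) {
                // Match found: emitted for both inner and left outer joins.
                output.add(postRow + "\t" + userInfo);
            } else if (joinType.equalsIgnoreCase("leftouter")) {
                // No match: only a left outer join keeps the post, with a null marker.
                output.add(postRow + "\t" + "null");
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the real files.
        List<String[]> posts = new ArrayList<>();
        posts.add(new String[] {"42", "<row Id=\"1\" OwnerUserId=\"42\" />"});
        posts.add(new String[] {"99", "<row Id=\"2\" OwnerUserId=\"99\" />"});
        Map<String, String> userMap = new HashMap<>();
        userMap.put("42", "<row Id=\"42\" DisplayName=\"alice\" />");

        System.out.println("inner rows: " + join(posts, userMap, "inner").size());
        System.out.println("leftouter rows: " + join(posts, userMap, "leftouter").size());
    }
}
```

With this sample data the inner join keeps only the post owned by user 42, while the left outer join also keeps the post owned by the unknown user 99.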
12:45
So what shall we do? We shall delete the output folder so that we can execute the
12:52
MapReduce task again. So we'll be going for this, and then -rm
12:58
-r and then output. Thanks for watching.