MapReduce and Design Patterns - Reduce Side Join Pattern Example
Oct 18, 2024
0:00
In this video we discuss an example of the Reduce Side Join Pattern.
0:05
We'll go through an implementation of this pattern so that you can understand how to
0:10
write the code, how to run it and how to get the output.
0:16
This Reduce Side Join Pattern example takes two input files, posts.xml and postlinks.xml.
0:24
It reads both XML files and joins their
0:29
records, and the join type is selected when the task is executed. The join types are
0:35
inner, left outer, right outer, full outer and anti. Let us go through a practical
0:43
demonstration to show how these operations can be implemented in Java. In this example
0:50
we are implementing one join pattern, the reduce side join pattern. Here we
0:57
have two XML files: one is posts.xml and the other is postlinks.xml. The posts.xml file
1:06
resides under input/post. You can see that we are currently in the
1:12
input folder, and input/post contains posts.xml. We'll
1:17
show its content later. Now let me come back and go to
1:22
input/postlinks. Under that folder we
1:28
have postlinks.xml. Now let me show you the content of both files.
1:43
First, the content of posts.xml. Everything is enclosed in the
1:48
posts tag, and inside it there are row tags. Each row has multiple attributes,
1:53
such as Id, PostTypeId, AcceptedAnswerId, CreationDate,
2:00
Score, ViewCount, Body, OwnerUserId, LastEditorUserId,
2:11
LastEditDate, LastActivityDate, Title, Tags, AnswerCount,
2:19
CommentCount, FavoriteCount and CommunityOwnedDate. These are the attributes that reside in each row tag.
2:30
Next is postlinks.xml. Inside the postlinks tag there are
2:36
rows with the attributes Id, CreationDate, PostId, RelatedPostId and LinkTypeId. Note that here the
2:44
join attribute is PostId, whereas in posts.xml it is Id. There is
2:51
only one Java file, containing the reduce side join MapReduce task
2:57
class, and this class contains two mappers. The first
3:04
mapper is LeftPostMapper, which processes the posts,
3:10
and the second is RightPostLinkMapper, which processes the post links.
3:16
So we have written two mapper classes. Let's start with
3:21
LeftPostMapper, which extends Mapper. It has two member variables,
3:26
an output key and an output value, both of type Text. We override the map function. Inside it,
3:34
the parsed XML parts are held in a HashMap object, which is
3:39
initialized by the xmlToMap method. This method takes an XML row as input
3:47
and produces a HashMap of the row's attributes as output.
3:54
That is the purpose of xmlToMap; you can see its content here.
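The video does not show the body of xmlToMap, so the following is only a minimal sketch of what such a helper could look like. It assumes each row element sits on a single line and every attribute value is wrapped in double quotes. The class name XmlRowParser and the sample rows are hypothetical, and the sketch does not unescape XML entities such as &lt;.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the video's own xmlToMap may be implemented differently.
class XmlRowParser {
    // Matches name="value" pairs, e.g. Id="7" or PostTypeId="1".
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns attribute name -> value for one <row .../> line; an empty map for any other line. */
    static Map<String, String> xmlToMap(String xml) {
        Map<String, String> map = new HashMap<>();
        String line = xml.trim();
        if (!line.startsWith("<row")) {
            return map; // e.g. the <posts> or </posts> wrapper lines
        }
        Matcher m = ATTR.matcher(line);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the shape described for posts.xml and postlinks.xml.
        Map<String, String> post = xmlToMap("<row Id=\"7\" PostTypeId=\"1\" Score=\"3\" />");
        Map<String, String> link = xmlToMap("<row Id=\"19\" PostId=\"7\" RelatedPostId=\"12\" LinkTypeId=\"1\" />");
        System.out.println(post.get("Id"));     // 7
        System.out.println(link.get("PostId")); // 7
    }
}
```

Both rows yield the same join key, 7, even though the attribute that carries it has a different name in each file.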
4:00
Now the parsed XML parts object, a map object, has been created.
4:05
Next we have a String postId, which is initialized from the parsed XML parts
4:11
by getting the value stored against the Id attribute.
4:16
If postId is null, the mapper returns without going further. Otherwise
4:20
the output key is set to postId. Now see how we set
4:27
the output value. After setting the output key to postId,
4:33
we fill in the output value:
4:39
the input value is converted to a string and a capital P is prepended to it.
4:45
The P marks the record as a post. So this character is prepended to the value converted to a string,
4:53
and then the pair is written to the context, that is, the output
4:59
key and the output value are written to the context.
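Stripped of the Hadoop types, the map logic just described comes down to the function below. This is a plain-Java sketch, not the video's code. The class name LeftPostMapperSketch is hypothetical, the parsed row is passed in as a Map (as xmlToMap would produce), the attribute name Id is assumed, and a null return stands for the case where the real mapper returns without writing anything.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class LeftPostMapperSketch {
    /**
     * Emits (post Id, "P" + raw row), or null when the row has no Id
     * (the real mapper simply returns without calling context.write).
     */
    static Map.Entry<String, String> map(Map<String, String> parsed, String rawRow) {
        String postId = parsed.get("Id");
        if (postId == null) {
            return null;
        }
        // The leading 'P' tells the reducer that this value came from posts.xml.
        return new SimpleEntry<>(postId, "P" + rawRow);
    }
}
```

In the Hadoop version, the returned key and value correspond to the output key and output value that are set and passed to context.write.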
5:03
Everything is kept within a try-catch block, and here we write
5:08
the output key and the corresponding value. Now let me come to RightPostLinkMapper.
5:14
It also extends Mapper. Here too we define two Text objects, one for the output key and one for the output value, and we override the map method.
5:27
Again the parsed XML parts are held in a HashMap object, created by calling the xmlToMap method.
5:36
Then postId is read from the parsed XML parts. Now look carefully at this postId.
5:43
In postlinks.xml the attribute name is PostId, but in
5:51
posts.xml the attribute name is Id, so you need to remember this: in
5:55
postlinks.xml it is PostId, while in posts.xml it is Id. The two
6:00
correspond to each other, so here we read PostId, not Id. If
6:04
postId is null, the mapper returns; otherwise the output key is set to
6:08
postId. Then, for the output value, after setting the output key
6:14
to postId, we convert the input value to a string and prepend a capital L to
6:22
it, and the result is set as the output value.
6:28
Finally, context.write is called with the output key and the output value, so the pair is written to the current context.
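The right-hand mapper differs in only two places: the key comes from PostId rather than Id, and the tag is L. The plain-Java sketch below (hypothetical class name, PostId attribute name assumed) makes the first point concrete: a link row's own Id is never used as the join key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class RightPostLinkMapperSketch {
    /**
     * Emits (PostId, "L" + raw row), or null when PostId is missing.
     * PostId, not Id, is used: in postlinks.xml, Id is the link's own identifier.
     */
    static Map.Entry<String, String> map(Map<String, String> parsed, String rawRow) {
        String postId = parsed.get("PostId");
        if (postId == null) {
            return null;
        }
        // The leading 'L' marks the value as coming from postlinks.xml.
        return new SimpleEntry<>(postId, "L" + rawRow);
    }
}
```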
6:35
Now let's discuss the reducer.
6:42
JoinReducer is the name of the class, which extends Reducer. It has two ArrayLists, posts and postLinks; these two
6:51
ArrayLists hold Text objects. There is also a String joinType, initialized to null.
7:00
So initially the join type is null, and now we override the setup method.
7:06
In the setup method, joinType is read from the configuration:
7:13
context.getConfiguration().get("join.type"), that is, from the join.type
7:19
configuration property. Once again: posts and
7:25
postLinks are both ArrayLists, and the String joinType
7:29
is initialized to null. To repeat,
7:33
in the overridden setup method, joinType
7:38
is set from join.type. What join.type can contain will be
7:42
discussed later. There are several different types of joins:
7:47
inner join, left outer join, right outer join, full outer join and anti join,
7:52
five types in all, and we'll discuss each of them shortly. Now
7:57
let me come to the reduce method, which we have overridden. First we clear both
8:01
ArrayLists. Then, for each value in values (values is
8:07
an Iterable), we check whether the
8:11
character at index 0 is P. If the first character is P, the value is a post,
8:20
so it is added to posts.
8:26
Otherwise we check whether the character at index 0 is L; if it is, the value
8:31
is a post link and it is added to postLinks. After that, the join logic is executed with the context.
8:37
This join logic is the most important part, covering the
8:41
different types of join. In an inner join we take only those records where both IDs
8:50
are the same. So for the inner join, checked with equalsIgnoreCase,
8:57
we require that posts is not empty and
9:05
that postLinks is not empty. Only if both are present do we perform the
9:11
inner join. In other words, an inner
9:14
join means that only when the IDs are present on both sides and match are the
9:19
rows joined. That is the logic of the inner join.
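Here is a plain-Java sketch of one reduce call up to and including the inner join. Values are split by their first character, the tag is stripped, and the cross product is emitted only when both lists are non-empty. Assumptions: the class name is hypothetical; output pairs are rendered as key, tab, value lines, which is how Hadoop's TextOutputFormat writes them by default; and the tag is removed before storing, which the video does not show explicitly.

```java
import java.util.ArrayList;
import java.util.List;

class InnerJoinSketch {
    /** One reduce call: all tagged values that share a join key. Returns "post\tlink" lines. */
    static List<String> reduce(Iterable<String> values) {
        List<String> posts = new ArrayList<>();
        List<String> postLinks = new ArrayList<>();
        // Route each value by its one-character tag and strip the tag off.
        for (String v : values) {
            if (v.isEmpty()) {
                continue;
            }
            if (v.charAt(0) == 'P') {
                posts.add(v.substring(1));
            } else if (v.charAt(0) == 'L') {
                postLinks.add(v.substring(1));
            }
        }
        List<String> out = new ArrayList<>();
        // Inner join: only keys present on both sides produce output,
        // one line per (post, postLink) combination.
        if (!posts.isEmpty() && !postLinks.isEmpty()) {
            for (String p : posts) {
                for (String l : postLinks) {
                    out.add(p + "\t" + l);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of("Ppost-7", "Llink-a", "Llink-b")).size()); // 2
        System.out.println(reduce(List.of("Ppost-8")).size());                       // 0
    }
}
```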
9:24
We execute two nested for loops and write each
9:29
key-value pair, that is, the post and the post link. Now let
9:35
me come to the left outer join; this is its body. In a
9:39
left outer join we take all the rows from posts, and if there is a corresponding ID in postlinks,
9:48
that postlinks row is joined to the post.
9:53
So here too we call context.write with the post and the post link. Otherwise we call context.write
10:00
with the post and a new blank Text, because no post link is present for that post ID.
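Assuming posts and postLinks already hold the tag-stripped values for one key, the left outer branch can be sketched in plain Java as follows. The class name is hypothetical. An empty string after the tab stands for the blank Text written when no link exists.

```java
import java.util.ArrayList;
import java.util.List;

class LeftOuterJoinSketch {
    /**
     * posts and postLinks hold the tag-stripped values for one join key.
     * Returns "post\tlink" lines; a post with no links gets an empty right side.
     */
    static List<String> leftOuter(List<String> posts, List<String> postLinks) {
        List<String> out = new ArrayList<>();
        for (String p : posts) {
            if (!postLinks.isEmpty()) {
                // Matching links exist: one line per (post, link) combination.
                for (String l : postLinks) {
                    out.add(p + "\t" + l);
                }
            } else {
                // No link for this post ID: keep the post, write a blank value.
                out.add(p + "\t");
            }
        }
        // Links with no matching post are dropped in a left outer join.
        return out;
    }
}
```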
10:07
The same thing happens in the right outer join, except that we take all
10:11
the rows from postlinks, and from posts we take only the rows
10:16
whose ID matches. Where there is no match, we still take all the
10:23
records from postlinks. That is depicted in
10:27
this logic for the right outer join: when the IDs do not match, nothing comes from posts, but the post link is still written.
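The right outer branch mirrors the left one with the roles swapped. This plain-Java sketch uses a hypothetical class name and takes the same tag-stripped lists as input. As a design choice, it keeps the post in the key column so that both outer joins produce lines in the same post, tab, link order.

```java
import java.util.ArrayList;
import java.util.List;

class RightOuterJoinSketch {
    /**
     * Keeps every post link; posts appear only where they match.
     * Lines stay in "post\tlink" order, with an empty post column when unmatched.
     */
    static List<String> rightOuter(List<String> posts, List<String> postLinks) {
        List<String> out = new ArrayList<>();
        for (String l : postLinks) {
            if (!posts.isEmpty()) {
                for (String p : posts) {
                    out.add(p + "\t" + l);
                }
            } else {
                // No post with this ID: nothing from posts, the link is still written.
                out.add("\t" + l);
            }
        }
        // Posts with no links are dropped in a right outer join.
        return out;
    }
}
```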
10:31
Now let's go to the full outer join.
10:35
In a full outer join, all the records come from both the left and the right side. Where the IDs match, the matching records are joined.
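Here is a plain-Java sketch of the full outer branch, under the same assumptions as the other join sketches: a hypothetical class name, tag-stripped lists as input, and an empty string for the missing side. Matched keys produce every post/link combination, and unmatched posts and unmatched links are each kept with the other side left empty.

```java
import java.util.ArrayList;
import java.util.List;

class FullOuterJoinSketch {
    /** Keeps everything from both sides; the missing side is written as an empty string. */
    static List<String> fullOuter(List<String> posts, List<String> postLinks) {
        List<String> out = new ArrayList<>();
        if (!posts.isEmpty()) {
            for (String p : posts) {
                if (!postLinks.isEmpty()) {
                    for (String b : postLinks) {
                        out.add(p + "\t" + b); // matched: every combination
                    }
                } else {
                    out.add(p + "\t");         // post without links
                }
            }
        } else {
            for (String b : postLinks) {
                out.add("\t" + b);             // link without a post
            }
        }
        return out;
    }
}
```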
10:54
Where the IDs do not match, the record still comes through, with either the right
10:59
side or the left side left empty. Here you can see
11:05
that we have written this with nested for loops and
11:09
conditional statements. Let me show you how the full outer join is
11:14
handled. If posts is not empty, then for each post we check whether postLinks is empty. If it is not,
11:19
we loop over each post link B and write the post and B; otherwise
11:23
we write the post and a new empty Text. If posts is empty, we instead write a new empty Text
11:30
and B for each post link B. In this way everything is written to the
11:35
context with context.write. That is the full outer join. Now let's go to the anti join. In an anti join, only those records
11:44
appear that are not common to both sides. That is the anti join.
11:49
If the join type is anti, we check whether posts is empty
11:54
XOR postLinks is empty. Recall how XOR works: 0 XOR 0 and 1 XOR 1 give
12:01
0, while 0 XOR 1 and 1 XOR 0 give 1. So the condition holds only when exactly one side is empty, that is,
12:08
when an ID appears on one side but not the other, either the left-hand side or the right-hand side. Only those unmatched records are written here.
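The XOR test translates directly into Java's ^ operator on booleans. In this plain-Java sketch (hypothetical class name, same tag-stripped lists as input), only one of the two loops ever has anything to iterate over, because the XOR guarantees that one side is empty.

```java
import java.util.ArrayList;
import java.util.List;

class AntiJoinSketch {
    /** Emits records only for keys that exist on exactly one side. */
    static List<String> anti(List<String> posts, List<String> postLinks) {
        List<String> out = new ArrayList<>();
        // Boolean XOR: true when exactly one of the two lists is empty.
        if (posts.isEmpty() ^ postLinks.isEmpty()) {
            for (String p : posts) {
                out.add(p + "\t");      // post with no links
            }
            for (String l : postLinks) {
                out.add("\t" + l);      // link with no post
            }
        }
        return out;
    }
}
```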
12:18
So the anti join outputs exactly those records that are not available in both
12:26
tables. If none of the join types match, a RuntimeException is thrown
12:33
in the final else part; we'll come to that shortly.
12:38
Let me mark the anti join code
12:43
to show you how it works. For a post, we write the post with a new empty Text as the
12:48
value; for a post link, we write a new empty Text first and then
12:55
the post link. That is how the anti join is done. After
13:01
all these join operations we come to the else part, which follows
13:06
the anti join body. Here we
13:10
throw a new RuntimeException: if the join type is not set to inner, left
13:16
outer, right outer, full outer or anti, we throw an
13:20
exception here, a RuntimeException with
13:24
an appropriate message. The next function is xmlToMap.
13:36
We discussed xmlToMap earlier: it takes an XML row as input and produces a HashMap object as output. Now let's discuss the main function, which requires four
13:55
command line arguments. The usage message lists the class name first, and the four arguments are the posts input folder,
14:06
the postlinks input folder,
14:11
the output folder and the join type, which is one of inner, left outer,
14:16
right outer, full outer or anti. This message
14:22
guides the user: if the user does not provide four
14:27
command line arguments, the message is printed, because the number of
14:30
arguments (args.length) is not 4. The join type comes
14:35
last, at args[3], the fourth argument, so joinType
14:39
is initialized from args[3]. Next we check whether
14:43
joinType equals, ignoring case, inner,
14:48
left outer, right outer, full outer
14:55
or anti. If it matches none of them, we print an error to System.err saying that the
15:00
join type is not set to inner, left outer, right outer, full outer or anti, and call
15:05
System.exit(2). Then we create a Job instance whose name describes it
15:10
as joining posts and post links on post ID. That is the job we have created.
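The argument check can be sketched in plain Java as shown below. The class name, the usage text and the exact join-type spellings (inner, leftouter, rightouter, fullouter, anti) are assumptions: the video reads the types aloud, so the real strings may differ. The comparison uses equalsIgnoreCase, as described in the walkthrough.

```java
import java.util.List;

class JoinArgsSketch {
    // Assumed spellings of the five join types accepted by the driver.
    static final List<String> JOIN_TYPES =
            List.of("inner", "leftouter", "rightouter", "fullouter", "anti");

    /** Returns an error message for bad arguments, or null when they are acceptable. */
    static String validate(String[] args) {
        if (args.length != 4) {
            return "Usage: <class> <posts in> <postlinks in> <out> "
                    + "[inner|leftouter|rightouter|fullouter|anti]";
        }
        String joinType = args[3]; // the fourth argument is the join type
        for (String t : JOIN_TYPES) {
            if (t.equalsIgnoreCase(joinType)) {
                return null;
            }
        }
        return "Join type not set to inner, leftouter, rightouter, fullouter, or anti";
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(2); // exit code 2 for bad arguments, as in the walkthrough
        }
        // The real driver would now store args[3] in the job configuration under
        // "join.type", which the reducer's setup() reads back.
        System.out.println("join.type = " + args[3]);
    }
}
```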
15:16
The job gets the configuration, and we set the
15:20
join.type property in it to joinType, the value we
15:25
got from args[3]. This is the same join.type that the reducer reads in its setup method, at line 78 of the source shown on screen.
15:34
Next we call setJarByClass
15:42
so that Hadoop knows which jar file contains the job's classes.
15:46
Then we use MultipleInputs.addInputPath. The first input path
15:51
is taken from args[0], with TextInputFormat as the input format and LeftPostMapper
15:56
as the mapper class; for each input, the respective classes
16:00
are mentioned explicitly. The second input path is taken from args[1], again with TextInputFormat, but with RightPostLinkMapper
16:12
as its mapper. So we have two input paths, each with its own mapper; this is what MultipleInputs provides.
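To see how the pieces fit together without a cluster, here is a small in-memory simulation in plain Java. Each input list is mapped with its own key attribute and tag (the role MultipleInputs plays), a TreeMap stands in for the shuffle that groups values by key, and the reduce step applies a left outer join. It is not Hadoop code: the class name is hypothetical and the attribute names Id and PostId are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReduceSideJoinSimulation {
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns the value of one attribute in a row, or null if it is absent. */
    static String attr(String row, String name) {
        Matcher m = ATTR.matcher(row);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return m.group(2);
            }
        }
        return null;
    }

    /** Map phase for one input path: its own key attribute and tag, as with MultipleInputs. */
    static void mapInput(List<String> rows, String keyAttr, char tag,
                         Map<String, List<String>> shuffle) {
        for (String row : rows) {
            String key = attr(row, keyAttr);
            if (key == null) {
                continue; // same as the mappers returning early
            }
            // Shuffle stand-in: collect tagged values under their join key.
            shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(tag + row);
        }
    }

    /** Runs map, shuffle and a left outer reduce; returns "post\tlink" lines. */
    static List<String> run(List<String> postRows, List<String> linkRows) {
        Map<String, List<String>> shuffle = new TreeMap<>(); // keys visited in sorted order
        mapInput(postRows, "Id", 'P', shuffle);
        mapInput(linkRows, "PostId", 'L', shuffle);

        List<String> out = new ArrayList<>();
        for (List<String> values : shuffle.values()) { // one "reduce call" per key
            List<String> posts = new ArrayList<>();
            List<String> links = new ArrayList<>();
            for (String v : values) {
                if (v.charAt(0) == 'P') {
                    posts.add(v.substring(1));
                } else {
                    links.add(v.substring(1));
                }
            }
            for (String p : posts) {
                if (links.isEmpty()) {
                    out.add(p + "\t");
                } else {
                    for (String l : links) {
                        out.add(p + "\t" + l);
                    }
                }
            }
        }
        return out;
    }
}
```

A link whose PostId has no matching post simply produces no line here, which is exactly the left outer behaviour described in the video.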
16:23
Here you can see the classes mentioned for each input. Next comes the
16:31
reducer. We named the reducer class
16:36
JoinReducer, so we call setReducerClass(JoinReducer.class).
16:41
Next we set the output path of the job
16:48
to new Path(args[2]); the output path is
16:53
the third command line argument,
16:58
which is args[2]. Then we call setOutputKeyClass and setOutputValueClass,
17:06
both with Text. The program exits with status 0 if the job completes successfully; otherwise
17:12
it exits with 3, meaning it was unsuccessful. That completes the main function. Now it is time
17:19
to create the jar file. As you know, you go to the package,
17:24
right-click it, choose Export, then JAR file, and give the jar file path and name. That creates
17:30
the jar file. We have already created it, so we're skipping that step. Now let me
17:36
come to the command required to execute this program. We use
17:43
hadoop jar, then the jar file path and name, then the package
17:51
and class name, then /input/post, the first
18:00
input folder, then /input/postlinks, our
18:06
second input folder, then /output, the output folder, and finally the
18:11
join type, which here is left outer. Let me execute
18:16
the command. As you know, left outer means all the rows
18:23
come from posts, that is, from posts.xml, and they are joined where
18:29
the IDs match. But here you can see that
18:33
the NameNode is in safe mode, so let me bring it out of
18:37
safe mode. I issue the command that makes it leave
18:42
safe mode (for example, hdfs dfsadmin -safemode leave). So again: all the rows come from
18:58
posts, and only where there is a match in postlinks.xml are they joined.
19:06
That is the left outer join: we get all the post rows,
19:11
plus those post links whose PostId
19:18
matches the post's Id. That is the concept of left outer. Let me
19:25
show you the output as well by printing the content of the part file. We'll use
19:36
hdfs dfs; let me write the command for the part file:
19:45
hdfs dfs -cat /output/part*
19:59
Here is the output, about to appear. We have discussed each step in detail: the XML file formats, the Java file, how to make the jar file, how to execute the code and with which command, and how to see the output. I think you are now comfortable with this topic. Now
20:29
consider any row: all the rows come from posts, keyed by their
20:34
IDs, and for each post, if there are links in
20:39
postlinks.xml whose PostId matches, the respective link appears
20:45
next to it. You can go through this content and see that things are working well.
20:50
Before deleting the output folder, look at the content here; I'm skipping ahead a little.
20:55
There are multiple rows here. For each row, if a post link's PostId
21:03
matches the post's Id, the
21:08
rows are joined; otherwise the posts.xml row appears on its own.
21:13
Now let me delete this folder; that step is optional. I think
21:18
you have enjoyed this video. Thanks for watching.