MapReduce and Design Patterns - Composite Join Pattern Example
Oct 18, 2024
0:00
In this video we are going to discuss a composite join pattern example.
0:05
In this example we shall implement the join pattern. So let us look at the assignment first.
0:11
First we need to run the formatting task for both XML files,
0:17
that is, users.xml and posts.xml, individually. It stores the output of each in a different file.
0:24
Then, in the second phase, it takes the data from the newly created files
0:29
and performs the join. So we are dividing our assignment into two tasks and will be executing
0:35
them one by one. So let us go for a practical demonstration of this concept.
0:41
This is an example of the composite join pattern, which falls under the join pattern category.
0:48
In this particular problem we shall have two parts to the
0:54
task. The first one is the formatting part
0:59
and the other one is the join part. The formatting
1:05
task will run for users and also for posts. Users will be users.xml and posts will be
1:13
posts.xml. The output produced for users.xml will be kept in the user
1:20
format folder, and the posts.xml output will be put into the post format folder in
1:27
HDFS. These two folders will be given as inputs to the main join task, and the join
1:33
type here will be inner join or outer join. So let me show you the respective files. Here we are
1:40
having the input folder. Let me go to the input folder; you can find that we are having the
1:44
folder post, and under that we are having posts.xml. Let me come out and go to the users folder,
1:51
input/users, where we are having users.xml. So these two files are there.
1:57
So let me also show you the content of these two files. First we are concentrating on posts.xml. Under the posts tag
2:05
we have multiple rows. We have shown some of the rows here, but the main file has many more rows,
2:11
each having different attributes under the row tag, that is, ID, post type ID,
2:16
accepted answer ID, and so on. There are many different attributes under the row tag.
2:23
This is the content of posts.xml. So I think you are getting my point. Now see the content of
2:32
users.xml. Under the users tag we have the row tags,
2:38
each with many different attributes: ID, reputation, creation
2:43
date, display name, last access date, website URL, and many more.
2:49
These attributes are there under each and every row, and multiple records are there in
2:53
users.xml. So we shall open the Java program, and here we have two Java classes.
3:03
One is ComJoinFormatter.java. The other one is ComJoinMRTask.java.
3:07
First we are discussing the first one. In ComJoinFormatter
3:14
we have one inner class which extends the Mapper class.
3:21
And here is ComJoinMRTask.java. That is the other one.
3:25
We'll be discussing that one later. So now let me concentrate on this one.
3:29
ComFormatterMapper extends the Mapper class and has one variable, the output key,
3:37
of type Text. After defining this output key
3:44
we override the map method. Within the map method, we have a Map object
3:50
called xmlParsed, which is initialized from the return value of the
3:56
XML-to-map method. This XML-to-map method
4:04
takes the XML as input and
4:10
returns one HashMap object as output, and that initializes xmlParsed. We have
4:15
a FileSplit object, which is obtained from context.getInputSplit(). The String object file is set from
4:25
fileSplit.getPath().getName(). So that initializes file, which is a String.
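The XML-to-map helper mentioned above is not shown in the transcript. A minimal sketch of what such a helper might look like, in plain Java without any Hadoop dependency, is given here. The class name `RowParser`, the regular expression, and the sample row are assumptions for illustration; the sketch handles only simple `name="value"` attributes and does not decode XML entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns one <row ... /> line into an attribute map.
class RowParser {
    // Matches name="value" pairs inside a single row element.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([A-Za-z_][\\w.-]*)\\s*=\\s*\"([^\"]*)\"");

    static Map<String, String> xmlToMap(String xmlRow) {
        // LinkedHashMap keeps the attributes in the order they appear in the row.
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(xmlRow);
        while (matcher.find()) {
            attributes.put(matcher.group(1), matcher.group(2));
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Sample row in the shape described for users.xml (values are made up).
        String row = "<row Id=\"7\" Reputation=\"101\" DisplayName=\"alice\" />";
        System.out.println(xmlToMap(row)); // prints {Id=7, Reputation=101, DisplayName=alice}
    }
}
```

The mapper would call a helper like this once per input line and then look up the attribute it needs as the join key.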
4:30
So the file name is kept in file. Now see, this is the line
4:38
which initializes the String user ID with null. If file contains "user", that means if the file name contains the string
4:52
"user" anywhere, then the user ID is set from xmlParsed.get with the ID key, so we take the
4:57
value kept against ID and put it into the user ID. But if the file
5:02
name contains "post" anywhere, in that case it will take the
5:07
owner user ID, and that will initialize the ID. After this initialization, if I find that the ID still remains null, then I
5:16
shall return. Otherwise the output key
5:21
will be set with this ID and we shall write this key-value pair, that is,
5:26
the output key and the value, onto the context, keeping the whole block inside a
5:31
try and catch. So let me explain the main function. It is in
5:38
ComJoinFormatter.java. You should have the class name, then the input folder and the output folder, so it requires
5:45
two arguments; two folders are to be passed. We are creating one
5:50
job instance; the job name is "composite join formatter".
5:56
We are setting the jar by class and we are setting the mapper class. This is the
6:01
mapper class, and the jar class we have also set. Now it is the normal
6:07
reducer work, so one reducer for a small data set. Now we shall also initialize our input paths using
6:17
argument zero, that is the first argument, and also set the output path, which will be
6:24
initialized with argument one. So the input paths use argument zero and the
6:28
output path uses argument one. Then the output key
6:33
class will be the Text class and the output value
6:37
class will be the Text class. It will return zero for successful completion, otherwise three.
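The key-selection step of the formatter mapper described above can be sketched in plain Java. This is an illustration only, not the code from the video: the class name `JoinKeySelector` and the attribute names `Id` and `OwnerUserId` are assumptions based on how the transcript names the fields, and the Hadoop `Context` is left out so the logic can run on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the key-selection logic inside the formatter mapper.
class JoinKeySelector {
    // Returns the join key for a parsed row, or null when the row should be skipped.
    static String joinKey(String fileName, Map<String, String> row) {
        String userId = null;
        if (fileName.contains("user")) {
            // Rows from the users file are keyed by their own ID (assumed attribute name).
            userId = row.get("Id");
        } else if (fileName.contains("post")) {
            // Rows from the posts file are keyed by the owner's user ID (assumed attribute name).
            userId = row.get("OwnerUserId");
        }
        return userId;
    }

    public static void main(String[] args) {
        Map<String, String> user = new HashMap<>();
        user.put("Id", "5");
        Map<String, String> post = new HashMap<>();
        post.put("Id", "90");
        post.put("OwnerUserId", "5");

        // Both rows resolve to the same key, so they can meet in the later join.
        System.out.println(joinKey("users.xml", user)); // prints 5
        System.out.println(joinKey("posts.xml", post)); // prints 5
    }
}
```

In the real mapper a null result corresponds to the early return, and a non-null result becomes the output key written together with the original row.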
6:43
Now let me come to the join, ComJoinMRTask.
6:47
Here the mapper extends the base class, that is, MapReduceBase,
6:52
and also implements the Mapper interface. So here we are defining the map method.
7:01
In this map method we have written only a single line,
7:08
that is, output.collect, where output is the OutputCollector object.
7:13
We take value.get(0), that is the first value within
7:18
this value (value is a TupleWritable), which is converted to Text, and value.get(1),
7:24
which is also converted to Text, and these two are passed to
7:28
output.collect. So only a single
7:33
step is written. This one is very basic: ComJoinMapper
7:37
extends MapReduceBase and also implements the Mapper interface. So I think you are getting my point.
7:44
That is how we did this one. So only a simple
7:55
step is written here. Now let me come to the main function.
8:00
Within this main function we have defined one JobConf object, and it
8:05
is initialized with the name "composite join user comment". We are setting the jar class,
8:15
that is, ComJoinMRTask.class. Here you can find that we require four
8:19
arguments to be passed. Fewer or more than that
8:24
will not be accepted, so System.exit(1) is called. We require the class name, then
8:29
the user data path, then the post data path, then the output folder path,
8:35
and then you can pass either inner or outer, since, as I told you, we'll be doing either an inner join or an outer join. The user file path is
8:43
initialized with the first argument, that is args[0]. The comment file path
8:48
is initialized with the second argument, that is args[1]. The output is
8:52
initialized with the third argument, that is args[2],
8:58
and the join type is initialized with args[3]. We check whether the join type equals
9:03
inner or the join type equals outer, and if that is not true, that
9:08
means I put a not before the condition, then an error message is printed. Next we set the mapper class,
9:13
and it is a map-only job. That is why for setNumReduceTasks I am writing zero. I don't
9:19
require any reducer here. Next, set the input format: CompositeInputFormat.class is mentioned as the input format. Then config.set is called with the mapred join expression, built by CompositeInputFormat from the join type,
9:42
then KeyValueTextInputFormat.class, which we mentioned earlier, and then
9:48
the user file path and the comment file path. These are the
9:55
parameters to be passed to the compose method, and its result is set on our
10:02
configuration through config.set.
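The argument handling described above for the join driver can be sketched in plain Java. This is a hedged illustration, not the code from the video: the class name `JoinArgs` and the sample folder names are hypothetical, and the sketch throws an exception where the transcript describes printing an error and calling System.exit. In the Hadoop driver the validated values would then go to the compose method of CompositeInputFormat, as the transcript describes.

```java
// Hypothetical holder for the four driver arguments described in the transcript.
final class JoinArgs {
    final String userPath;
    final String postPath;
    final String outputPath;
    final String joinType;

    private JoinArgs(String userPath, String postPath, String outputPath, String joinType) {
        this.userPath = userPath;
        this.postPath = postPath;
        this.outputPath = outputPath;
        this.joinType = joinType;
    }

    // Validates the argument count and the join type before any job is configured.
    static JoinArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                    "Usage: <user data> <post data> <output dir> <inner|outer>");
        }
        String joinType = args[3];
        if (!(joinType.equals("inner") || joinType.equals("outer"))) {
            throw new IllegalArgumentException(
                    "Join type must be inner or outer, got: " + joinType);
        }
        return new JoinArgs(args[0], args[1], args[2], joinType);
    }

    public static void main(String[] args) {
        // Folder names here are made up for the example.
        JoinArgs parsed = parse(new String[] {"user_format", "post_format", "output", "inner"});
        System.out.println(parsed.joinType + " join of " + parsed.userPath
                + " and " + parsed.postPath + " into " + parsed.outputPath);
    }
}
```

Rejecting a bad join type up front avoids submitting a job whose join expression cannot be interpreted.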
10:07
Now we shall go for the text output format.
10:13
There is setOutputPath, and config and output will be the two
10:19
parameters for the TextOutputFormat. Then you set the output
10:25
key class, which will be the Text class; the output value class will also be the Text
10:31
class. Both of them are of type Text. I am
10:40
just marking that one. Now we are defining one job, that is, a RunningJob,
10:45
from the job config. So according to whatever config you have defined, the job will be configured
10:52
accordingly. Then, if the job is not complete,
11:00
we wait for one second; 1,000 milliseconds means one second. If the job
11:07
is successful then 0 is returned, otherwise 2 is returned. So accordingly we have
11:11
written ComJoinMRTask.java. Now it is high time to create the
11:21
respective jar file. You know how to create the jar file:
11:25
go to the package, right-click, choose Export, then JAR, and give the
11:32
path and the jar file name. We would create the jar file as we did
11:37
in the other cases, but here we have already created the jar file, so we may
11:42
skip that step. We can go directly to the console to show how the commands
11:47
are to be executed. As I told you, we are supposed to use users.xml and
11:52
posts.xml; we will be creating the user format folder and the post format folder, and then we shall run the MapReduce task.
12:01
So let me write my commands. It is a long command. For the first two times
12:07
we'll be issuing the same command, changing the parameters, and then we shall
12:10
call the MR task. It is hadoop jar, then you give the
12:16
folder where the jar file is, so mapreduce underscore
12:25
design pattern, then jar underscore files, then join pattern dot jar. Then you mention
12:33
the package and class, package dot class. Composite join pattern
12:44
is the package name. Now we'll be writing the class name; we're calling the first class, that is ComJoinFormatter. Then you give the
13:01
input path of posts.xml, that is input/post, and then you
13:09
give the other one, the post format folder, which is the output path I am
13:17
giving. So one error is there. Okay, there is a spelling mistake, so let me
13:23
correct it to jar. I think the rest of the command will execute as it is. So let
13:29
me fix jar and execute the command now. Yes, it is working. It
13:38
will read the file from input/post and put the result into the root post
13:43
format folder. We will be issuing the same command also
13:53
for the user, but in that case it will be the user format folder and the XML file name will
14:00
be users.xml. So now let me bring back the previous command. From the history
14:06
you can press the up arrow to bring back the previous command. Now let me delete this one, because the folder name should be user format, and for the input too we should delete post and make it
14:27
users. Yes, users. Okay, now let me execute the command. So after reading the two
14:40
XML files, the formatted output has been dumped into the respective post format folder
14:46
and user format folder, as you have seen. Yes, the command executed
14:55
successfully. Let me come to the folder view. Going back here, you can find
15:04
that we are having the user format and the post format folders. The user format folder has its
15:08
part file, and the post format folder also has its part file. So two part files have already been created. Now coming
15:19
back to the console again, let me execute the MR task as well:
15:31
hadoop jar, then the path of the jar file, then the jar file name, then the
15:45
package name, then the respective class name. We'll be taking the class
15:58
name as the MR task class.
16:10
I am giving the user format and post format paths, and also mentioning the output folder and the type of join. We can have inner join; we are going for inner join. We can also go for outer join in this case.
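To make the difference between the inner and outer options concrete, here is a simplified in-memory illustration in plain Java. It is not Hadoop's implementation: the class name `JoinSimulation`, the sample data, and the choice to represent a missing side as an empty string are all assumptions. Hadoop's composite join additionally expects both inputs to be sorted by key and partitioned the same way, which is what the formatting jobs prepare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical simulation of what an inner or outer join over two keyed inputs yields.
class JoinSimulation {
    static List<String> join(Map<String, List<String>> users,
                             Map<String, List<String>> posts,
                             boolean outer) {
        // Outer join keeps every key from either side; inner join keeps only shared keys.
        TreeSet<String> keys = new TreeSet<>(users.keySet());
        if (outer) {
            keys.addAll(posts.keySet());
        } else {
            keys.retainAll(posts.keySet());
        }
        List<String> output = new ArrayList<>();
        for (String key : keys) {
            // A missing side is represented by a single empty value in this sketch.
            List<String> left = users.getOrDefault(key, Collections.singletonList(""));
            List<String> right = posts.getOrDefault(key, Collections.singletonList(""));
            for (String userRow : left) {
                for (String postRow : right) {
                    // Mirrors the mapper emitting the user row as key and the post row as value.
                    output.add(userRow + "\t" + postRow);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, List<String>> users = new TreeMap<>();
        users.put("1", Arrays.asList("u1"));
        users.put("2", Arrays.asList("u2"));
        Map<String, List<String>> posts = new TreeMap<>();
        posts.put("1", Arrays.asList("p1a", "p1b"));
        posts.put("3", Arrays.asList("p3"));

        System.out.println("inner: " + join(users, posts, false).size() + " rows");
        System.out.println("outer: " + join(users, posts, true).size() + " rows");
    }
}
```

With this sample data the inner join yields two rows (user 1 with each of its two posts), while the outer join yields four, adding user 2 without a post and the post whose owner has no user row.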
16:35
So you have seen how the different commands are
16:40
to be executed one by one. We have explained our
16:46
program line by line, we have shown all the commands, and we have also described
16:51
what the purposes were and how the folders get created with
16:56
the respective part files. So now let me show the content of the part file
17:02
created under the output folder, and you can find that the inner join has
17:06
taken place. Use -cat, then /output/part*. This is the content of the
17:24
part file after inner joining. I hope that you enjoyed the video and
17:31
got the details of how we executed our code and our commands step by
17:38
step. This is the content of the joined output, the inner join output.
17:49
It is a long output; I think it will have a large size in MB as well. This is the content; I am just showing
18:08
it on the screen so you can see and read it. This is the content of our output. I
18:17
shall show you the part file again. So we shall go for refresh, then the output
18:22
folder will come. Yes, I have done the refresh and the output folder has come. See, the part
18:28
file has a size of 169.17 MB, so it is really a big file. That is why the
18:35
output is so big. Now we are just going to delete the output folders; it is normal practice, so that we can execute
18:43
the next MR task. I hope the concept has become clear to you.
18:59
Thanks for watching this video.