MapReduce and Design Patterns - Job Merging Pattern Example
Oct 18, 2024
0:00
In this video we are discussing a job merging pattern example.
0:05
Here we shall solve one assignment: we'll write the Java code,
0:09
compile it, execute it, and look at the output for a better explanation.
0:14
So what is the assignment here? In this example we will provide posts.xml and find the distinct records as well as a random
0:24
sampling on it using the job merging pattern. So this is our assignment: here we are going to use only one XML file, posts.xml, and one job will find the distinct records while another job
0:39
does the random sampling on it, both combined using the job merging pattern. So let us go to the implementation of this assignment for a better idea.
0:50
We are discussing one problem, the job merging pattern. Here we are provided with two XML files:
0:58
posts.xml, which is under /input/post, and another XML file,
1:05
users.xml, which is under /input/user. From these files we shall find
1:12
the distinct records as well as a random sampling, using the job merging pattern. This
1:19
job merging pattern falls under the meta pattern category of design patterns. So here we'll
1:25
explain our code and show the XML files so that we get the idea, we shall
1:32
explain it line by line, and we shall show you how the outputs are obtained.
1:37
At first we are going to show you the contents of the respective XML files, that is,
1:41
posts.xml and users.xml. Going to /input/post, we
1:49
find that posts.xml resides there, and going to /input/user, users.xml
1:55
resides there. Let me show you users.xml. It contains
2:01
many rows, but I have shown only two of them here for your
2:04
understanding. Under the users tag we have the row tags, and each row has
2:09
multiple attributes like ID, reputation, creation date and so on. Now let me
2:18
explore posts.xml. Under the posts tag we have the row tags,
2:23
containing multiple attributes like ID, post type ID, accepted answer ID and so on.
2:30
So now we're going to discuss the Java file. The Java file was written in the Eclipse editor.
2:36
We shall show you how the Java file is written and explain it line by line.
2:41
Here we have only a single class, that is, a single Java file containing multiple
2:46
inner classes as usual. The class name is job merging MR task.
2:53
In the job merging MR task class we define two public static final String
2:59
variables, multiple outputs random and multiple outputs distinct.
3:05
They are initialized with the strings "random" and "distinct" respectively, and they are final, so they are constants. Next we have one inner class. Before going to the
3:14
inner class, note again that these are the final strings
3:19
initialized with random and distinct. The inner class is the random distinct mapper.
3:29
It extends the Mapper class. Within it we have one private static final Text, the distinct out value, which is
3:40
initialized with the Text constructor. Other than this we have the Random
3:47
object: we have defined one Random object with new Random. We have
3:52
two objects of the tag text type, the random output key and the distinct output key,
3:59
as you can see, and they are initialized with the tag text
4:03
constructor. We also have one more object of the Text type, the random
4:08
output value. I am just marking
4:12
it here, and it is initialized with the Text constructor. So we have this set of private variables. Now we are going to override the map method. Within
4:25
this map method we call two functions: one is randomize
4:30
map and the other is distinct map. What are they? Their bodies are
4:34
written just below it. At first we're going to discuss randomize map.
4:39
We pass the key, the value and the context to this randomize map method, and
4:50
its body is in a try-catch block. There we define XML parsed, which is
4:54
assigned the output of the XML to map method, which takes one
4:59
XML string as input and returns a hash map as output. Here is the respective
5:05
method body, which you can find from line number 179. Now, if the XML parsed
5:12
size is greater than zero, that means the map is not empty, we define one
5:18
StringBuilder object, and this string builder is
5:24
appended with the opening angle bracket followed by row.
5:29
You know that in our XML files we had the row tag, so we just add this
5:34
row tag at the start of the output. Let me show you the
5:38
XML file content again so that you can see that, yes, we had the row
5:42
tags there. Similarly, we will obviously write the closing bracket
5:46
in our code too. After the row tag we have a for loop over the
5:54
entries of XML parsed dot entry set. Here the key value pairs are String and
6:01
String, and we read them from XML parsed dot entry set. If entry dot get
6:06
key equals user ID or ID, then we ignore it, that means
6:12
we skip these fields and do not consider them. Otherwise, if it is the creation
6:17
date, then we strip out the time, that is, anything after the capital T in the value.
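The field filtering and time stripping described here can be sketched in plain Java, without Hadoop. This is only an illustration under assumptions: the class name `RowScrubSketch`, the exact attribute spellings `UserId`, `Id` and `CreationDate`, and the sample values are hypothetical and are not taken from the video's source file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RowScrubSketch {

    // Rebuilds a <row ...> element from parsed attributes, skipping the
    // identifier fields and keeping only the date part of the creation date.
    // Attribute names are assumptions made for this sketch.
    static String scrub(Map<String, String> parsed) {
        StringBuilder sb = new StringBuilder("<row ");
        for (Map.Entry<String, String> entry : parsed.entrySet()) {
            String key = entry.getKey();
            String value = entry.getValue();
            if (key.equals("UserId") || key.equals("Id")) {
                continue; // identifier fields are dropped
            }
            if (key.equals("CreationDate")) {
                int t = value.indexOf('T');
                // Keep the text before 'T'; if there is no 'T', keep the whole value.
                value = (t >= 0) ? value.substring(0, t) : value;
            }
            sb.append(key).append("=\"").append(value).append("\" ");
        }
        return sb.append(">").toString();
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>(); // hypothetical sample row
        parsed.put("Id", "7");
        parsed.put("CreationDate", "2010-07-19T19:12:12.510");
        parsed.put("Score", "3");
        System.out.println(scrub(parsed)); // <row CreationDate="2010-07-19" Score="3" >
    }
}
```

The guard on `indexOf` is an addition of this sketch: calling `substring(0, -1)` on a value without a capital T would throw, so the sketch keeps such values unchanged.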
6:23
Why capital T? See, in the creation
6:28
date we have the date part, then capital T, and then the
6:35
time part. This capital T works as a delimiter between the
6:40
date and the time, so we take the substring from the very first index, zero, up to the index of capital
6:48
T. That's why we have written string builder dot append of entry dot get key, plus
6:53
the equals sign and an opening quote, plus entry dot get value dot substring of zero comma entry dot get value dot index
7:00
of capital T, plus the escaped closing quote. Otherwise, in the else branch, we just write
7:07
the whole value. So only for the creation date do we do this trimming,
7:12
and after the loop we append the closing angle bracket, which is the end of
7:16
the tag. The tag of the random output key is set,
7:21
and then random output key dot set text gets Integer dot to string of
7:26
a random integer, which converts the random integer to a string. Then random output value dot set
7:31
gets string builder dot to string, so the built row is converted to a string and
7:36
set as the random output value. We write this
7:41
key value pair to the context with context dot write of
7:46
random output key and random output value, all enclosed in the try-catch
7:51
block. Now we are going to discuss the distinct map method, because we are
7:56
supposed to get the distinct records. Here we have defined XML parsed,
8:02
which is initialized with the output of the XML to map method.
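The body of the XML to map helper is not spelled out in the narration, so the following is only a guess at what such a helper could look like. The class name `XmlToMapSketch` and the regular-expression approach are assumptions of this sketch; it handles double-quoted attribute values only and does not decode XML entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlToMapSketch {

    // Matches name="value" pairs; assumes values are double-quoted and
    // contain no embedded double quotes.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w:.-]+)\\s*=\\s*\"([^\"]*)\"");

    // Turns one <row .../> line into an attribute-name -> value map.
    // Returns an empty map when nothing matches.
    static Map<String, String> xmlToMap(String xmlRow) {
        Map<String, String> attributes = new LinkedHashMap<>();
        if (xmlRow == null) {
            return attributes;
        }
        Matcher m = ATTRIBUTE.matcher(xmlRow);
        while (m.find()) {
            attributes.put(m.group(1), m.group(2));
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Hypothetical sample row, not copied from posts.xml.
        System.out.println(xmlToMap("<row Id=\"5\" OwnerUserId=\"12\" />"));
    }
}
```

With a map like this, looking up an attribute that is absent from the row simply returns null from `get`, which is why a null check on the looked-up value is enough to skip rows without that attribute.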
8:07
From XML parsed we get the owner user ID, and that is
8:11
kept in user ID, which is of type String. If the user ID is equal to
8:16
null, then we return; there is no need to execute the rest of the code. Otherwise we call distinct
8:21
output key dot set tag with D, where D stands for distinct, and then distinct output key
8:27
dot set text with the user ID, so whatever user ID we have, we write
8:31
it as the key. Then with context dot write we emit the distinct output key and the distinct out value. Now we are going to discuss the reducer class, the random distinct reducer. Here we
8:45
define one MultipleOutputs object, and in the setup method we
8:51
initialize it with Text and NullWritable as its types, passing the context.
8:58
Then we override the reduce method. It checks the tag of the key:
9:02
if the tag is the random one, we call random reduce with key dot get text, the values and the context; otherwise we call distinct reduce with key dot get text, the
9:08
values and the context. So the random
9:14
reduce method and the distinct reduce method, these two
9:20
methods are there. Discussing random reduce at first: it calls multiple outputs
9:25
dot write, and you can see the respective arguments
9:31
there, that is, multiple outputs random, the value, NullWritable dot get, and multiple outputs random
9:37
concatenated with slash part. That is random reduce. In distinct reduce we
9:43
write the same call, but here it is multiple outputs distinct
9:48
instead of multiple outputs random, and
9:53
the key that was passed as a parameter is written in place of the value. Then we write the cleanup method.
9:58
Cleanup does nothing except close the multiple outputs object. Now we go to the tag
10:03
text class. In tag text we have a no-argument constructor and a parameterized constructor. The no-argument
10:09
constructor has an empty body. In the parameterized constructor we call set tag
10:13
and set text,
10:18
that is, set tag of text dot get tag and set
10:23
text of text dot get text. In this way the tag and the text are set here. Next we have multiple setter
10:34
and getter methods, which we have
10:38
written afterwards as usual.
10:42
I'm just marking them so that
10:48
you can see them; they are only plain setters and getters. Now we
10:52
go to read fields, which we override: tag is equal to
10:57
in dot read UTF, and then text dot read fields of in. Here in is the DataInput that was passed
11:05
as the parameter. For write we do the same in reverse:
11:10
we write the tag with out dot write UTF, and then call text dot write of out.
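The read fields and write pair can be tried out without a cluster, because `DataInput` and `DataOutput` come from `java.io`. The sketch below is a simplified stand-in, not the class from the video: `TagTextIoSketch` is a hypothetical name, and a plain `String` replaces Hadoop's `Text`, so the text field is also written with `writeUTF` here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class TagTextIoSketch {
    String tag = "";
    String text = ""; // stand-in for Hadoop's Text (assumption of this sketch)

    // Serializes the fields in a fixed order: tag first, then text.
    void write(DataOutput out) throws IOException {
        out.writeUTF(tag);
        out.writeUTF(text);
    }

    // Reads the fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        tag = in.readUTF();
        text = in.readUTF();
    }

    // Writes a key to a byte array and reads it into a fresh object.
    static TagTextIoSketch roundTrip(String tag, String text) {
        try {
            TagTextIoSketch original = new TagTextIoSketch();
            original.tag = tag;
            original.text = text;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            TagTextIoSketch copy = new TagTextIoSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TagTextIoSketch copy = roundTrip("D", "12");
        System.out.println(copy.tag + " " + copy.text); // D 12
    }
}
```

The point the sketch makes is the ordering contract: whatever order write uses, read fields must consume the fields in the same order, otherwise the key comes back corrupted.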
11:16
In this way we have written the write method. We are also overriding
11:22
compare to. There, int compare is equal to tag dot compare to of obj
11:27
dot get tag. If compare is equal to zero, that means the two tags are
11:33
equal, then we go for
11:37
text dot compare to of obj dot get text, and that is what is returned.
11:43
Otherwise it returns compare, that is, the non-zero value. We are also overriding the to string method, because if I want to print something, it will
11:51
be printed as tag dot to string, colon, text dot to string. So the tag is converted
11:57
to a string, then one colon is concatenated, and then text dot to string.
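The ordering that this compare to logic produces can be checked with a small plain-Java sketch. `TaggedKeySketch` is a hypothetical name, and `String` stands in for Hadoop's `Text`, so the comparison here is by characters rather than by encoded bytes; the tag letters and values are sample data only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TaggedKeySketch implements Comparable<TaggedKeySketch> {
    final String tag;
    final String text;

    TaggedKeySketch(String tag, String text) {
        this.tag = tag;
        this.text = text;
    }

    // Compare by tag first; only when the tags are equal compare by text.
    @Override
    public int compareTo(TaggedKeySketch obj) {
        int compare = tag.compareTo(obj.tag);
        if (compare == 0) {
            return text.compareTo(obj.text);
        }
        return compare;
    }

    // Printed form is tag, a colon, then the text.
    @Override
    public String toString() {
        return tag + ":" + text;
    }

    // Sorts the given keys and returns their printed forms.
    static List<String> sortedLabels(TaggedKeySketch... keys) {
        List<TaggedKeySketch> sorted = new ArrayList<>();
        Collections.addAll(sorted, keys);
        Collections.sort(sorted);
        List<String> labels = new ArrayList<>();
        for (TaggedKeySketch key : sorted) {
            labels.add(key.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(sortedLabels(
                new TaggedKeySketch("R", "9"),
                new TaggedKeySketch("D", "42"),
                new TaggedKeySketch("D", "17"))); // [D:17, D:42, R:9]
    }
}
```

Because the tag is compared first, all keys of one logical job sort together, which is what lets a single reduce method tell the two kinds of records apart.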
12:03
So in this way we have written multiple overridden methods. Now let us discuss the main
12:08
function. Within the main function we require, after the task name, the post input
12:16
folder and the output folder, so two command-line arguments must be passed. Here we
12:21
define one job instance, and the name of the job is job merging. We set
12:26
the mapper class, we set the reducer class, and here we
12:31
take three reducers, so set num reduce tasks is called with three. We
12:38
define the input path and the output path from argument
12:42
zero and argument one. Then we call add named output twice, so we have
12:48
two output folders, multiple outputs random and multiple outputs distinct. For both, the output format is TextOutputFormat dot class, and the key value classes are Text dot class and NullWritable dot class. In this way we
13:05
create two output folders, and you will see these
13:09
two output folders in the outcome of our example. So here we have given the
13:15
respective key and value classes. Next we call job dot set output
13:21
key class with tag text dot class and set output value class with Text dot class. Everything is done, so now we check for completion.
13:29
If the completion is true we exit with zero, otherwise
13:34
we exit with status two. We create the jar file as we did before: on the respective project
13:42
name, right click, choose export and create the jar file, which we have shown in the
13:46
earlier projects also. We have created the jar file already, so let me go
13:51
to the terminal and show you the command we are supposed to execute.
13:54
I have written the command earlier. It is hadoop, then jar, then the
14:01
path of the folder with the MapReduce jar file, then the package name
14:06
with the class name, then the post input folder
14:12
and the output folder. So if we execute this one,
14:21
it will go on creating the respective output folders and all.
14:27
So here we had provided posts.xml and users.xml, and we had to find the distinct records as well as the random sampling using the job merging pattern.
14:39
So that was the aim of this demonstration. So it is about to complete
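To make the idea of the merged job concrete, here is a Hadoop-free simulation of the flow: one map step emits two kinds of tagged keys, a sorted map stands in for the shuffle, and one reduce step routes each group by its tag. Everything in it is an assumption made for illustration: the class name `JobMergingSketch`, the tag letters `R` and `D`, the use of a `"tag:text"` string instead of a composite key class, and the sample records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class JobMergingSketch {

    // One result list per logical job, like the two named output folders.
    static final class Outputs {
        final List<String> random = new ArrayList<>();
        final List<String> distinct = new ArrayList<>();
    }

    // Each record is {userId, row}; userId may be null.
    static Outputs run(List<String[]> records, long seed) {
        Random rnd = new Random(seed);
        // Stand-in for the shuffle: values grouped under sorted tagged keys.
        TreeMap<String, List<String>> shuffled = new TreeMap<>();

        for (String[] rec : records) {
            // Map logic of job 1: a random key spreads the rows around.
            shuffled.computeIfAbsent("R:" + rnd.nextInt(), k -> new ArrayList<>())
                    .add(rec[1]);
            // Map logic of job 2: the user id is the key, the value is empty.
            if (rec[0] != null) {
                shuffled.computeIfAbsent("D:" + rec[0], k -> new ArrayList<>())
                        .add("");
            }
        }

        Outputs out = new Outputs();
        for (Map.Entry<String, List<String>> group : shuffled.entrySet()) {
            String key = group.getKey();
            if (key.startsWith("R:")) {
                out.random.addAll(group.getValue()); // reduce logic of job 1
            } else {
                out.distinct.add(key.substring(2)); // one line per distinct key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"12", "<row a>"},
                new String[] {"12", "<row b>"},
                new String[] {null, "<row c>"},
                new String[] {"7", "<row d>"});
        Outputs out = run(records, 42L);
        System.out.println(out.distinct); // [12, 7]
        System.out.println(out.random.size()); // 4
    }
}
```

The design point is that the input is read once and both logical jobs share the same map, shuffle and reduce phases; the tag on the key is the only thing that keeps their records apart.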
14:45
Let me check the output folder also. We'll be getting the respective outputs
14:51
Yes, the command has executed properly.
15:02
We go to the output folder now. See, under the output folder we have distinct
15:07
and random, and the part files directly under it have zero bytes. So distinct and random, two
15:12
folders, are there. In distinct we have three part
15:16
files, and in random also we have three part files, and their
15:21
sizes are non-zero. So let me print them. Coming back to the console
15:29
again, we shall print them using the command minus cat on slash output,
15:44
going for part star. So for distinct we have this.
15:51
You can find that all these values are distinct. You can check that yourself.
15:57
Now let me go for the random. I think we have seen the code
16:03
You have seen how we executed it and how we show the part files.
16:07
And this is the present content here. I hope you have got the idea.
16:15
You can pause the video and type the code, and you'll get the same kind of output as we have got here.
16:20
These are the respective outputs we get in the case of random. It is a long output,
16:30
divided into multiple part files. I hope you've got the idea of how we
16:38
executed all this code and how these outputs are obtained. You can see that
16:42
we have the respective row with the opening angle bracket and the closing
16:47
angle bracket, everything as we wrote it, and you can see that the IDs are not included. Also, in the creation
16:55
date you can find that only the date part is there, because we have
17:00
discarded the time part after the capital T. So only the date remains, and the
17:04
other attributes are present as they were. This is the output of random. I hope you
17:10
have enjoyed this video, and thanks for watching.