MapReduce and Design Patterns - Shuffling Pattern Example
Oct 18, 2024
0:00
In this video we are discussing a shuffling pattern example.
0:04
So in this example, we'll be implementing the shuffling design pattern. The shuffling pattern example takes comments.xml and randomizes the records using
0:17
the shuffling design pattern. So here we'll be taking comments.xml, that is, the XML file where the dataset resides, and we'll be doing the
0:26
shuffling design pattern implementation on this XML. So let us go for a practical demonstration
0:32
to show you how the Java code can be written, how it can be executed, and how the
0:37
outputs can be obtained. Here we are discussing one problem, a shuffling pattern
0:43
example under the data organization design patterns. We have comments.xml under the folder
0:51
/input/comments, with a size of 37.98 MB. This file contains many different comments by users, and here we have selected
1:04
some of the rows. Within the comments tag we have multiple rows.
1:10
So this is one row. Each and every row has multiple fields: Id, PostId, Score,
1:17
then Text, then CreationDate, and the last one is UserId. We have multiple
1:25
such rows, and we shall be shuffling all these rows into a random order. So
1:32
here only one Java class is there, that is, CommentShuffleMRTask,
1:38
which has one inner class, ShuffleCommentsMapper,
1:44
which extends Mapper. We have created an IntWritable object for the output key, and also
1:51
we have defined a Random object, and then we define another one, the output value, of type Text.
2:00
We're just overriding the map method. Inside map, xmlParts is a HashMap object that is instantiated by the XML-to-map function. This function takes the XML of one record as input and returns a HashMap object as output.
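The XML-to-map step can be pictured with a small plain-Java sketch. It is only an illustration under assumptions: the class name XmlRowParser, the method name xmlToMap, the regular expression, and the sample row are all hypothetical and are not the helper used in the video. A LinkedHashMap is used here instead of a plain HashMap only so that the printed order matches the attribute order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlRowParser {
    // Matches one attribute of the form name="value".
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Hypothetical stand-in for the video's XML-to-map helper. */
    public static Map<String, String> xmlToMap(String xmlRow) {
        Map<String, String> parts = new LinkedHashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(xmlRow);
        while (matcher.find()) {
            // group(1) is the attribute name, group(2) is its value.
            parts.put(matcher.group(1), matcher.group(2));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up sample row, not taken from comments.xml.
        String row = "<row Id=\"1\" PostId=\"35\" Score=\"2\" Text=\"Nice answer\" "
                + "CreationDate=\"2010-01-01T10:15:30.000\" UserId=\"7\" />";
        System.out.println(xmlToMap(row));
    }
}
```

A line without any attributes yields an empty map, which is why the mapper checks that the size is greater than zero before building anything.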
2:18
So that function has been called here. Then we check xmlParts.size(): if it is greater than zero,
2:25
only then do we go into the true part. We define one StringBuilder object, that is, stringBuilder is a new StringBuilder, and on it we call append with the less-than sign,
2:41
row, and one blank space. Just check here: we're appending less-than, row,
2:45
and a blank space, so this string will be appended. Now let us check: here you can find that this is a row, so we
2:54
shall create such rows in our output file; that is what I'm showing you here. Okay, now the
3:06
next thing is a for loop: for each Entry of String, String, named entry, within xmlParts.entrySet(),
3:12
so for this xmlParts containing the hash map keys, if the entry key
3:18
equals UserId, or the entry key equals Id, any one of them, then it is
3:26
ignored. We don't consider that one, and we are not going to write it onto our StringBuilder object. Else, if entry.getKey() equals CreationDate,
3:35
in that case what we're doing is stringBuilder.append(entry.getKey()),
3:41
so we're just writing the key there, then the equals sign, and then we are putting one double
3:47
quotation mark. That's why it is written as
3:51
equals, slash, double quote. So one double quote is put, and then comes entry.getValue().substring
4:01
from 0 to entry.getValue().indexOf capital T. Now what is this? We are
4:08
taking a substring up to the capital T. Now the question comes to mind: why is this
4:13
capital T there? Let me go back to my XML file so that I can show you the
4:19
date-time format. Just come to this XML file and go to the CreationDate.
4:24
You can see that in CreationDate we have the date, and then capital T is there,
4:31
which marks the start of the time, and then the timestamp follows. So this capital T is actually
4:37
a delimiter between the date and the time. I'll be
4:43
searching for this capital T with the help of indexOf, and taking everything from the very first
4:49
index up to, but not including, the index of the capital T. That is the substring we are taking here,
4:54
and then we append slash double quote, so the closing double quote is appended. Otherwise, it is stringBuilder.append(entry.getKey()), then the equals sign and slash double quote, then entry.getValue(), and then slash double quote.
5:11
So you see, we have given the blank space, we have given the equals sign, everything, and at the end we're just closing the angle bracket.
5:18
So just remember this, because the output will reflect all of it.
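The row-building logic described above can be sketched in plain Java, outside of Hadoop. This is a minimal sketch, not the video's exact code: the class name RowBuilder and the method name buildRow are hypothetical, the attribute names Id, UserId, and CreationDate are assumed to be spelled as in the input rows, and the guard for a value without a capital T is an addition that the transcript does not mention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowBuilder {
    /** Rebuilds a row, dropping Id and UserId and keeping only the date part of CreationDate. */
    public static String buildRow(Map<String, String> xmlParts) {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("<row ");
        for (Map.Entry<String, String> entry : xmlParts.entrySet()) {
            String key = entry.getKey();
            String value = entry.getValue();
            if (key.equals("UserId") || key.equals("Id")) {
                continue; // these two fields are not written to the output
            } else if (key.equals("CreationDate")) {
                // Keep only the text before the 'T' that separates date and time.
                // The fallback for a missing 'T' is an assumption, not from the video.
                int t = value.indexOf('T');
                String date = t >= 0 ? value.substring(0, t) : value;
                stringBuilder.append(key).append("=\"").append(date).append("\" ");
            } else {
                stringBuilder.append(key).append("=\"").append(value).append("\" ");
            }
        }
        stringBuilder.append(">");
        return stringBuilder.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from comments.xml.
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("Id", "1");
        parts.put("PostId", "35");
        parts.put("CreationDate", "2010-01-01T10:15:30.000");
        parts.put("UserId", "7");
        System.out.println(buildRow(parts)); // <row PostId="35" CreationDate="2010-01-01" >
    }
}
```

Each attribute is followed by a blank space, so the closing angle bracket ends up separated from the last attribute by one space.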
5:22
Then the output key is set with the Random object's nextInt(), and the output value is set with stringBuilder.toString().
5:29
So the output key and output value have been initialized, and now we'll be
5:35
writing the key-value pair onto the context,
5:39
enclosing the code in a
5:44
proper try-catch block. Now we go to ShuffleCommentsReducer, which extends
5:49
Reducer. Here we override the reduce method, as we usually do. You see,
5:55
in the reducer we have a for loop over Text value, working on the values,
5:59
and values is nothing but an Iterable object. So the for loop takes each Text
6:05
value from values, the Iterable object, and writes the key-value
6:10
pair there: the value and NullWritable.get().
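The mapper and reducer steps just described can be imitated in plain Java to see why random keys produce a shuffle. This is only a sketch under assumptions: the class ShuffleSimulation and its shuffle method are hypothetical, a TreeMap stands in for the sort by key that the framework performs between map and reduce, and a single reducer is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class ShuffleSimulation {
    /** Returns the records in the order a single reducer would emit them. */
    public static List<String> shuffle(List<String> records, Random random) {
        // "Map" phase: give every record a random int key.
        // The TreeMap keeps keys sorted, standing in for the framework's sort by key.
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (String record : records) {
            grouped.computeIfAbsent(random.nextInt(), k -> new ArrayList<>()).add(record);
        }
        // "Reduce" phase: write only the values and drop the random key.
        List<String> output = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            output.addAll(values);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row-1", "row-2", "row-3", "row-4", "row-5");
        // Same records, in an order decided by the random keys.
        System.out.println(shuffle(rows, new Random()));
    }
}
```

Because the order depends only on the random keys, every record survives and only its position changes.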
6:18
In the main method you can find that we require the class name, and then the input folder and the output
6:23
folder, so two arguments are to be passed. If we don't pass them, then obviously
6:29
there will be an error. Then we create one job for shuffling the comments, set the
6:34
jar class, and set the mapper class and reducer class. We have already
6:42
defined the mapper class and reducer class earlier, so here is the mapper
6:50
class and here is the reducer class that we defined earlier. We have also
6:55
called setNumReduceTasks with two, so two reducers are there.
6:59
TextInputFormat will be there; setInputPaths will use args[0] and setOutputPath will
7:05
use args[1]. The output key class will be IntWritable and the output value class will be the
7:10
Text class. We then check the completion status and exit with zero or three.
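The control flow around the job, the two-argument check and the exit status, can be sketched without Hadoop on the classpath. Everything here is an assumption for illustration: the class DriverSketch, the methods parseArgs and exitStatus, the usage message, and the output path are hypothetical, the job itself is left out entirely, and the sketch assumes that zero is the status for success and three the status for failure.

```java
public class DriverSketch {
    /** Returns {inputFolder, outputFolder}, or throws when the two arguments are missing. */
    public static String[] parseArgs(String[] args) {
        if (args.length != 2) {
            // Hypothetical usage message; the video only says an error occurs.
            throw new IllegalArgumentException("Usage: <class name> <input folder> <output folder>");
        }
        return new String[] {args[0], args[1]};
    }

    /** Maps the job's completion flag to the exit status (assumed: 0 on success, 3 otherwise). */
    public static int exitStatus(boolean success) {
        return success ? 0 : 3;
    }

    public static void main(String[] args) {
        // The output path below is a made-up example.
        String[] paths = parseArgs(new String[] {"/input/comments", "/output/shuffled"});
        System.out.println("input=" + paths[0] + " output=" + paths[1]);
        System.out.println("exit status on success: " + exitStatus(true));
    }
}
```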
7:16
That is how the program is executed. Now we are going to create the JAR file: select the
7:21
package name, then right-click, Export, JAR; the respective path and the JAR file name have to be
7:28
given, and then Next, Next, Finish. But as we have already created the JAR file,
7:33
we're skipping this particular step. So let me execute the command to run this: hadoop jar, then the respective path
7:51
we have: design patterns, slash, jar file, slash, then there is the
7:58
JAR file name, and then we go for
8:03
the shuffling pattern, which is the package name, and then
8:15
CommentShuffleMRTask, which is the class name. Then you give
8:22
/input/comments, so this is the input
8:33
path, and then the output path. So we shall execute the command now. So here we'll be
8:49
shuffling, and you can recall how we planned to write the rows: that is,
8:55
there is an opening angle bracket, then row, then a blank space, and then we write
9:01
the attributes, with only the date and no time part. So let me show you the content: hdfs dfs -cat,
9:11
then here comes the output path, and then the part files will be there, so part*.
9:21
See the contents: each and every row that has come into the part files is as we wrote it,
9:29
because we had the StringBuilder object. Just look at it:
9:33
you can find that we have the angle bracket, row, a blank space, CreationDate,
9:37
the quotes, and so on. You see, in the date we have only the date part and no time part, and the rest we wrote accordingly, and then
9:45
a closing angle bracket. So in this way you have seen how
9:50
the corresponding content has been shuffled into our output file. I think we
9:56
have explained the code line by line. Let me delete the output folder, as we did
10:00
earlier. I hope you have enjoyed this video, and thanks for watching.