MapReduce and Design Patterns - Total Order Sorting Pattern Example
Oct 18, 2024
0:00
Total order sorting pattern example.
0:03
So in this particular video we shall go for the implementation of this design pattern
0:10
So in this example we will take an XML file, that is the users.xml file,
0:15
and sort the records based on their reputations. So depending upon the reputation value, the records will be sorted.
0:23
The records will be obtained from the XML, that is users.xml. It also takes one sampling
0:29
rate for making the random sampling,
0:36
because using the random sampling we will be going for further splitting. So let us go for one
0:41
practical demonstration for easy understanding of this concept. We are going to have one
0:47
example of the total order sorting design pattern, which falls under the data organization
0:54
patterns here, and we will sort users based on their reputation against a sampling rate
1:01
of 0.1. So this is the task here. We're having the input file, that is the XML file
1:08
users.xml, under the /input/user folder. We are having users
1:14
.xml with a size of 56.18 MB; this is the folder I'm just marking here, and 56.18 MB is the size
1:23
of this file. It is a long file, but I've shown some of the records within the users tag.
1:29
We are having multiple rows there; these are within the users tag. We are having
1:33
multiple rows, and each and every row has got multiple attributes. So let me explore
1:40
them: the Id, the Reputation on which we will be doing the sorting, and then CreationDate,
1:47
then DisplayName, LastAccessDate, WebsiteUrl, Location, AboutMe, then Views, UpVotes, DownVotes, AccountId. So this is the set
2:00
of attributes. So we shall sort users based on their reputation.
2:05
We're having only one Java file, that is the total order sorting MR task. So
2:12
here, for the MR, there is a Mapper class that has been extended, the
2:18
ReputationOrderMapper. Under this we're having one IntWritable object, that is the output key, which has been instantiated. We're
2:29
going to override the map method. Within this map method we're having the xmlParsed
2:35
map, which is instantiated using the xmlToMap function. This function
2:40
we have written there in our code; it takes one XML record and returns one hash
2:45
map object, which initializes this xmlParsed. Then String reputation = xmlParsed.get("Reputation"); so from the hash map, using the tag
2:55
Reputation, we are just extracting the reputation here. So here you can find that
3:00
Reputation is there as one of the fields in the row; this attribute is there, and
3:05
we are just accessing that one. If reputation is equal to
3:10
null, then we are returning; otherwise int reputation = Integer
3:15
.valueOf(reputation).intValue(), so we are converting it to an integer, and
3:19
then outputKey.set(reputation); and this key-value pair we are writing in the current context: context.write(outputKey, value). So that is the key-value pair we are writing here, enclosing it within a proper try-catch block.
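As a rough sketch, the mapper being described might look like the following. The class and helper names (ReputationOrderMapper, xmlToMap) follow the narration, but the regex-based attribute parsing inside xmlToMap is only an assumed stand-in for whatever the original helper does.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (reputation, whole record) so the reputation value becomes the sort key.
public class ReputationOrderMapper
        extends Mapper<Object, Text, IntWritable, Text> {

    private final IntWritable outputKey = new IntWritable();

    // Assumed helper: turns one <row .../> line into an attribute -> value map.
    public static Map<String, String> xmlToMap(String xml) {
        Map<String, String> map = new HashMap<>();
        Matcher m = Pattern.compile("(\\w+)=\"([^\"]*)\"").matcher(xml);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            Map<String, String> xmlParsed = xmlToMap(value.toString());
            String reputation = xmlParsed.get("Reputation");
            if (reputation == null) {
                return; // skip records without a Reputation attribute
            }
            // The reputation becomes the key; the full record stays as the value.
            outputKey.set(Integer.valueOf(reputation).intValue());
            context.write(outputKey, value);
        } catch (NumberFormatException e) {
            // Malformed reputation value: ignore the record, as in the narrated try-catch.
        }
    }
}
```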
3:36
Next we are having the class ReputationReducer, which extends
3:41
Reducer. It is a very simple class we have written, overriding the reduce
3:45
method. Overriding the reduce method, you can find that we're having
3:50
our Text value; we will be iterating over these values, and values is nothing but an Iterable
3:57
object. So we do context.write(value, NullWritable.get()).
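The reducer narrated here can then be as small as this sketch; the class name ReputationReducer is taken from the narration, the rest is a minimal assumed implementation.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ReputationReducer
        extends Reducer<IntWritable, Text, Text, NullWritable> {

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Keys arrive globally sorted, so simply emit each record with no output key.
        for (Text value : values) {
            context.write(value, NullWritable.get());
        }
    }
}
```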
4:08
You can see, this is actually the main function, which we are going to
4:12
discuss one by one. So we are having the class name, then the input path, then the output path,
4:19
and then the sampling rate, which we are supposed to pass as 0.1. So three arguments are to be
4:24
passed, so args.length must be three. We are defining multiple different paths here.
4:30
So the Path inputPath is initialized with argument 0, and then the partition file will be
4:38
initialized with argument 1, also concatenating "_partitions.lst". So with this
4:46
argument 1 we are concatenating "_partitions.lst". We're also having the output stage here:
4:56
we're having args[1], that is the second argument, whatever we'll be passing, concatenated with "_staging". So the output stage path has
5:06
been initialized. We're having the output order, that is argument 1, so that is
5:12
the second argument we're passing, used directly as that path. Then the double sampleRate is
5:17
taken from the last argument, that is the 0.1 mentioned above,
5:22
which will be converted to a double. We're going to have FileSystem.get(new Configuration()).
5:30
delete(outputOrder, true): we're going to delete the output order. We're going to delete the output stage.
5:37
We're going to delete the partition file, and so on. So they are getting deleted.
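A minimal sketch of that argument handling and cleanup, assuming the variable names mentioned in the narration and a driver class name of TotalOrderSortMRTask (the exact name is not audible in the video):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TotalOrderSortMRTask {          // driver class name assumed from the narration

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: TotalOrderSortMRTask <input> <output> <sampling-rate>");
            System.exit(1);
        }

        Path inputPath     = new Path(args[0]);
        Path partitionFile = new Path(args[1] + "_partitions.lst"); // boundaries for TotalOrderPartitioner
        Path outputStage   = new Path(args[1] + "_staging");        // intermediate (map-only) output
        Path outputOrder   = new Path(args[1]);                     // final, globally sorted output
        double sampleRate  = Double.parseDouble(args[2]);           // e.g. 0.1

        // Remove leftovers from earlier runs so the jobs can write cleanly.
        FileSystem fs = FileSystem.get(new Configuration());
        fs.delete(outputOrder, true);
        fs.delete(outputStage, true);
        fs.delete(partitionFile, false);

        // ... job configuration continues as walked through below ...
    }
}
```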
5:44
We're defining one job. The name of the job is "Sort Users Based on Reputation": a configured job to prepare for
5:53
sampling. So we are creating one job; the job name is "Sort Users Based on Reputation",
5:58
then we are defining the jar class and the mapper class, and at first the
6:03
number of reduce tasks is zero, so we are performing a map-only job. So we are assigning the respective
6:09
classes, and for the reducer we are not assigning anything right at this moment. setOutputKeyClass and set
6:15
OutputValueClass: these two are the respective IntWritable class and Text class; we are
6:20
also defining the respective classes. FileInputFormat.addInputPath, that is
6:27
sampleJob, inputPath: so whatever we have defined earlier is
6:32
being used here; there is the sample job and the input path. See, this is the input
6:38
path we defined from args[0]. sampleJob.setOutputFormatClass, that is Sequence
6:45
FileOutputFormat.class; we are also mentioning that respective class type. And then SequenceFileOutputFormat.setOutputPath, that is, sampleJob and outputStage.
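Continuing inside the main method sketched above, the map-only sampling stage might be wired up roughly like this (Hadoop's Job, FileInputFormat and SequenceFileOutputFormat classes are assumed to be imported; the non-zero failure code is arbitrary):

```java
// Stage 1: map-only job that emits (reputation, full record) pairs into a SequenceFile.
Job sampleJob = Job.getInstance(new Configuration(), "Sort Users Based on Reputation");
sampleJob.setJarByClass(TotalOrderSortMRTask.class);
sampleJob.setMapperClass(ReputationOrderMapper.class);
sampleJob.setNumReduceTasks(0);                        // no reducer yet: map-only
sampleJob.setOutputKeyClass(IntWritable.class);
sampleJob.setOutputValueClass(Text.class);

FileInputFormat.addInputPath(sampleJob, inputPath);
sampleJob.setOutputFormatClass(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputPath(sampleJob, outputStage);

int code = sampleJob.waitForCompletion(true) ? 0 : 1;  // completion status checked next
```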
6:59
So everything whatever we have defined earlier, we're just using now. Now what we're doing: we're going for the completion status.
7:05
If the completion status is zero, that means successful completion, then we'll be
7:10
invoking the reducer; otherwise we shall not do so. If code is equal to zero, we are defining
7:16
the job instance and we are setting the jar class again. Setting up the job instance, we are going for the name of the job,
7:24
which is "Sort Users Based on Reputation". We're setting the jar class with setJar
7:31
ByClass, we're setting the mapper class, we're setting the reducer class. So all
7:35
these things, which we did earlier also in other programs, we're doing the same here.
7:39
Now here we're going for setNumReduceTasks(3), so three reducers we're
7:44
going to create. Then there is orderJob.setPartitionerClass, that is the TotalOrderPartitioner class; we are also
7:52
defining the respective classes there. So it uses Hadoop's TotalOrderPartitioner class here, under the Hadoop package. Then TotalOrderPartitioner.set
8:03
PartitionFile, that is, with orderJob.getConfiguration() and the partition
8:08
file. So here we are just using the respective partition file, whatever we
8:15
defined with this concatenation of "_partitions.lst". So here we're going for this. Then orderJob.setOutputKeyClass is IntWritable, and the output value will be the Text class.
8:30
orderJob.setInputFormatClass is SequenceFileInputFormat.class,
8:34
so we set the input to the previous job's output, and SequenceFileInputFormat.setInputPath, that is, the order job and the output stage.
8:44
Everything we defined and initialized earlier, we are using here. TextOutputFormat, that is setOutputPath, that is orderJob and outputOrder.
8:57
We're also going for the configuration, that is orderJob
9:02
.getConfiguration(). There is "mapred.textoutputformat.separator", using null as the separator.
9:12
That is an empty string, actually; it is set to the empty string. The InputSampler I'm just showing: so InputSampler.write
9:21
PartitionFile, with orderJob and a new InputSampler.RandomSampler taking the sampling rate,
9:27
whatever we shall be passing there, and then 100. So we'll be
9:32
passing the sampling rate as we decided, 0.1; that is the sampling rate we're
9:38
passing. Now, depending upon the job completion status, the corresponding code,
9:45
zero or two, will decide how the completion has been done. So we
9:50
shall delete the partition file and the output stage; we shall delete them.
9:55
So FileSystem.get(new Configuration()).delete with the partition file and false, and
10:01
with the output stage and true, and then System.exit(code).
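Putting the narrated order stage together, still inside the same main method, a sketch could look like this. Classes from org.apache.hadoop.mapreduce and its lib.input, lib.output and lib.partition packages are assumed to be imported; the identity Mapper.class for the second stage is an assumption, while the three reducers, the TotalOrderPartitioner, the empty separator, and the RandomSampler with the sampling rate and 100 samples follow the narration.

```java
if (code == 0) {
    // Stage 2: the actual total-order sort over the staged (reputation, record) pairs.
    Job orderJob = Job.getInstance(new Configuration(), "Sort Users Based on Reputation");
    orderJob.setJarByClass(TotalOrderSortMRTask.class);
    orderJob.setMapperClass(Mapper.class);              // identity mapper over the staged pairs (assumed)
    orderJob.setReducerClass(ReputationReducer.class);
    orderJob.setNumReduceTasks(3);                      // three globally ordered partitions

    // Partition by total order, using the sampled boundary file.
    orderJob.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(orderJob.getConfiguration(), partitionFile);

    orderJob.setOutputKeyClass(IntWritable.class);
    orderJob.setOutputValueClass(Text.class);

    // Input is the previous job's SequenceFile output; final output is plain text.
    orderJob.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.setInputPaths(orderJob, outputStage);
    TextOutputFormat.setOutputPath(orderJob, outputOrder);

    // Empty separator so only the record text appears in the output files
    // (newer Hadoop versions also accept mapreduce.output.textoutputformat.separator).
    orderJob.getConfiguration().set("mapred.textoutputformat.separator", "");

    // Sample the staged data to compute the partition boundaries.
    InputSampler.writePartitionFile(orderJob,
            new InputSampler.RandomSampler<IntWritable, Text>(sampleRate, 100));

    code = orderJob.waitForCompletion(true) ? 0 : 2;
}

// Clean up intermediate artifacts and report the final status.
FileSystem.get(new Configuration()).delete(partitionFile, false);
FileSystem.get(new Configuration()).delete(outputStage, true);
System.exit(code);
```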
10:07
It is a long code; you can find that so many things we are doing. I've just explained it line by line; it's a long code we had to
10:11
type. Yes, this is the main function, and then we are having this total order sorting MR task. So now we shall create the jar file: we shall go for the package, right-click, and go for Export,
10:31
then we shall decide the path and define the
10:36
respective jar file name; then we shall go for Next, Next, Finish. So in this way the jar file
10:41
will be created, but we created the jar file earlier, so we are skipping this step.
10:45
So the jar file is required for the execution of this. So let me go to the console now, the terminal.
10:54
So this is the command you are finding. It starts with hadoop.
10:58
The first word we're writing here is hadoop, then we'll be going for jar, and then the respective path up to the jar file, and
11:06
then the data organization pattern jar is the jar file name. Then we shall be defining the respective package
11:15
and then the total order sorting MR task is the class name, then the input file folder, that is
11:23
/input/user, and /output is the output file folder, and 0.1 is the sampling rate.
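For reference, the command being typed has the general shape hadoop jar <path-to>/<data-organization-pattern>.jar <package>.<total-order-sort-class> /input/user /output 0.1, where the jar, package, and class name placeholders stand in for the exact names used in the video.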
11:32
So we shall execute the command now. As I told you, we will sort users based on their
11:38
reputation, depending on the sample rate, that is 0.1. So
11:45
multiple files will be created; the part files will be created.
11:52
So let the command get executed and completed; then we shall go to the output folder.
12:00
So now the program execution has been completed successfully.
12:06
So let me go for hdfs, and then dfs -cat /output/part*, and then press Enter. So part files
12:24
have got created. How many part files have got created? I shall be going to the
12:29
output folder. Here you can find that all the records
12:35
from users.xml have got sorted based on their
12:41
reputation. users.xml is the file, so based on the reputation,
12:45
these records have got sorted, so we're getting the right output. So let me go
12:52
to the output folder to show you. I'm just scrolling up
12:59
and down so that you can see the content; there is the content of the multiple part files.
13:06
So let me go to the output folder now; I'm going to this output folder.
13:11
So we have got three part files, but the first file is having zero
13:15
bytes and the other two files are having 32.95 MB and 23.03 MB. So these two part files are
13:22
having the respective contents.
13:28
So I have shown you how to execute this program and how to write the code,
13:33
explaining it line by line. So let me delete the output folder there; that is a common
13:37
practice, but optional, so that we can execute other MapReduce tasks. I hope you
13:42
have enjoyed the video. Thanks for watching.