MapReduce and Design Patterns - Filtering Pattern Example
Oct 18, 2024
0:00
Let us discuss a filtering pattern example.
0:03
This video includes a practical demonstration for easy understanding
0:08
of the concept. In this example we are going to filter those posts where the number of comments is zero.
0:17
So I want to filter out those posts that have no comments.
0:21
So we are actually taking a subset of the posts out of the main set.
0:28
To execute this task, we need to use posts.xml. We have one XML file there, and we will be working on that.
0:35
The output of the task will tell us how many posts need to be deleted
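As a minimal sketch of the filtering idea, independent of Hadoop, the plain-Java snippet below counts the posts whose comment count is zero. The `Post` class, its fields, and the sample values are hypothetical and only illustrate the pattern; they are not taken from the code shown in the video.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of the filtering idea; this is not the Hadoop job. */
final class ZeroCommentFilter {

    /** Hypothetical stand-in for one post: an id and its comment count. */
    static final class Post {
        final int id;
        final int commentCount;

        Post(int id, int commentCount) {
            this.id = id;
            this.commentCount = commentCount;
        }
    }

    /** Keeps only the posts without comments and returns how many there are. */
    static long countPostsWithoutComments(List<Post> posts) {
        return posts.stream()
                .filter(post -> post.commentCount == 0)
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample data: posts 1 and 3 have no comments.
        List<Post> posts = Arrays.asList(new Post(1, 0), new Post(2, 3), new Post(3, 0));
        System.out.println("Posts without comments: " + countPostsWithoutComments(posts));
    }
}
```

The MapReduce job in the demonstration applies the same test to every row of the input, with the count kept in a job counter instead of a local variable.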
0:40
So let us go for a practical demonstration for easy understanding and implementation
0:46
of this concept. We are discussing the filtering design pattern.
0:52
The example is the filtering pattern example. We have one XML file, posts.xml,
0:58
under the folder /input/post, with a size of 108.93 MB, so it is a
1:05
big file. Let us see the current content; I have just shown some
1:10
part of it. Under the posts tag we have multiple rows containing
1:16
multiple attributes. This is row number one, and we have this one
1:21
as row number two, row number three, and so on; there are many rows,
1:24
having the attributes ID, post type ID (that is the second attribute), accepted answer ID,
1:31
creation date, score, view count, body, and then we have owner user ID,
1:39
last editor user ID, last editor date, last activity date, and then we have
1:48
the title, then tags, and then answer count,
1:54
comment count, favorite count, and community owned date. In this way there are multiple attributes, and the other rows have them as well.
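Because every row keeps its fields as XML attributes, one row can be turned into a map from attribute name to value. The sketch below does this with the JDK's DOM parser. The class name `RowAttributes`, the method name `attributesOf`, the sample row, and the exact attribute spellings (`Id`, `PostTypeId`, `CommentCount`) are assumptions for illustration and are not the code used in the video.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

/** Sketch: read the attributes of a single XML row element into a map. */
final class RowAttributes {

    /** Parses one row such as <row Id="1" ... /> and returns its attributes. */
    static Map<String, String> attributesOf(String rowXml) {
        try {
            Element row = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rowXml)))
                    .getDocumentElement();
            NamedNodeMap attributes = row.getAttributes();
            Map<String, String> result = new HashMap<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                result.put(attributes.item(i).getNodeName(),
                           attributes.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            // Malformed input is reported as an unchecked exception in this sketch.
            throw new IllegalArgumentException("Not a well-formed XML row: " + rowXml, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical row; real rows carry many more attributes.
        String row = "<row Id=\"7\" PostTypeId=\"1\" CommentCount=\"0\" />";
        System.out.println(attributesOf(row).get("CommentCount"));
    }
}
```

A full DOM parse per row is simple but not the cheapest option for a file of this size; it is used here only to keep the sketch short.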
2:08
Here the name is FilterMRTask. We have only one Java file here, and it is a map-only job.
2:15
Here we have the DeleteLowCountComment class, which extends the Mapper class.
2:20
We don't have any reducer here. This particular class is DeleteLowCountComment because it removes contents with
2:27
a low comment count, so that is why the class has been named
2:31
accordingly. This is an inner class, and it is
2:36
overriding the map method. Together with this map
2:41
method we have one enum, that is DeleteCount,
2:46
which has only one value, that is
2:50
DELETED. We have xmlParsed; this xmlParsed is a HashMap object that is
2:57
created by the xmlToMap method. Here is the current content of
3:01
this xmlToMap method. The xmlToMap method takes one piece of XML as input
3:08
and returns one HashMap object as output. From xmlParsed we get the
3:14
comment count; the comment count is one of the attributes
3:19
within the XML, and that initializes count. If count is not
3:23
equal to null and count.equals("0"), then context.getCounter(DeleteCount.DELETED).increment(1) is called. Here
3:33
ctx is the context object, and through its getCounter call for
3:38
DeleteCount.DELETED the value will be increased by one. It has been kept
3:43
in the try-catch block. Now we shall discuss
3:50
the main function. The main function takes command-line arguments;
3:55
it takes two command-line arguments, which are accessed as args[0] and args[1]. If the number of command-line arguments is not equal to 2, then exit 2 is called, so the program will get terminated with this particular error message.
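The argument check described here can be sketched as follows. The class name `ArgCheck`, the `validate` helper, and the wording of the usage message are assumptions; the transcript only states that the program exits with status 2 and an error message when it does not receive exactly two arguments.

```java
/** Sketch of a driver's argument check: exactly two paths are expected. */
final class ArgCheck {

    static final int USAGE_ERROR = 2;

    /** Returns 0 when exactly two arguments are given, otherwise the exit status 2. */
    static int validate(String[] args) {
        if (args.length != 2) {
            // Message text is assumed; the video's exact wording is not in the transcript.
            System.err.println("Usage: <input path> <output path>");
            return USAGE_ERROR;
        }
        return 0;
    }

    public static void main(String[] args) {
        int status = validate(args);
        if (status != 0) {
            System.exit(status);
        }
        System.out.println("input = " + args[0] + ", output = " + args[1]);
    }
}
```

Returning the status from a helper instead of calling System.exit directly keeps the check easy to test on its own.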
4:11
We define one job instance with the name "remove contents with low comment count". Here we
4:17
have this one, and here setJarByClass is given this class, and then addInputPath, so the file input
4:24
format will take args[0], and setOutputPath takes args[1].
4:30
setMapperClass is given DeleteLowCountComment, so that is the mapper
4:37
class, and we don't have a reducer class, so no reducer will work here.
4:43
Now we shall set the output key and output value classes to the NullWritable
4:48
class and the Text class respectively. Depending upon the completion status, whether it is true or false,
4:58
0 or 1 will be returned, and that code will be returned later on, but before returning we are just going to print
5:05
the respective "number of deleted comments" value, and that value is obtained using the expression
5:13
job.getCounters().findCounter(DeleteCount.DELETED).getValue(). So whatever increments we did there, ultimately the value will be kept and the value will be printed.
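The counter logic from this walkthrough can be modelled without Hadoop. In the sketch below an `EnumMap` stands in for Hadoop's job counters, so `counters.merge(...)` plays the role of `context.getCounter(...).increment(1)` and `deletedCount()` plays the role of `job.getCounters().findCounter(...).getValue()`. The class name `DeleteCounterSketch`, the identifier spellings, the `CommentCount` key, and the printed message are assumptions for illustration.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hadoop-free model of the counter logic; an EnumMap stands in for job counters. */
final class DeleteCounterSketch {

    /** Enum with a single value, as described in the walkthrough (spelling assumed). */
    enum DeleteCount { DELETED }

    private final Map<DeleteCount, Long> counters = new EnumMap<>(DeleteCount.class);

    /** Map-side step: bump the counter when the parsed row has a comment count of "0". */
    void map(Map<String, String> xmlParsed) {
        try {
            String count = xmlParsed.get("CommentCount"); // key spelling assumed
            if (count != null && count.equals("0")) {
                counters.merge(DeleteCount.DELETED, 1L, Long::sum);
            }
        } catch (RuntimeException e) {
            // Mirrors the try-catch mentioned in the walkthrough: bad rows are skipped.
        }
    }

    /** Driver-side step: read the final counter value (0 if it was never incremented). */
    long deletedCount() {
        return counters.getOrDefault(DeleteCount.DELETED, 0L);
    }

    public static void main(String[] args) {
        DeleteCounterSketch job = new DeleteCounterSketch();
        Map<String, String> row = new HashMap<>();
        row.put("CommentCount", "0");
        job.map(row);
        // Message wording is assumed, not copied from the video.
        System.out.println("Number of deleted comments = " + job.deletedCount());
    }
}
```

In the real job the counter is aggregated by the framework across all map tasks, which is why the driver can read a single total after the job completes.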
5:24
Before going for the execution, we are supposed to make the JAR file of this class.
5:32
So how do we create that one? We go to the package name, then Export, and then we select the JAR option.
5:39
Then you click on Next. Here we are supposed to give the proper path and the proper JAR
5:45
file name. Everything has to be given, and then Next and Finish.
5:50
But we have already created the JAR file, so we are not going to create the JAR file once
5:54
again. Let us go for the execution and show you how the command is to be
6:00
initiated. The command is hadoop jar, and then MapReduce design pattern slash JAR file, so this is the path, and the JAR file name is filtering pattern dot jar. So that is the
6:15
JAR file name, and then filtering pattern is the package name, and the class name is
6:19
FilterMRTask, so that is the class name. The input file, posts.xml, is under
6:25
the /input/post folder, and the output folder is /output, so this is
6:31
the output folder we have. We have given the input path and also the output
6:35
path as argument zero and argument one. Now we have executed the command and we are
6:41
finding that the counter is zero. I think the NameNode is in safe mode, so
6:47
let us make the NameNode come out of
6:52
safe mode. The respective command is hadoop, and then
6:59
dfsadmin, then -safemode, then a blank space and leave, and Enter. Now the
7:09
NameNode will come out of safe mode. The command has been
7:14
executed once again, with the job "remove contents with low comment count", and you see the
7:27
number of deleted comments is equal to 45471.
7:31
So this is the number of deleted comments we obtained; we got it printed from the
7:36
Java class. Now let us come to the output folder.
7:42
You can find that the output folder has been created, but there is no part file; no reducer
7:47
was there, because it is a map-only job. Let me delete this output folder,
7:53
and the command is dfs -rm -r and then the output folder. I hope that you have got the idea of
8:07
how the program has to be executed. Thanks for watching.