MapReduce and Design Patterns - Bloom Filtering Pattern Example
Oct 18, 2024
0:00
In this video, we are discussing the Bloom filtering pattern example.
0:05
This video includes a practical demonstration of the implementation
0:10
of this task. In this Bloom filtering pattern example, we provide a tag list,
0:19
and the tag list contains some tags. In the training phase, the job takes the tag list and creates the intermediate data,
0:27
and that intermediate data is saved on HDFS. In the second phase, the MR task takes comments from comments.xml.
0:37
We have one XML file, comments.xml, and the task selects the comments that do not contain those tags.
0:43
So we create the tag list first. Then, in the second phase, we work from the XML file, comments.xml:
0:51
the MapReduce task takes those comments from comments.xml
0:55
that do not contain the tags in the list. Let us go
1:01
through a practical demonstration for a better understanding of this concept. This example falls under the filtering category of design patterns.
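To make the idea concrete before the demonstration, here is a minimal sketch of a Bloom filter in plain Java. It is an illustrative stand-in, not Hadoop's BloomFilter class that the video uses: the class name TinyBloomFilter, the double-hashing scheme, and the FNV-1a helper are all assumptions chosen to keep the example self-contained. It shows the two operations the pattern relies on: adding items during training, and a membership test that answers "possibly present" or "definitely absent".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Illustrative stand-in only; this is NOT org.apache.hadoop.util.bloom.BloomFilter.
class TinyBloomFilter {
    private final BitSet bits;
    private final int bitNumber;   // size of the bit array
    private final int hashNumber;  // number of hash functions

    TinyBloomFilter(int bitNumber, int hashNumber) {
        this.bits = new BitSet(bitNumber);
        this.bitNumber = bitNumber;
        this.hashNumber = hashNumber;
    }

    // Derive the i-th bit position from two base hashes (double hashing, an assumption).
    private int position(String item, int i) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, bitNumber);
    }

    // 32-bit FNV-1a, used here only as a second, independent-looking hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Training: set every bit position derived from the item.
    void add(String item) {
        for (int i = 0; i < hashNumber; i++) {
            bits.set(position(item, i));
        }
    }

    // Membership test: false means "definitely absent", true means "possibly present".
    boolean mightContain(String item) {
        for (int i = 0; i < hashNumber; i++) {
            if (!bits.get(position(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(1000, 5);
        filter.add("alarm");
        filter.add("networking");
        System.out.println(filter.mightContain("alarm")); // prints true
    }
}
```

An added item always passes the test, while an item that was never added may occasionally pass too. That false-positive behaviour is the trade-off for storing only a fixed-size bit array.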
1:11
We are going to implement the Bloom filtering pattern example. Here we have
1:17
two parts: the first part is the training part, and the second part is the
1:22
MR task part. In the training part, we create one text file, named tag_list.txt, and we create one
1:33
folder, named MR_files. We put this tag_list.txt
1:40
file in this folder, MR_files, which will be created
1:47
in HDFS, and then we use posts.xml to train our filter.
1:55
That is how we train the respective application. Here we can see that we have posts.xml, and it is under the folder
2:04
/input/post, at 108.93 MB. So it is a big file.
2:10
Let me show you the current content of posts.xml.
2:15
Under the posts tag there are multiple rows. There are thousands of rows, but here we show only three of them.
2:23
Under each row we have multiple attributes, as I am marking here. We have,
2:30
for example, the body, then the owner user ID, the last editor
2:36
user ID, last edit date, last activity date, title, then tags, and then answer count,
2:45
comment count, favorite count, and community owned date. So these are the multiple
2:52
attributes in the file posts.xml. Now we move on to the Java programs. We mainly have
3:02
two Java programs. One is the MR task Java program, and then
3:09
we have the other one. But first, let us look at the tag list file,
3:14
tag_list.txt. It contains 20 tags: we have alarm,
3:19
networking, PDF, and so on. These tags will be referred to
3:25
when we do the searching and filtering. So here we have two
3:30
Java programs: BloomFilterTrainer.java for the training purpose, and
3:35
BloomFilterMRTask.java for the MR task. In BloomFilterTrainer.java we
3:41
have the respective class, and inside it an inner class,
3:46
BloomFilterMapper, containing the BloomFilter object. We also
3:51
have the ArrayList of tags, and the bit number and hash number, both of type integer. We override the setup method,
4:02
and we can override setup because the class extends the Mapper class.
4:08
Here, if ctx.getCacheFiles() is not equal to null and
4:15
ctx.getCacheFiles().length is greater than zero, then we create a new File object for
4:21
tag_list, the name the tag list is cached under. So this is the respective tag
4:25
file. We have the BufferedReader, that is, the buffered
4:30
reader class object, and with it we read the tag file we have just
4:34
defined. We read it
4:40
line by line, with line = bufferedReader.readLine(). Then we
4:45
create the string bits, which holds the bit array size. We
4:50
also create a String object, hashes, which holds the hash
4:54
function number, obtained with a get call. Then we initialize two variables,
5:00
the bit number and the hash number, from bits and hashes, converting each string to an integer using Integer.parseInt. In this way we define one
5:14
BloomFilter object with this hash number, this bit number, and Hash.MURMUR_HASH.
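The setup logic just described can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the code shown in the video: a plain Map stands in for the job configuration, the keys "bloom.bits" and "bloom.hashes" are hypothetical names, and the class TrainerSetupSketch is invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; in the real job this logic lives in the mapper's setup method.
class TrainerSetupSketch {

    // Read the tag file line by line with readLine, one tag per line, skipping blank lines.
    static List<String> readTags(BufferedReader reader) throws IOException {
        List<String> tags = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String tag = line.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // The settings arrive as strings and are converted with Integer.parseInt.
    // The Map and its key names are assumptions standing in for the job configuration.
    static int[] readSizes(Map<String, String> settings) {
        String bits = settings.get("bloom.bits");
        String hashes = settings.get("bloom.hashes");
        int bitNumber = Integer.parseInt(bits);
        int hashNumber = Integer.parseInt(hashes);
        return new int[] {bitNumber, hashNumber};
    }
}
```

With the demo's arguments, readSizes would return 1000 and 5, which are then used to construct the filter.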
5:21
We then call setup with the respective context as input, so the
5:27
respective overridden method will be called. As we have extended the
5:32
Mapper class, we also override the map method. This map method
5:37
has one HashMap object, xmlParsed, which is initialized
5:42
with the output of xmlToMap. xmlToMap is a method we have defined here:
5:47
it takes one XML string as input and returns one HashMap object
5:53
as output, and that initializes xmlParsed. Then we have the
6:02
string tags. From the row we take the post ID and the tags; if
6:07
you go to posts.xml, the respective tags are there. So we are
6:12
just mapping those attributes here: the post ID comes from the ID, and
6:18
the tags come from the tags attribute. If tags is equal to null or the post ID is
6:23
equal to null, then we return and skip the rest of the code. Otherwise, we loop with for each tag in the
6:31
tag list: if tags.contains(tag), we call
6:37
bloomFilter.add(new Key(postId.getBytes())). In this way we add the post ID to the Bloom
6:42
filter object. Everything has been
6:47
kept under a try-catch block for exception handling. We also override one
6:51
more method, cleanup, and
6:56
there we call context.write(NullWritable.get(), bloomFilter). So this particular method
7:01
is overridden here. Next, we
7:08
are going to discuss the class BloomFilterReducer, which extends
7:14
Reducer. Here we have the member variables bits, of type
7:18
String, and hashes, of type String, and
7:22
the bit number and hash number are initialized after converting
7:26
the strings to integers. The BloomFilter object is defined here in the same way
7:31
as before. In our reducer, we define one
7:38
iterator, values.iterator(), and while the iterator hasNext(),
7:44
the next element is taken: a BloomFilter,
7:49
bloomF, is initialized with it, and we call bloomFilter.or(bloomF). Now we go to
7:57
the file system: we set fileSystem equal to
8:01
FileSystem.get(context.getConfiguration()). We then
8:06
define one path, /MR_files/filter.obj. So this is
8:13
the file system path we are specifying here. Then bloomFilter.write(stream), where
8:18
stream is an FSDataOutputStream object. Then we call
8:24
stream.flush() and stream.close(). In this way we have written the new file at the path
8:30
/MR_files/filter.obj. Let us come to the main function now. We
8:37
need to pass four parameters as command-line arguments:
8:41
array size, hash function number, input, and output. If the number of parameters
8:47
does not match, we call System.exit(2). We then set the bit array and the hash
8:51
function from argument zero and argument one, that is, the first and second
8:56
arguments. We define one Job instance; the name of the job is
9:00
Bloom filter trainer. We also call addInputPath and
9:05
setOutputPath with args[2] and args[3], which we pass as
9:10
command-line arguments, and setMapperClass with BloomFilterMapper.
9:14
We also have the inner class
9:19
BloomFilterReducer, which is the reducer class here. We need only one
9:24
reducer, so we call setNumReduceTasks(1). Then we call addCacheFile
9:30
with what we defined earlier, that is, /MR_files/tag_list.txt.
9:36
That is the name of the file holding the respective tags, followed by #tag_list.
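The #tag_list part is a URI fragment. As far as I understand Hadoop's distributed cache, the fragment becomes the local name under which the cached file appears in the task's working directory, which is why setup can open a file simply called tag_list. The sketch below only illustrates how such a URI splits into path and fragment with java.net.URI. The helper localName and its fallback to the last path segment are my own simplification, not a Hadoop API.

```java
import java.net.URI;

// Hypothetical helper showing how a cache URI splits into an HDFS path and a local name.
class CacheUriSketch {

    // Returns the fragment if one is present, otherwise the last path segment.
    // The fallback rule is an assumption made for this illustration.
    static String localName(String cacheUri) {
        URI uri = URI.create(cacheUri);
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI uri = URI.create("/MR_files/tag_list.txt#tag_list");
        System.out.println(uri.getPath());                                // prints /MR_files/tag_list.txt
        System.out.println(localName("/MR_files/tag_list.txt#tag_list")); // prints tag_list
    }
}
```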
9:43
We define the map output key and map output value with the NullWritable class
9:50
and the BloomFilter class. The setOutputKeyClass and setOutputValueClass calls also use the NullWritable
9:59
class and the BloomFilter class. We then check the status, whether the job was
10:05
successful or not, and we exit with that status. So this is the current content of the trainer.
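The overall shape of the trainer can be sketched without Hadoop: each mapper builds a partial filter from the post IDs it sees, and the single reducer combines the partial filters with a bitwise OR before saving the result. The sketch below uses java.util.BitSet as a stand-in for Hadoop's BloomFilter. The class FilterMergeSketch, the hashing scheme in position, and the byte-array round trip that mimics writing and reading filter.obj are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Stand-in for the trainer's data flow; BitSet replaces Hadoop's BloomFilter here.
class FilterMergeSketch {
    static final int BIT_NUMBER = 1000; // array size, as passed in the demo run
    static final int HASH_NUMBER = 5;   // hash function number, as passed in the demo run

    // Simplified bit-position function (an assumption, not MurmurHash).
    static int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B1, 15) | 1;
        return Math.floorMod(h1 + i * h2, BIT_NUMBER);
    }

    // "Mapper" side: build a partial filter from the post IDs of one input split.
    static BitSet partialFilter(List<String> postIds) {
        BitSet bits = new BitSet(BIT_NUMBER);
        for (String id : postIds) {
            for (int i = 0; i < HASH_NUMBER; i++) {
                bits.set(position(id, i));
            }
        }
        return bits;
    }

    // "Reducer" side: OR all partial filters into one.
    static BitSet merge(List<BitSet> partials) {
        BitSet merged = new BitSet(BIT_NUMBER);
        for (BitSet partial : partials) {
            merged.or(partial);
        }
        return merged;
    }

    static boolean mightContain(BitSet filter, String key) {
        for (int i = 0; i < HASH_NUMBER; i++) {
            if (!filter.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet first = partialFilter(Arrays.asList("1", "2"));
        BitSet second = partialFilter(Arrays.asList("3"));
        BitSet merged = merge(Arrays.asList(first, second));

        // Round trip through bytes, loosely mirroring the write and later read of the saved filter.
        byte[] saved = merged.toByteArray();
        BitSet loaded = BitSet.valueOf(saved);
        System.out.println(mightContain(loaded, "3")); // prints true
    }
}
```

Because OR only ever sets bits, every ID added by any mapper still passes the membership test on the merged filter. This is why a single reducer is enough to produce one complete filter.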
10:22
Now we move to BloomFilterMRTask.java. Here we define one enum,
10:30
which is our selected counter. It has only one value, selected.
10:37
We extend the Mapper; this is BloomFilterMapper. It has two
10:44
member variables, the BloomFilter object bloomFilter and the Text postId, and
10:48
they are instantiated using the respective classes. We override the setup method, since the Mapper class has a
10:57
setup method. If context.getCacheFiles() is not equal to null and
11:03
context.getCacheFiles().length is greater than zero, we create the filter file as
11:08
a new File for ./filter.obj. In this way the
11:15
filter file is set, and we also call getPath:
11:20
filterFile.getPath() is used
11:27
to initialize the DataInputStream object. Then we call bloomFilter.readFields
11:34
on that stream, followed by a System.out.println. Here we also have xmlToMap.
11:44
xmlToMap, as I told you already, takes one XML string as input and produces
11:49
a HashMap object as output, which is assigned to xmlParsed.
11:55
Then the string post ID is read from xmlParsed, using the respective post ID attribute.
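A helper like xmlToMap can be sketched with a regular expression that collects the attribute name and value pairs of one row. This is a minimal illustration, assuming each row sits on a single line in the form <row Name="value" ... />. The attribute names Id and Tags in the usage are assumptions about posts.xml, and the regex is not a general XML parser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an xmlToMap-style helper; not the code shown in the video.
class XmlRowSketch {
    // Matches name="value" pairs; assumes values contain no double quotes.
    private static final Pattern ATTRIBUTE = Pattern.compile("([A-Za-z]+)=\"([^\"]*)\"");

    static Map<String, String> xmlToMap(String row) {
        Map<String, String> attributes = new HashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(row);
        while (matcher.find()) {
            attributes.put(matcher.group(1), matcher.group(2));
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = xmlToMap("<row Id=\"42\" Tags=\"alarm networking\" AnswerCount=\"3\" />");
        System.out.println(parsed.get("Id")); // prints 42
    }
}
```

A missing attribute simply yields null from get, which is what makes the null check on the post ID in the map method necessary.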
12:04
If the post ID is equal to null, we return and skip the rest. Otherwise, we set postId with the post ID, and with bloomFilter.membershipTest we
12:14
check membership, whether this entry exists in the filter or not. Then we call
12:19
context.write(NullWritable.get(), postId), and then context.getCounter
12:25
with the selected counter, which is incremented by one.
12:30
If it is a member, the counter is incremented by one; that is why we
12:34
do the membership test. Everything has been put within a try-catch block to handle exceptions.
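The filtering step can be sketched without Hadoop as follows. The membership test is passed in as a Predicate so the example stays self-contained. In the real job that role is played by the Bloom filter loaded from the cache, which may also let through some IDs that were never added. The class FilterStepSketch, the enum Counter, and the in-memory output list are all hypothetical stand-ins for the mapper, the job counter, and context.write.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the map-only filtering step.
class FilterStepSketch {
    enum Counter { SELECTED } // one counter with a single value, as in the description

    final Map<Counter, Long> counters = new EnumMap<>(Counter.class);
    final List<String> output = new ArrayList<>();
    private final Predicate<String> membershipTest;

    FilterStepSketch(Predicate<String> membershipTest) {
        this.membershipTest = membershipTest;
    }

    // Mirrors the map logic: skip null IDs, emit members, count each emitted record.
    void map(String postId) {
        if (postId == null) {
            return;
        }
        if (membershipTest.test(postId)) {
            output.add(postId);
            counters.merge(Counter.SELECTED, 1L, Long::sum);
        }
    }
}
```

After all records are processed, the SELECTED counter equals the number of emitted post IDs. This corresponds to the selected count the job reports at the end of the run.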
12:48
This is our main function. In this main function we check
12:53
the length of the command-line arguments, which should be two:
12:57
we are supposed to give the input and the respective output. Then we
13:02
call setJarByClass, setMapperClass, and setNumReduceTasks.
13:06
It is a map-only job, so the number of reduce tasks is zero and no reducer class is
13:12
used. Then setOutputKeyClass and setOutputValueClass are given the
13:17
NullWritable class and the Text class. Next, job.addCacheFile: the respective path has
13:22
been provided, followed by the # part for filter.obj. Then addInputPath takes
13:29
argument zero and setOutputPath takes argument one. So the input and output paths are added,
13:36
and a message is printed. Now we check the status: the
13:41
code gets the status, whether the job completed successfully or not. Then there is
13:46
job.getCounters().findCounter, so we write the counters to the
13:51
respective output; you can see that we write them to the respective
13:55
output. Now let us go to the JAR file creation. For the JAR file, we are
14:03
supposed to specify the respective file and the path. We have already created the JAR file, so we are not going to create it
14:10
again. The simple steps are to click Next and then Finish. Now let me execute
14:16
my project. We have come to the terminal, so let me
14:26
execute my commands one by one, because it is a long procedure. We
14:33
run hdfs dfs -mkdir and then /MR_files. First
14:48
we create this folder. Let me check: if I refresh, yes, the MR_files folder
14:55
has been created. Let me come back to my console. Now let me
15:04
copy the file tag_list.txt into that folder, using hdfs dfs
15:11
-put. There is the current path, then the home path, and then the MapReduce design pattern folder. This
15:16
is the path to tag_list.txt. The text file is put into the MR_files folder.
15:32
So the file has been copied there, and this part is done. For
15:38
MR_files, I can check that the tag_list.txt file has been copied into the folder.
15:47
Next is hadoop jar. Here we are supposed to specify the path where we have kept the
16:05
JAR file. This is the path here: the MapReduce design pattern folder, and then the JAR
16:10
files folder, and then the filtering pattern JAR. Then we are supposed
16:17
to write the respective package name and then the class name, the trainer. Here we are
16:33
passing 1000 and 5. Let me show you why 1000
16:41
and 5 are there: they are the array size and the hash
16:47
function number. After the array size and hash function number, we give the input
16:53
path and then the output path. Let me execute the command now.
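To see what the two numbers imply, the standard approximation for a Bloom filter's false-positive probability can be applied with m = 1000 bits and k = 5 hash functions. The number of inserted items n is not stated in the video, so the two values of n below are purely hypothetical, chosen only to show how quickly a small bit array saturates.

```latex
% False-positive probability for m bits, k hash functions, n inserted items
p \approx \left(1 - e^{-kn/m}\right)^{k}

% With m = 1000 and k = 5 (the arguments passed in the demo):
% hypothetical n = 100:
p \approx \left(1 - e^{-0.5}\right)^{5} \approx (0.3935)^{5} \approx 0.0094

% hypothetical n = 1000:
p \approx \left(1 - e^{-5}\right)^{5} \approx (0.9933)^{5} \approx 0.967
```

Under these assumptions, a filter of this size stays selective only while relatively few post IDs are added. Once the number of added IDs approaches the number of bits, almost every membership test passes.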
17:10
Yes, the command is okay. Let me see the outcome: yes, the command executed successfully, and everything is okay.
17:36
Now we run hdfs dfs -ls, only to show you the current content of MR_files.
17:47
This is the current content of the MR_files folder:
17:59
filter.obj and tag_list.txt. If you refresh, you can see them:
18:07
filter.obj and tag_list.txt. So there are now two files under the
18:13
MR_files folder. Now we run hdfs dfs -rm -r on the output folder, so that we can see
18:33
how things work. We delete the respective output folder
18:40
here because we are going to execute the next Java program. So, hdfs dfs -rm
18:47
-r output, and this folder is now deleted. Let me start with the next
18:54
Java program. We issue the command hadoop jar,
19:06
then the MapReduce design pattern JAR files path, then the filtering pattern JAR. Bloom filter is the package
19:26
name and BloomFilterMRTask is the class name. Then comes our input folder, and we also specify the output folder.
20:09
808. So this is the selected output we are getting here. Depending on the
20:14
condition, the filtered number of rows is 106808. So in this way we got
20:20
the output. I hope you have got the idea. It is a long process with many
20:25
steps, so you can rewatch the video. Thanks for watching.
#Computer Science
#Programming
#Software