MapReduce and Design Patterns - Bloom Filtering Pattern Overview
Oct 18, 2024
0:00
In this video we are giving an overview of the Bloom filtering pattern.
0:05
Now, what is the Bloom filtering pattern? In this case we have a predefined set of values, and these values are known
0:13
as the hot values. From the input records we extract some features, and those features
0:20
are looked up in these hot values. If the feature is found there, then the record
0:27
is kept; otherwise the record is discarded. So that is the main concept
0:32
of the Bloom filtering pattern. Now let us go into some more discussion on this
0:38
topic. What is the Bloom filtering pattern? This is also a basic filtering pattern, but it has
0:46
a specific evaluation function that is applied to each record. That evaluation function extracts the feature of each and every record
0:53
and checks it against the hot values. In this pattern we keep those records whose feature is present in a predefined set of
1:02
values, and this predefined set is known as the hot value set. So for each record, the pattern finds the feature of that record. If it is present in the hot value set, then the record is kept.
1:22
Otherwise, the record is discarded and ignored. So, in this way, the filtering takes place.
1:30
So, Bloom filtering criteria. Let us go for a summary. There are some criteria for applying Bloom filtering.
1:38
So, what are the criteria? Here we have mentioned four such criteria. Let me discuss them one
1:42
by one. The first one is that the data can be separated into different records. So the data
1:48
can be separated into multiple records according to our requirement. The second is that a feature can be extracted from each record, and that feature could be in a set of hot values. So some evaluation
1:59
function will be there, with the help of which this feature value is extracted from each
2:04
and every record, and it is then looked up in the set of hot values.
2:09
The third is that there is a set of predefined hot values. So we have some predefined hot values,
2:15
and the search is done against them. The fourth is that some unwanted data can remain after filtering, and that is acceptable. So if some records that do not match the hot values (false positives) get through the filter, the output is still accepted. So these are the Bloom filtering criteria.
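To make the fourth criterion concrete, here is a minimal sketch of a Bloom filter in Python. It is not taken from the video: the class name `BloomFilter`, the sizes, and the SHA-256 based hashing are all assumptions chosen for illustration. It shows why the answer is only ever "maybe" or "no": a value that was added always passes the test, while a value that was never added can occasionally pass too.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable (not memory efficient).
        self.bits = bytearray(num_bits)

    def _positions(self, value):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # True means "maybe in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical hot values, used only for this example.
hot_values = ["alice", "bob", "carol"]
bloom = BloomFilter()
for v in hot_values:
    bloom.add(v)

print([bloom.might_contain(v) for v in hot_values])
```

Every hot value reports "maybe" because all of its bit positions were set when it was added. A non-hot value reports "maybe" only if other values happened to set all of its positions, which is the unwanted data the criterion says we must be able to tolerate.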
2:36
Bloom filtering structure. To perform the Bloom filtering job, at first we need to train our
2:42
system from the data set. So at first the Bloom filter model will be trained with some data set.
2:48
For the training we require a number of data sets, so that the model can be trained in a proper way.
2:56
So, after completing the training, it generates some data, and that data is stored on HDFS.
3:03
So, after completion of the training, the model generates some data,
3:07
and that data is stored on our HDFS. So, this is our phase number one.
3:13
In the second phase, when the actual MapReduce task is performed, it uses the intermediate
3:18
data which was stored in HDFS in our phase number one. So that particular data set, the values which were saved
3:27
on HDFS, will be accessed. Now here we have one diagram for better
3:33
understanding. So we had two phases, phase one and phase two. Phase one is actually step one, and step one means the filter training. We have a huge data set, so the data set will be
3:45
split into multiple input splits. So each input split will be available to the Bloom filter training.
3:52
These input splits are used to train the Bloom filter model, and it will produce
3:58
some output, and the output will be saved in the output file on HDFS.
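Step one can be sketched roughly as follows. This is only an illustration under stated assumptions: a local temporary file stands in for the output file on HDFS, and the names `train_filter`, `save_filter`, `load_filter`, `NUM_BITS` and `NUM_HASHES` are hypothetical, not part of Hadoop.

```python
import hashlib
import os
import tempfile

NUM_BITS = 1024   # assumed filter size
NUM_HASHES = 3    # assumed number of hash functions


def positions(value):
    # Derive NUM_HASHES bit positions by salting SHA-256 with the index.
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def train_filter(hot_values):
    # "Training" here just means setting the bits for every hot value.
    bits = bytearray(NUM_BITS)
    for value in hot_values:
        for pos in positions(value):
            bits[pos] = 1
    return bits


def save_filter(bits, path):
    # Stand-in for writing the trained filter to an output file on HDFS.
    with open(path, "wb") as f:
        f.write(bits)


def load_filter(path):
    with open(path, "rb") as f:
        return bytearray(f.read())


# Hypothetical hot values and a local path used in place of HDFS.
filter_path = os.path.join(tempfile.mkdtemp(), "hot_values.bloom")
trained = train_filter(["alice", "bob", "carol"])
save_filter(trained, filter_path)
```

The saved file is the intermediate data that phase two reads back. In a real job the write would go to HDFS rather than to a local temporary directory.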
4:04
Now, we have phase number two, that is our step two. So step two is Bloom filtering via MapReduce.
4:10
Now the MapReduce tasks come onto the scene. So there is the input split and the Bloom filter mapper, and this Bloom filter mapper will load the filter
4:21
from the distributed cache, and then the Bloom filter test will take place.
4:26
Now the test has two outputs. One is maybe and the other is no.
4:30
So no means the respective record will be discarded, and maybe means it will be stored in the output files.
4:36
We have multiple mappers, that is, Bloom filter mappers, so they will be working in
4:41
parallel on multiple input splits. So those are the basic concepts of the Bloom filtering pattern in MapReduce.
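Step two can be sketched in the same spirit. Again this is an assumption-laden illustration, not Hadoop code: a thread pool stands in for the parallel mappers, an in-memory filter stands in for the one loaded from the distributed cache, and the record layout `user,comment` as well as the names `extract_feature` and `bloom_mapper` are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_BITS = 1024   # assumed filter size
NUM_HASHES = 3    # assumed number of hash functions


def positions(value):
    # Derive NUM_HASHES bit positions by salting SHA-256 with the index.
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def build_filter(hot_values):
    bits = bytearray(NUM_BITS)
    for value in hot_values:
        for pos in positions(value):
            bits[pos] = 1
    return bits


def extract_feature(record):
    # Hypothetical record layout "user,comment"; the feature is the user.
    return record.split(",", 1)[0]


def bloom_mapper(split, bits):
    kept = []
    for record in split:
        feature = extract_feature(record)
        if all(bits[p] for p in positions(feature)):
            kept.append(record)   # "maybe": write the record to the output
        # "no": the record is discarded, nothing is written
    return kept


# Stand-in for the filter each mapper loads from the distributed cache.
hot_filter = build_filter(["alice", "bob"])

# Two hypothetical input splits, each handled by its own mapper.
splits = [["alice,hello", "dave,hi"], ["bob,hey", "erin,yo"]]
with ThreadPoolExecutor() as pool:
    outputs = list(pool.map(lambda s: bloom_mapper(s, hot_filter), splits))

print(outputs)
```

No reducer appears in the sketch because each mapper can decide on its own whether to keep a record. Records for hot users always survive, and a record for a non-hot user survives only in the rare case of a false positive.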
4:51
So thanks for watching this video