MapReduce and Design Patterns - Generating Data Pattern Overview
Oct 18, 2024
0:00
In this video, we are discussing an overview of the generating data pattern
0:05
So, in this pattern, the data will not be read from HDFS
0:09
It will be generated by random functions, and as a result we will get
0:14
a virtual data set, virtual input splits, and so on. Sometimes this design pattern can also be used
0:22
when we have a small data set: we can add newly generated data to turn it into a huge
0:28
data set. So, including these dummy records within the small data set turns it into a huge
0:35
data set for the operation. So, let us discuss this design pattern in more detail. What is the generating
0:43
data pattern? This pattern does not take its input from HDFS. It creates data from
0:50
scratch. So it is not reading its data from HDFS; it is creating data from
0:56
scratch using random function generation. Generating random data is used in different
1:03
cases, and one of its uses is to build some sort of representative data set, converting
1:10
an actual small data set into a large one. So, where we have only a few records
1:16
in a small data set, we can generate new records from scratch using random functions, with the help of which we get a sizeable data set. This task is not so easy to implement in the Hadoop MapReduce framework,
1:34
and as there are no real input splits or records, we need to fool the framework into thinking
1:41
that splits and records exist. So we are actually creating a
1:47
virtual data set and virtual input splits on which the MapReduce task can easily work.
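The idea of "fooling the framework" can be sketched as follows. This is an illustrative Python sketch, not the actual Hadoop Java API (in real Hadoop you would subclass `InputFormat` and `RecordReader` in Java): each virtual split carries only the parameters needed to generate its records, such as a seed and a record count, instead of pointing at a file region in HDFS. The function names here are hypothetical.

```python
import random
import string

# Each "virtual input split" holds generation parameters (seed, record
# count) rather than an HDFS file offset -- no data exists on disk.
def make_virtual_splits(num_splits, records_per_split, base_seed=42):
    return [{"seed": base_seed + i, "count": records_per_split}
            for i in range(num_splits)]

# The "record reader" role: turn a virtual split into concrete records
# by running a random generator, so nothing is ever read from storage.
def read_records(split):
    rng = random.Random(split["seed"])
    for i in range(split["count"]):
        key = i
        value = "".join(rng.choices(string.ascii_lowercase, k=10))
        yield key, value

splits = make_virtual_splits(num_splits=3, records_per_split=4)
for split in splits:
    for key, value in read_records(split):
        pass  # each (key, value) pair would be fed to a mapper
```

Because each split is seeded, a failed map task can be re-run and regenerate exactly the same records, which mirrors how a real split can be re-read from HDFS.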
1:55
So, here is a diagram with the help of which we can discuss the generating data pattern.
2:02
Here we have the virtual input splits. That means the data has been formed from scratch; there was no physically
2:10
existing or pre-existing data set. Since that generated data set is a huge one, we have multiple input splits.
2:18
So we have multiple such virtual input splits. We have the record reader, which reads the records from the virtual input
2:26
splits. Then we have the identity mapper; in place of this identity mapper you can put
2:33
some business logic in the mapper, and the mapper will work accordingly and produce
2:37
the respective output parts. So you can see that this is the generic diagram with the help of which
2:43
we have discussed the concept of the generating data pattern. In the next videos we will be
2:48
discussing these concepts in more depth. Please watch all of them, and thanks for watching.
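The whole diagram (virtual input splits, record reader, identity mapper, output parts) can be simulated end to end. This is a hedged Python sketch of the data flow only; all names are hypothetical, and a real Hadoop job would express each role through the Java MapReduce API.

```python
import random

# Sketch of the pattern's flow: virtual splits -> record reader
# -> (identity) mapper -> one output part per split.

def virtual_splits(num_splits, records_per_split, base_seed=0):
    # A split is just (seed, count): parameters for generation.
    return [(base_seed + i, records_per_split) for i in range(num_splits)]

def record_reader(split):
    seed, count = split
    rng = random.Random(seed)
    for i in range(count):
        yield i, rng.randint(0, 999)  # generated, not read from HDFS

def identity_mapper(key, value):
    # Pass-through; business logic could replace this body.
    yield key, value

def run(num_splits, records_per_split):
    # One "output part" per split, since each split feeds one map task.
    output_parts = []
    for split in virtual_splits(num_splits, records_per_split):
        part = [kv for rec in record_reader(split)
                for kv in identity_mapper(*rec)]
        output_parts.append(part)
    return output_parts

parts = run(num_splits=2, records_per_split=3)
```

Note that the job is map-only: there is no reduce phase in the diagram, so each map task's output becomes an output part directly.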
#Data Management
#Programming