Hadoop Cluster Architecture
14K views
Nov 18, 2024
0:00
In this video we are discussing Hadoop cluster architecture
0:04
Hadoop cluster architecture consists of a name node, also known as the primary name node,
0:10
data nodes, and a secondary name node. The name node holds the metadata, that is, the data about the data stored on the data nodes.
0:19
That means it contains the file system information: where data will be stored and where it has already been stored.
0:26
So that information, including the replica information, is all kept in
0:29
the primary name node. The data nodes, in turn, are responsible for storing
0:35
the data. The secondary name node is not there to replace the primary name node. The secondary
0:41
name node is there to help and support the primary name node. So
0:48
let us discuss this topic in more detail. So what is Hadoop cluster architecture?
0:55
A Hadoop cluster is a set of commodity hardware machines. Commodity hardware means the hardware cost is very low, and as a result the cost of
1:04
storing data is also low. These machines are used to store and process
1:10
data. The storage layer is the file system of Hadoop, which is also known as
1:16
HDFS, that is, the Hadoop Distributed File System. Now let us consider this diagram.
1:23
This is the name node, where the metadata-related operations are done. It holds different kinds of metadata: names, replica information, file system information,
1:35
and so on. Here we have multiple data nodes divided into multiple racks. So
1:42
a data node is responsible for storing the data, and the replicas
1:48
are also kept here. That means data residing on one data node will
1:53
have a replica on another data node. The name node performs block-related
2:00
operations on these data nodes, and the data nodes are accessed by the client to perform
2:06
read and write operations. First, the client asks the name node for the required
2:11
metadata. After getting the metadata, the client process accesses the data nodes to
2:17
read and write the data. The data nodes are responsible for sending their
2:23
health reports to the name node at regular time intervals. And that is the basic architecture,
2:30
that is, the Hadoop cluster architecture. So let us discuss each type of node separately.
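To make the read path just described concrete, here is a minimal in-memory sketch in Python. It is not Hadoop code: the class names `NameNode` and `DataNode`, the helpers `write_file` and `read_file`, and the round-robin replica placement are all assumptions made for illustration (real HDFS uses a rack-aware placement policy). It only shows the two-step idea: the client first asks the name node for metadata, then reads the blocks from the data nodes.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Stores block contents; knows nothing about file names."""
    name: str
    rack: str
    blocks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Holds metadata only: which blocks form a file and where replicas live."""
    file_blocks: dict[str, list[str]] = field(default_factory=dict)
    block_locations: dict[str, list[str]] = field(default_factory=dict)

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def write_file(namenode, datanodes, path, chunks, replication=2):
    """Toy write: put each block on `replication` data nodes, round-robin.

    Assumes replication <= number of data nodes (hypothetical policy).
    """
    names = sorted(datanodes)
    block_ids = []
    for i, chunk in enumerate(chunks):
        block_id = f"{path}#blk{i}"
        targets = [names[(i + k) % len(names)] for k in range(replication)]
        for target in targets:
            datanodes[target].blocks[block_id] = chunk
        namenode.block_locations[block_id] = targets
        block_ids.append(block_id)
    namenode.file_blocks[path] = block_ids


def read_file(namenode, datanodes, path):
    """Step 1: ask the name node for metadata. Step 2: read from data nodes."""
    data = b""
    for block_id, locations in namenode.get_block_locations(path):
        # Read from the first listed replica; any replica holds the same bytes.
        data += datanodes[locations[0]].blocks[block_id]
    return data
```

Note that the name node in this sketch never touches the file contents. It only answers the question of where the blocks are, which is the point the diagram makes.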
2:37
So what is the task of the name node? We discuss the name node first.
2:43
The name node is used to store the metadata and other data related to the data nodes.
2:48
So it stores the metadata, that is, the data about the data held on the
2:53
data nodes. The name node is also responsible for managing the file system namespace. It controls the access of different clients to the data blocks, so whether a client
3:06
is allowed to access a certain data block on a data node or not is controlled
3:12
by the name node. It periodically checks the availability of the data nodes, and it also
3:18
takes care of the replication factor of the data blocks. So these are the different
3:24
responsibilities handled by the name node. Next, what is the
3:31
task of the data nodes? Data nodes are the main storage for data.
3:36
Hadoop uses low-cost commodity hardware to store data, and data
3:42
nodes are responsible for storing, replicating, creating, and deleting data blocks according to the instructions of the name node. So which data has to be
3:53
replicated and which has to be deleted, all of this information comes from the name
3:57
node to the data node, and the data node is responsible for obeying and executing those instructions. The data
4:04
nodes send a health report, known as a heartbeat, to the name node periodically, and the default interval is three seconds.
4:10
So every three seconds they send the report to the name node. That is, each data node
4:16
sends its own health report to the name node once every three seconds.
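The health report mechanism can be sketched in a few lines of Python. This is an illustration only, not Hadoop's implementation: the class `HeartbeatMonitor` and the parameter `missed_beats_allowed` are hypothetical names, and the rule for declaring a data node dead is an assumption of this sketch. Only the three-second interval comes from the discussion above.

```python
HEARTBEAT_INTERVAL_S = 3.0  # default interval mentioned in the transcript


class HeartbeatMonitor:
    """Name-node-side bookkeeping of when each data node last reported."""

    def __init__(self, missed_beats_allowed: int = 10):
        # Hypothetical rule: a node is considered dead once it has been
        # silent for longer than this many heartbeat intervals.
        self.timeout_s = missed_beats_allowed * HEARTBEAT_INTERVAL_S
        self.last_seen: dict[str, float] = {}

    def receive_heartbeat(self, datanode: str, now: float) -> None:
        """Record the time at which a data node's report arrived."""
        self.last_seen[datanode] = now

    def live_nodes(self, now: float) -> list[str]:
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_s)

    def dead_nodes(self, now: float) -> list[str]:
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

Passing `now` in explicitly, instead of reading the clock inside the class, keeps the sketch deterministic and easy to test.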
4:23
Next we discuss the secondary name node. The name node we discussed earlier can be called either the name node or the primary name node. The secondary name node is another specially dedicated node
4:38
which is used to take checkpoints of the file system. So the main responsibility of the secondary name node is to take care of the checkpoints
4:48
of the file system. The secondary name node is not a substitute for the primary
4:54
name node. It helps the name node but does not replace it, so it is actually giving
5:00
support to the primary name node. Let us come to this point in the diagram. Here we have the
5:04
secondary name node, and there we have the name node, or primary name node. The primary
5:09
name node uses the fsimage, that is, the file system image. The secondary name
5:15
node updates this fsimage with the edit logs, and the updated fsimage
5:21
is copied back to the name node, so that the name node has a current fsimage. This is how it takes care of the checkpoints of the file system. Before doing the update, it queries the primary name node for the edit logs at regular intervals. After getting that information, it applies the updates, and the updated copy is copied back here. As a result, the name node always gets the updated fsimage.
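The checkpoint step can be sketched as replaying the edit log on top of the last fsimage. The Python below is a simplified model, not the real on-disk formats: the functions `apply_edits` and `checkpoint`, the dictionary layout of `namenode_state`, and the operation names `create`, `add_block`, and `delete` are all hypothetical and chosen only for illustration.

```python
def apply_edits(fsimage: dict[str, list[str]],
                edit_log: list[tuple]) -> dict[str, list[str]]:
    """Replay edit-log entries on a copy of the fsimage; return the new image."""
    image = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, *args in edit_log:
        if op == "create":
            image[path] = []
        elif op == "add_block":
            image[path].append(args[0])
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return image


def checkpoint(namenode_state: dict) -> None:
    """The secondary name node's job in this sketch: fetch, merge, copy back."""
    fsimage = namenode_state["fsimage"]      # fetch the current image
    edits = namenode_state["edits"]          # fetch the pending edit log
    merged = apply_edits(fsimage, edits)     # merge on the secondary
    namenode_state["fsimage"] = merged       # copy the updated image back
    namenode_state["edits"] = []             # merged edits are no longer pending
```

For example, starting from an image that contains only `/a` and an edit log that creates `/b`, adds a block to it, and deletes `/a`, one call to `checkpoint` leaves an image that contains only `/b` and an empty edit log.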
5:51
So, in this discussion, we have covered what Hadoop cluster architecture is.
5:57
Thanks for watching this video