Load Data from File into Pig Tables
Oct 26, 2024
0:00
In this video, we are discussing how to load data from files into Pig tables
0:05
So what is the respective command? Load data from disk into Pig tables
0:12
So to load data from an HDFS file, we need to use the LOAD command
0:17
So the command will be LOAD. It will take the data, split it by a delimiter, and store it according to a given schema
0:25
So depending upon the given schema, the data will be stored, with the delimiter used for the splitting
0:32
The LOAD syntax will be like this: variable_name = LOAD 'input_path' USING function AS schema;
0:43
So, this is the syntax for the LOAD command. For the function, we can choose among PigStorage;
0:51
we can go for the BinStorage method, JsonLoader, TextLoader, and so on
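A minimal sketch of the syntax with the loaders just mentioned (the relation name, path, and field names are placeholders, not from the video):

  records = LOAD 'input_path' USING PigStorage(',') AS (f1:chararray, f2:chararray);
  records = LOAD 'input_path' USING BinStorage();
  records = LOAD 'input_path' USING TextLoader() AS (line:chararray);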
0:57
And to see the table data, we need to use the DUMP operator
1:01
So, the DUMP operator will display the current content of the table. One note: we should start Hadoop first, before issuing all these commands
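A one-line example (records is a placeholder relation name):

  DUMP records;

Since Pig Latin is evaluated lazily, DUMP is also what triggers execution of the statements that define the relation.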
1:11
So I think, for better understanding, let us go for one practical demonstration to make clear how these commands can be executed in the Pig environment
1:21
At first, we shall start Hadoop. Press Ctrl+Alt+T to initiate one new terminal. To start Hadoop, we should type
1:33
$HADOOP_INSTALL, all in capital letters, then /sbin/start-all.sh. This is a
1:47
shell program which is supposed to get executed to start Hadoop on our system
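The startup command as typed in the terminal (assuming the HADOOP_INSTALL environment variable points at the Hadoop installation directory):

  $ $HADOOP_INSTALL/sbin/start-all.sh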
1:54
Now, to check whether Hadoop is executing properly or not, we are executing the command
2:01
jps, and you can find that NodeManager, NameNode, ResourceManager, SecondaryNameNode,
2:08
and DataNode are all executing. Now we shall initiate the GUI interface of Hadoop. So we are opening our browser, and in the URL we are typing localhost:50070. You see, the NameNode-related information
2:36
is coming here. All the NameNode-related information is coming. So
2:43
Hadoop is operating fine. You can go for Utilities, then Browse the file system, and all the default folders and files are coming here, so
2:53
that indicates that my Hadoop is executing perfectly; it has got loaded and initiated into memory
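A sketch of the check (jps prints a process ID before each daemon name; the IDs are omitted here and vary per run):

  $ jps
  NameNode
  DataNode
  SecondaryNameNode
  ResourceManager
  NodeManager

Then open http://localhost:50070/ in a browser to see the NameNode overview page.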
3:01
Now we shall show you how to run Pig on our system. So we are going to open one terminal,
3:06
so Ctrl+Alt+T, that is the shortcut. To open Pig to use the local data, in that case
3:14
the command will be pig -x local; that means it can access the local data of the Ubuntu Linux. But if I want to
3:23
access the Hadoop data from HDFS, I shall issue only pig here
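The two launch modes as described (local mode reads the local filesystem; the default mode reads HDFS):

  $ pig -x local    # local mode: access files on the local Linux filesystem
  $ pig             # MapReduce mode: access files on HDFS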
3:29
You see, at the end we have got the grunt shell. So now we shall
3:35
go for... so this is the grunt shell. At first, we are having
3:42
one data file, and from that data file we shall load our data
3:46
into a Pig table. So let us open another terminal. Here we are having one
3:53
data file, which is there like this one. Under the GUI of HDFS, we
4:00
are going for this hadoopMyFiles folder. From the root of Hadoop, we are having
4:05
this hadoopMyFiles, and under it we're having student_info.csv. So let us
4:11
see what is the current content of it. We shall go for hdfs,
4:16
hdfs dfs -cat, then the Hadoop root, then hadoopMyFiles, and then
4:28
student_info.csv. You can find that the current content of the
4:36
file is this one. We are having five rows here: Amit, Electronics, Kolkata; Dinesh, Computer Science, Chennai; and so on. This is the current content, and we shall fetch this content into a Pig table
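The command as typed, assuming the file sits at /hadoopMyFiles/student_info.csv in HDFS (only the first two of the five rows are shown):

  $ hdfs dfs -cat /hadoopMyFiles/student_info.csv
  Amit,Electronics,Kolkata
  Dinesh,Computer Science,Chennai
  ...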
4:50
So how to do that one? I'm just closing this terminal. So this is about the grunt shell
4:59
So we shall create one Pig table, that is, the table, say, student
5:05
So student = LOAD... we can write this LOAD in capital letters also. We can go for this Hadoop path. So what is the path here?
5:15
We shall go for this hadoopMyFiles; M and F, these two letters, will be
5:21
in capitals. Then the file name is student_info
5:27
.csv. So the file name with the path has been completed, and I shall close
5:33
this single quote. Then we shall go for USING PigStorage, with P
5:39
capital and S capital, so PigStorage. I should have the
5:47
comma enclosed within single quotes. And then we shall go for AS: name,
5:55
the first field, of type chararray; then we shall go for
6:02
the second one, major, that will be of type
6:09
chararray; then we shall go for this city, that will be of type chararray. So I've
6:16
given three field names: name, major, and city. So I'm pressing Enter.
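Putting the pieces together, the full statement typed at the grunt prompt is:

  student = LOAD '/hadoopMyFiles/student_info.csv' USING PigStorage(',') AS (name:chararray, major:chararray, city:chararray);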
6:22
So, reading it back: student, the Pig table that is going to get created, equals
6:26
LOAD, and then we're having this hadoopMyFile... let me check once whether it is
6:31
file or files. Yes, it is hadoopMyFiles, so let us go for this hadoopMyFiles,
6:38
then we're having slash student_info.csv; that is the name of
6:46
the file, and it should be kept within the single quotes. USING PigStorage, P and S capital,
6:51
and then, within brackets, the comma has to be enclosed within single quotes. AS name,
6:56
chararray, comma, major, that is a chararray, and city, that is our chararray. So I'm pressing Enter here, and I think it has got created. To see the table description, we can go for DESCRIBE,
7:14
then the table name, that is the student Pig table name. So you can have this name
7:19
of type chararray, major of type chararray, city of type
7:23
chararray. We can also write this DESCRIBE in capital letters; that will also work
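The check at the grunt prompt, with the schema Pig prints for this relation:

  grunt> DESCRIBE student;
  student: {name: chararray, major: chararray, city: chararray}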
7:31
Okay, now I want to see what is the current content of the Pig table student, so I shall go for this DUMP student
7:40
DUMP student, and then I shall give the semicolon. In the grunt shell we're running all these queries.
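As typed at the grunt prompt (only the first two of the five result tuples are shown):

  grunt> DUMP student;
  (Amit,Electronics,Kolkata)
  (Dinesh,Computer Science,Chennai)
  ...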
7:50
Yes, we're having all this data, which has been fetched into the
7:56
Pig table, that is our student. So it has got fetched. Now we can limit the output. If I want to see only three tuples in the output, then we
8:07
can use it in this way. I want to see only three tuples in the output; in that case
8:11
I shall go for, say, LIMIT. So I'm writing this one as lmt =
8:16
LIMIT student 3. We can go for this LIMIT student; student is the
8:23
Pig table name, and then 3. Now if you go for this DUMP
8:30
lmt, you can see the output that we're having only the Amit, Dinesh, and
8:37
Kushal tuples here: Amit, Dinesh, and Kushal.
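The limiting step as typed (DUMP lmt then prints only the first three tuples, the Amit, Dinesh, and Kushal rows):

  grunt> lmt = LIMIT student 3;
  grunt> DUMP lmt;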
8:43
Here also, say, I'm changing this one: if you write this query in capital
8:49
letters, then also it will work. Now if I take this one
8:54
as 4, so LIMIT student 4, and if I go for this
9:00
DUMP lmt, that is the limit, you can find that only four
9:06
tuples are getting printed out of five. So in this way, we have shown you how to create
9:11
a Pig table, how to see the description of a table using the DESCRIBE command,
9:16
how to see the content of the table using the DUMP command, and how to see a partial
9:22
portion of the table using the LIMIT command. Thanks for watching this video
#Computer Education
#Computer Science
#Programming