Spark Wordcount Example
6K views
Nov 28, 2024
0:00
In this video we are discussing a Spark word count example. In the word count problem we have one text file, the text file contains many different lines of text, and we are supposed to count how many times each word occurs in the respective text file. So let us go for some further discussion on it, and I shall also give you a practical demonstration of how to implement the word count problem in Spark.
0:28
Using the Java MapReduce program, we have already seen how to count the frequency of words in one or more text files. For this example we are going to count words in the same file which we selected in our MapReduce program and which was used in the earlier MapReduce example. The file is stored on the HDFS, so at first we should start up Hadoop before accessing the HDFS files. So let us go for one practical demonstration to show you how this word count problem can be written and executed. My Hadoop system is on, so it is running.
1:08
Now here you can find the Hadoop root, that is, the HDFS root. Under this we are having one folder, HadoopMyFiles, and under this folder we are having one file, sample_file.txt. Let me show you the content of the file: we shall press Ctrl+Alt+T to open one terminal, and then we shall go for hdfs dfs -cat /HadoopMyFiles/sample_file.txt, with the folder HadoopMyFiles and the file name sample_file.txt. So I am going to see the content of the file, and this is the content.
1:58
Now we shall open our Spark shell, and in the Spark shell we shall execute, not a program exactly, but a set of statements: we will be writing some lines here one by one to perform the word count problem on this sample_file.txt. That is the purpose, and that is the demonstration we are going to give you right now. So let me go for the initiation of the Spark shell; to initialize it we shall go for spark-shell, and once it initializes, the Scala prompt will be coming. At first we shall create one variable which will read all of this sample_file.txt content, so let the Scala prompt come.
2:49
The Scala prompt has come, so I shall go for val sampleFile = sc.textFile(...), where sc stands for SparkContext. The path should be enclosed within double quotes: hdfs://localhost:9000/HadoopMyFiles/sample_file.txt, where 9000 is the port number, HadoopMyFiles is the folder, and sample_file.txt is the file name. So this is the total path with the filename, and it opens the txt file stored in the HDFS.
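For reference, the statement typed at the Scala prompt looks roughly like this (a sketch assuming HDFS on localhost at port 9000 and the folder and file names from this demo):

    // sc is the SparkContext that spark-shell creates automatically.
    // textFile returns an RDD[String] with one element per line of the file.
    val sampleFile = sc.textFile("hdfs://localhost:9000/HadoopMyFiles/sample_file.txt")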
3:38
Now, to see the content of the text file as an array, we shall go for sampleFile.collect. You can find that the content is getting shown; you see, the content is coming in the form of an array.
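As a note, collect brings the whole RDD back to the driver as a local Array[String], so it is only suitable for small demo files like this one:

    // Materialize the RDD on the driver; each array element is one line.
    sampleFile.collect()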
4:04
Now we shall split this particular content, so as to split out all the words, which are separated by blank spaces. We shall go for val wCount = sampleFile.flatMap(line => line.split(" ")), that is flatMap with a capital M, not a capital F, and the delimiter, a space, is enclosed within double quotes; I could give another name than wCount also, no issues. To see the contents inside wCount, with all the words separated in the array, we shall go for wCount.collect. You see, in the array all the words have got separated; you can find the output here, I have just marked it.
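A sketch of this step, using the same names as dictated above:

    // flatMap flattens the per-line arrays of words into one RDD[String] of words.
    val wCount = sampleFile.flatMap(line => line.split(" "))
    // Inspect the split words as a local array.
    wCount.collect()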
5:13
Now we shall put a 1 after each word in the wCount RDD. How to do this one? We shall go for val mapOutput = wCount.map(w => (w, 1)), I am just writing this one as mapOutput, so each and every word will have a 1 after it. I am pressing Enter. What is the value we will be getting? A key-value pair type of thing, so let me show you that one also: mapOutput.collect will show us the key-value pairs. You see, it is the key-value pair; here the key is the word and the value is 1. So for each and every word, we have treated that word as a key, and the value is 1.
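The mapping step, sketched with the demo's names:

    // Pair every word with the count 1, giving an RDD[(String, Int)].
    val mapOutput = wCount.map(w => (w, 1))
    // Show the (word, 1) key-value pairs.
    mapOutput.collect()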
6:12
Now we shall call the reduceByKey method, so let me go for that one: val reduceOutput = mapOutput.reduceByKey(_ + _). Now, what is the final output? We have called the reducer also: initially we had the map output, and now we are having this reduced output. To get this one, I shall go for reduceOutput.collect, and you can find that it is coming like this. Each and every key is there; when the key is unique, not having any further occurrences, it has frequency one, the count is one, but when a particular key has got repeated multiple times, the respective counts are coming. In this way we are getting this.
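The reduce step, sketched the same way:

    // Sum the 1s of all pairs sharing the same key (word), yielding
    // one (word, totalCount) pair per distinct word.
    val reduceOutput = mapOutput.reduceByKey(_ + _)
    // Show the final word counts.
    reduceOutput.collect()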
7:26
Now let me save this output onto some file in the HDFS. How to do that one? reduceOutput.saveAsTextFile, that is the method, and we pass one parameter, the path: hdfs://localhost:9000, which we wrote earlier also. Now let me decide some path, some directories, which will be created; let it be /sparkOutput/wcSpark. Before going for that, let me show you that there is no folder called sparkOutput; under that folder, obviously, the wcSpark folder will be created, but there is no folder called sparkOutput.
8:33
Now, if I execute this one, reduceOutput.saveAsTextFile, we are giving this total path; here the file will get created automatically, and the path should be enclosed within double quotes. So I am just pressing Enter. Now let me show you that the corresponding sparkOutput folder has got created; under this we are having wcSpark, and under this we are having _SUCCESS and part-00000. This is the file which actually contains the output.
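Sketched, assuming the same host, port, and output path as in the demo:

    // Write the result back to HDFS; Spark creates the directories, a
    // _SUCCESS marker, and one part-NNNNN file per partition.
    reduceOutput.saveAsTextFile("hdfs://localhost:9000/sparkOutput/wcSpark")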
9:08
So let me show you the output also. How to show it? I shall go to the terminal. One second... okay, this is the terminal I am having. Let me come out from this, so I shall go for exit; coming out, I have got the dollar prompt back again; clear. Now I shall go for hdfs dfs -cat to see the content here: -cat /sparkOutput, the first folder, under which we are having the next folder, wcSpark, and then we are having this part-00000, with one, two, three, four, five zeros. So this is the content we are going to get, and you can find that the content has been written onto this part-00000. Instead of writing this full file name, we can also put the respective wildcard characters; that will also work for us and will produce the same output.
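Put together, the whole spark-shell session from this demonstration condenses to a few lines (again assuming the localhost:9000 HDFS and the paths used above):

    val sampleFile   = sc.textFile("hdfs://localhost:9000/HadoopMyFiles/sample_file.txt")
    val wCount       = sampleFile.flatMap(line => line.split(" "))
    val mapOutput    = wCount.map(w => (w, 1))
    val reduceOutput = mapOutput.reduceByKey(_ + _)
    reduceOutput.saveAsTextFile("hdfs://localhost:9000/sparkOutput/wcSpark")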
10:30
In this demonstration we have given you the idea of the different steps that should be followed to execute the word count problem on a text file in our Spark shell. Thanks for watching this video.
#Programming