0:00
Hello guys, welcome to this video. In this video I will show you a Node.js package that lets you extract data from a PDF file and save it as a .txt file. The package is called pdf-parse.
0:23
Here is the PDF file. It has multiple pages, and each page contains some text.
0:28
When I run this Node.js script, it extracts all the data and stores it in a .txt file. In index.js we just need to specify the PDF file: we simply write the bulk file PDF here, that is, we provide the path of the file. Now, if I run the Node.js script, you will see the message "PDF to text conversion complete", and on the left-hand side it creates this output.txt file. It extracts all the content of the PDF and saves it as a .txt file. Now let me show you how it works.
1:10
First of all, you need to install this third-party module, pdf-parse. I've already installed it, so now I will walk you through the script step by step.
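Before running the script, it can help to confirm that pdf-parse can actually be found from your project (it is normally installed with `npm install pdf-parse`). This check is not part of the video: `isModuleInstalled` is a hypothetical helper built on Node's `require.resolve`, which throws an error with code `MODULE_NOT_FOUND` when a package cannot be located.

```javascript
// Hypothetical helper (not from the video): report whether Node can locate a package.
function isModuleInstalled(name) {
  try {
    // require.resolve only looks the package up; it does not load or run it
    require.resolve(name);
    return true;
  } catch (err) {
    if (err.code === "MODULE_NOT_FOUND") return false;
    throw err; // some other problem, e.g. a broken package.json
  }
}

if (!isModuleInstalled("pdf-parse")) {
  console.log('pdf-parse not found; install it with "npm install pdf-parse".');
}
```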
1:26
You need the built-in file system module (fs) as well, and then you require the third-party module pdf-parse. After requiring it, you need to read the input PDF file. Our input PDF file is in the same directory, so we use the readFileSync method to read it.
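The reading step can be sketched as below. This is a minimal sketch rather than the exact code from the video: `readPdfBuffer` is a hypothetical helper, and the header check is an added assumption. Well-formed PDF files normally begin with the signature `%PDF-`, so checking it catches a wrong path early; note that some PDF readers tolerate a few stray bytes before the header, so this check is stricter than they are.

```javascript
const fs = require("fs");

// Well-formed PDFs normally start with "%PDF-" followed by a version, e.g. "%PDF-1.7".
const PDF_SIGNATURE = Buffer.from("%PDF-", "ascii");

// Hypothetical helper: read the whole file synchronously (as in the video)
// and fail early with a clear message if it does not look like a PDF.
function readPdfBuffer(filePath) {
  const dataBuffer = fs.readFileSync(filePath); // no encoding given, so this is a Buffer
  const header = dataBuffer.subarray(0, PDF_SIGNATURE.length);
  if (!header.equals(PDF_SIGNATURE)) {
    throw new Error(`${filePath} does not start with %PDF-; is it really a PDF?`);
  }
  return dataBuffer;
}
```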
1:53
Our input file is named bulk file, so I just provide its path here. After providing the input PDF file, we call pdf, the function we got by requiring the pdf-parse package, and pass it the actual PDF content, which is stored in the variable dataBuffer. This call returns a promise.
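To illustrate the promise handling, here is a sketch that assumes the pdf-parse 1.x API, where calling the exported function with a buffer returns a promise resolving to an object with a `text` property. The parser is passed in as a hypothetical `parse` parameter so the flow can run without a real PDF; in the video it is the `pdf` function from `require('pdf-parse')`. Both functions do the same thing: the first uses `.then()` as in the video, the second uses `async`/`await`.

```javascript
// `parse` stands in for pdf-parse's exported function (assumed 1.x API:
// parse(buffer) -> Promise resolving to { text, ... }).
function extractText(dataBuffer, parse) {
  return parse(dataBuffer)
    .then((data) => data.text) // keep only the extracted text
    .catch((err) => {
      // Without a .catch, a parsing failure becomes an unhandled rejection.
      throw new Error(`Could not parse PDF: ${err.message}`);
    });
}

// The same flow written with async/await.
async function extractTextAwait(dataBuffer, parse) {
  try {
    const data = await parse(dataBuffer);
    return data.text;
  } catch (err) {
    throw new Error(`Could not parse PDF: ${err.message}`);
  }
}
```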
2:21
We handle this promise with .then(), which gives us the extracted text. After extracting the text, we write it to a file: we pass output.txt as the file name and data.text as the content, so all the extracted data is saved to output.txt. Make sure the variable name dataBuffer is spelled correctly. After that, we can show a simple notification message saying that the conversion completed successfully. So this is the overall script.
3:05
If I run it once again, you will see that it extracts the data. If I change the output file name here to output2, it creates the file output2.txt; once again it has extracted all the text from the PDF and stored it. That is how easily you can extract data from a PDF in Node.js.
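Rather than editing the output name by hand for each run (output.txt, then output2.txt), one option is to derive the .txt name from the PDF's own name and convert every PDF in a folder. This goes beyond the video: `txtPathFor`, `listPdfs`, and `convertAll` are hypothetical helpers, and the default parser again assumes the pdf-parse 1.x API.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: "docs/report.pdf" -> "docs/report.txt" (or into outputDir if given).
function txtPathFor(pdfPath, outputDir = path.dirname(pdfPath)) {
  const { name } = path.parse(pdfPath); // file name without its extension
  return path.join(outputDir, `${name}.txt`);
}

// Hypothetical helper: list the PDFs in a folder, matching the extension case-insensitively.
function listPdfs(dir) {
  return fs
    .readdirSync(dir)
    .filter((file) => path.extname(file).toLowerCase() === ".pdf")
    .sort()
    .map((file) => path.join(dir, file));
}

// Convert every PDF in `dir` to a .txt file next to it, one file at a time.
// Assumes parse(buffer) resolves to an object with a `text` property (pdf-parse 1.x).
async function convertAll(dir, parse = require("pdf-parse")) {
  const written = [];
  for (const pdfPath of listPdfs(dir)) {
    const data = await parse(fs.readFileSync(pdfPath));
    const outPath = txtPathFor(pdfPath);
    fs.writeFileSync(outPath, data.text);
    written.push(outPath);
  }
  return written;
}
```

Converting the files sequentially (`await` inside the loop) keeps only one document in memory at a time; `Promise.all` could run them in parallel at the cost of higher memory use.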
3:32
Also check out my website freemediatools.com, which contains