Python 3 pypdf Script to Extract Text From PDF Document and Save it as TXT File
Jun 3, 2025
0:00
hello guys welcome to this video
0:02
so in this video I will show you a
0:04
Python package which allows you to
0:06
extract all the text from a PDF
0:10
document this is actually the package
0:12
here which is pypdf this is actually
0:15
the package name let me show you the
0:18
package here if you go to the website of
0:21
python just search for this package
0:23
pypdf it's a very popular package for
0:28
extraction of text from PDF documents so
0:32
this is actually the package the command
0:34
is simple you simply install this by
0:36
executing pip install pypdf after installing
0:39
it I will now show you basically we have
0:43
taken a simple PDF document which
0:46
contains two pages the first page right
0:49
here you see some text is
0:52
there now this is actually the second
0:54
page here so two pages are there so once
0:57
I execute this Python script so what you
1:00
will see it will actually extract all
1:04
the text and save it inside the txt file
1:08
so let me execute this script here
1:12
app8.py so you will see it will extract all
1:15
the text and print it inside the
1:17
terminal and it will also save it inside
1:19
the txt file so as you can see all the
1:22
text has been successfully saved right
1:24
here this is page one this is page two
1:28
so now let me show you the Python script
1:31
here
1:34
so I will show you this step by step
1:52
so first of all you just need to
1:55
import this module so we simply import
1:59
this pypdf
2:00
package and from this we just need to import
2:04
this PdfReader
2:11
class after that we just need to
2:14
initialize it so this PdfReader and
2:17
here we just need to pass the actual PDF
2:20
file that you're working with right here so
2:22
the PDF file is present right in the same
2:25
directory which is sample.pdf after you
2:28
initialize this you can
2:31
actually calculate the total number of
2:33
pages by using the len
2:36
function on this reader.pages
2:40
attribute this is actually a list-like sequence here
2:43
so its length is the number of pages
2:45
inside this PDF so if I just execute the
2:49
script
2:51
here so it will print out the page count which is two
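As a rough sketch of the step just described, opening the reader and counting the pages could look like the code below. It assumes pypdf is installed (pip install pypdf) and uses the sample.pdf file name from the narration. The function name count_pages is made up for illustration and is not taken from the video's script.

```python
def count_pages(pdf_path):
    """Open a PDF with pypdf and return how many pages it has."""
    # Imported inside the function so this file still loads
    # before pypdf has been installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # reader.pages behaves like a list, so len() gives the page count.
    return len(reader.pages)
```

Calling print(count_pages("sample.pdf")) on the two-page document from the video should print 2.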
2:54
so it will calculate the total number of
2:57
pages and print it out after that we can
3:00
specifically extract the text from a
3:04
specific page here let's suppose you
3:06
only want to extract the text from the
3:09
very first page so what you
3:13
do you first of all get access to that
3:17
page and then to get the actual text
3:22
it contains this method extract_text
3:26
so we simply call this function and then
3:29
we print out the actual text in the
3:32
first page because here we are providing
3:34
the index of the first page here which is zero so if
3:38
I execute this now you will see it will
3:40
print out all the text which is present
3:42
in the first page and it is outputting
3:45
it inside the terminal and now we can
3:48
save all the text in the txt file by
3:51
using the open function so we can call
3:54
the open function and we can make a new
3:57
file result.txt in the write mode and
4:02
inside it we set the
4:05
encoding to
4:11
UTF-8 so here we can actually call the
4:14
function
4:16
we can loop
4:18
through all the
4:22
pages in reader.pages and loop through
4:26
all the pages and extract
4:28
the text by using this method extract_text
4:32
and then we use the write function
4:35
f.write to write the
4:40
text of each page to the file
4:46
so like
4:53
this so here we are looping through all
4:56
the pages and extracting the text and
4:59
writing it inside our result.txt so if I
5:05
just execute you will see it will create
5:07
this file which
5:11
is result.txt
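Putting the steps from the walkthrough together, a minimal sketch of the whole script might look like this. It assumes pypdf is installed and reuses the sample.pdf and result.txt names from the narration. The function names write_pages_text and pdf_to_txt, and the newline written after each page, are illustrative choices and not necessarily what the video's script does.

```python
def write_pages_text(pages, txt_path):
    """Write the text of each page to txt_path, one page after another."""
    with open(txt_path, "w", encoding="utf-8") as f:
        for page in pages:
            # extract_text() returns the page text as a string;
            # fall back to "" in case a page yields nothing.
            f.write(page.extract_text() or "")
            f.write("\n")  # assumption: end each page with a newline


def pdf_to_txt(pdf_path, txt_path):
    """Extract all text from pdf_path with pypdf and save it to txt_path."""
    # Imported here so the helper above can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    write_pages_text(reader.pages, txt_path)
```

Calling pdf_to_txt("sample.pdf", "result.txt") would then create result.txt next to the script, with the text of the first page followed by the text of the second page.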
5:34
So you'll see it will create this file
5:37
result.txt this is actually your first
5:39
page this is your second page it has
5:42
successfully extracted all the text from
5:44
the PDF document so we have taken this
5:46
example which contains two pages in the
5:49
PDF it extracted all the text from the
5:52
PDF document so you can see guys you can
5:55
use this package very easily simply
5:57
install this and I've shown you a
6:00
very simple example in this video
6:04
thank you very much for watching this
6:06
video and also visit my website
6:10
freemediattools.com
6:12
which contains thousands of tools