Build a PHP Web App to Extract Text From PDF Document in Browser Using HTML5 & Javascript
Jan 9, 2025
Official Website:
https://freemediatools.com
0:00
uh hello guys welcome to this tutorial
0:03
so in this tutorial we will look at an
0:05
application that I developed in PHP
0:08
which allows you to actually extract the
0:11
text from a PDF document and display it
0:15
inside the browser in a text area so you
0:18
can see right here on your screen this
0:20
is actually the application we will be
0:22
developing we will allow the user to
0:24
simply select any PDF file from their
0:27
local computer and then we will
0:30
simply extract the text which is written
0:32
and display it inside this text area
0:34
where they can simply copy to clipboard
0:36
the text so now let me just show
0:39
the example right here I take an
0:41
example uh look at this example I take here
0:45
so if I open this PDF document you will
0:48
actually see uh that actually some text
0:51
is there so there are two pages in this
0:54
PDF document some text is written this
0:56
is actually a heading this is simply a
0:58
paragraph on the first page and some
1:00
more text on the second page so now my
1:03
job is to actually extract all this text
1:06
and display it in the browser with the
1:08
help of the PHP application so I'm
1:10
running it on localhost extract text on
1:13
my XAMPP Apache server you can see it is
1:16
running on localhost at port number
1:19
so my job is to actually select a PDF
1:22
document and if I select that sample.pdf
1:25
and then as I click the button extract
1:27
text so what should happen guys you will
1:29
basically see all that text is
1:32
extracted and you will see that all this
1:35
text is displayed in the text area now
1:38
you can simply copy this text copy to
1:40
clipboard like this all the text is copied
1:43
to clipboard so you can see that guys
1:46
all the text which was present inside
1:48
that uh PDF document has been extracted
1:52
and it has actually been copied to
1:54
clipboard and let me take another
1:56
example you can take uh any PDF document
2:00
let me take this one bulk file.pdf click
2:04
on extract text you will see all that
2:07
text which is there in the PDF document
2:09
has been successfully extracted and this
2:11
is very much useful guys very useful
2:14
application whenever you want to
2:15
actually extract all the data from the
2:18
PDF document uh you can use this
2:20
application there are a lot of tools
2:22
available online which do the same
2:25
process so I will show you in this video
2:27
how I did that and uh let me
2:31
start building this application so the
2:32
very first thing you need is the XAMPP
2:34
control panel in order to run PHP
2:36
applications in the browser so the XAMPP
2:39
control panel is a cross-platform
2:41
software and it is available for Windows
2:44
Linux and Mac operating systems so
2:46
depending upon your operating system
2:47
simply download it I have already
2:49
downloaded and you can see my Apache
2:52
server is running on localhost at
2:54
Port number
2:57
so we will simply first of all write our
3:01
HTML interface so I will simply delete
3:04
everything from here and start from
3:06
scratch we are also using Ajax to
3:09
actually send our request uh and we are
3:13
actually using bootstrap you can see we
3:15
have included the CDN for bootstrap and
3:17
jQuery so we are using jQuery to actually
3:20
send our POST request to our backend
3:23
server which is PHP code right here so
3:25
this is actually the PHP script which
3:27
extracts the text from a PDF document
3:29
let me delete this also so we have a
3:32
simple index.html file guys so I have
3:34
simply included the CDN of bootstrap and
3:37
jQuery
3:39
so I've included the CDN of
3:43
Bootstrap CSS and the jQuery CDN and simply
3:46
include that and now I will start
3:49
writing the code step by step so the
3:51
very first thing guys we need to do
3:53
right
3:54
here we now need to actually write the
3:58
actual interface so a div tag will be
4:00
there with the class container
4:01
mt-5 so inside this we will have a simple
4:05
heading and this heading will be PDF
4:08
text
4:12
extractor there is a heading right here
4:14
and inside the form here we will
4:16
basically give it an ID
4:18
here which is PDF form so inside that
4:22
form guys we will be having uh a label
4:26
where we will actually allow the user to
4:28
actually upload or select a PDF file so
4:31
we will simply say to the user choose
4:33
PDF file like this and then we will
4:36
simply say input and the type will be
4:40
equal to file and then we will give it a
4:42
bootstrap class of form-control so if
4:45
you just refresh your application guys
4:47
if I go to that you will see this PDF
4:49
text extractor and we can make it in
4:52
the center screen by adding a bootstrap
4:54
class of text-center so it will appear
4:57
right in the center and we have a
4:59
choose file button where we will allow
5:00
the user to only select PDF files we
5:03
will say accept attribute .pdf and it
5:06
should be
5:07
required and we will also be giving a
5:10
name parameter to it so that we can
5:12
target it in PHP name PDF file that's all
5:16
so you will now see it will only accept
5:18
PDF
5:19
files only accept PDF files so with the
5:22
help of this accept attribute .pdf so
5:26
after this guys we will simply be having
5:28
a simple button and this button will
5:30
simply say extract text and we will be
5:33
giving uh some bootstrap class to it
5:36
which is btn
5:39
btn-primary so this will be a blue color and
5:42
it will be button type submit so if you
5:45
just see there will be this button added
5:47
extract
5:48
text and after this we will simply
5:53
be having a div
5:57
tag so here inside this we will be
6:00
displaying uh the extracted text from
6:04
the PDF in a text area so this will be
6:07
only read only you can't modify it so I
6:09
will simply type the readonly attribute right
6:12
here and we will be giving an ID to it
6:15
so that we can target it in JavaScript
6:18
simply give it an ID here which is
6:20
extracted
6:22
text that's all so rows will be 10
6:26
columns let me remove that columns like
6:29
this
6:30
so it will be right here a text area
6:33
will be added right here we do need to
6:35
add a class here of form-control
6:38
bootstrap
6:42
class so now you can see text area is
6:45
added as read only and just below
6:48
that we will have a simple button to
6:49
actually copy to clipboard so whenever
6:52
you click this button all the text will
6:54
be copied to clipboard so we'll give it
6:58
an ID here copy to clipboard and we'll
7:01
give it a simple class of bootstrap btn
7:03
btn-secondary
7:08
mt-2 so there will be this button
7:12
added gray button copy to clipboard so
7:15
the interface is actually complete
7:17
guys now we will simply write some
7:18
JavaScript code to actually make our uh
7:21
I will simply make a script.js file
7:23
right in the same directory and uh
7:26
inside this JavaScript file guys we
7:29
will actually make uh a POST request to
7:31
the PHP script so document.ready
7:34
function of jQuery we will actually use
7:37
when all the elements are loaded in the
7:39
DOM we need to simply handle the form
7:41
submit event so we have given this form
7:46
an ID so if you see in the index.html
7:49
we have given this form an ID here PDF
7:51
form so we are simply targeting the form
7:53
right here and we are binding an onsubmit
7:55
event handler so this is available in
7:58
jQuery so this
8:00
will actually trigger a callback
8:02
function so inside this callback
8:04
function guys we will simply first of
8:05
all say event.preventDefault to
8:08
actually prevent the auto submission of
8:10
the form prevent default and then we
8:12
will simply get the form data by
8:15
instantiating a new FormData right here
8:17
and passing this as an argument right
8:19
here so now we need to send the Ajax
8:23
request to the PHP
8:26
script so now to do this we need to now
8:29
make a simple Ajax request and the
8:32
URL will be the script which is present
8:36
in the same directory which is
8:38
extract.php so whatever is the location
8:40
you need to give the location so it is
8:42
present in the same directory so I will
8:43
simply say extract.php the second
8:46
parameter is the actual method type so
8:49
type will be POST because we are posting
8:51
some data we are sending data as well in
8:54
this request so it will be a post
8:56
request contentType will be uh false
9:01
and processData will also be
9:04
false these are two Boolean parameters
9:06
you need to give and then the success callback
9:08
so whenever the uh request is
9:10
successful this success callback will
9:12
actually have the response coming back
9:14
from the PHP script so we need to
9:16
console log this response in the browser
9:18
console just to check if the script is
9:21
working so we need to now actually make
9:24
the script right here in the root
9:26
directory extract.php so currently it
9:28
is empty now we do need to write some
9:31
code right here so we'll write some PHP
9:33
code right here so inside this uh this
9:37
is the actual syntax of PHP and here we will
9:39
have this if condition that if isset
9:42
$_FILES and here we need to give
9:45
the name parameter guys if you see in
9:47
the index.html we have given this name
9:49
parameter PDF file to this input field
9:51
so we are targeting this input field
9:53
right here using this isset function so
9:56
we are simply telling if this exists and
9:59
$_FILES
10:01
PDF file
10:08
error and then
10:10
upload error
10:20
okay uh let me see I think I made some
10:24
type of typo so let me
10:28
just yeah so this is like this isset
10:31
$_FILES PDF file and $_FILES PDF file error
10:34
is equal to UPLOAD_ERR_OK so in this
10:36
case if there are no
10:39
errors then we simply need to process
10:40
this file which will be present
10:44
inside the temporary path because we are
10:46
not uploading the file we are just
10:48
storing it in a temporary path so by
10:50
default it will be stored inside this
10:53
temporary path PDF file and there is a
10:56
property in PHP which is tmp_name
10:59
so we are not uploading this file in the
11:01
backend server of PHP so we are simply
11:03
getting this file from the temporary
11:05
path which is available by default in
11:07
PHP so we are getting it this path here
11:09
storing it inside this variable and now
11:11
we will Define a simple function to
11:13
actually
11:15
extract text from PDF file so here we
11:19
will Define this
11:20
function so which will actually extract
11:23
so we will Define the function
11:26
extract text from
11:29
PDF and we will basically get an
11:31
argument right
11:33
here PDF file path so this will be a
11:37
parenthesis set of curly brackets in the
11:40
function definition so you'll give it an
11:43
output and this will be a simple array
11:46
and just make sure that you put
11:47
semicolon at the end of each statement
11:49
to actually prevent any errors and now
11:51
we need to simply write the command to extract
11:54
text from PDF and we are actually using
11:58
this utility guys the pdftotext utility so I
12:01
have already installed this utility on
12:03
my command line so pdftotext this is
12:06
present inside my command line you can
12:08
simply install this uh if you see just
12:10
type here pdftotext it is available uh
12:14
you just need to go to their GitHub repo
12:17
PDF to text so simply go to their GitHub
12:21
repo simply install the exe file which
12:23
is present it also is a uh python module
12:26
as well PDF to txt so if you just write
12:30
here pdftotext PHP GitHub so you
12:34
can see that so this is actually
12:38
a you can like this also so it is
12:42
present inside your command line you can
12:45
simply install it PDF to text utility so
12:48
using this utility guys we can simply uh
12:51
actually make a
12:54
execute and we can execute a simple
12:56
Command right here
12:59
PDF to
13:06
text so this is actually the command
13:09
guys let me just show
13:14
you so we are actually using the execute
13:17
function guys PDF to text followed by
13:19
the PDF file path that we get in the
13:22
argument and then the output output is
13:25
basically an array right here so
13:26
whatever text will be extracted now we
13:28
need to simply store this text
13:31
implode we will use the implode function
13:33
in PHP and we can implode it with a new
13:35
line character and then the second
13:37
argument will be the actual output now
13:40
we need to show this in the browser so
13:42
we will simply return this text from
13:43
this function that's all this function
13:46
basically extracts the text from the PDF
13:48
document and returns the text right here
13:50
now we need to Simply call this function
13:53
so we can simply say extracted text is
13:56
equal to extract text from PDF and here
14:00
we will simply pass the path here which
14:02
is PDF file path PDF
14:05
file temp
14:08
path and simply we will Echo out Echo
14:12
out we will simply return back to the
14:14
client so to the front end extracted
14:17
text that's all and in the else block if
14:19
no file is provided then we will simply
14:22
say to the user that
14:24
invalid error happened because there is
14:28
invalid PDF file that's all now this is
14:32
actually the PHP script guys it is
14:34
actually taking the input PDF file and
14:36
it is actually extracting the text and
14:38
returning back to the client and now we
14:41
need to Simply display it uh if I just
14:43
show you in the
14:46
uh if I basically uh select my PDF
14:50
document click on this button and check
14:52
my console you will basically see error
14:55
invalid PDF file let me check
14:59
just refresh
15:06
here so you will basically see it is
15:09
coming this error error invalid PDF file
15:12
so
15:15
script uh I think there is some
15:17
kind of error there in this PHP
15:20
script that I have written so what I will do
15:22
I will simply copy this
15:36
and paste
15:38
it I have given the name parameter as
15:41
PDF file if you
15:48
see let me add a hi world alert
15:52
statement so that uh let me cross check
15:54
if it is working you will see hi world
15:56
is printing out so our application is
15:58
correct let me select it click on
16:00
extract
16:02
text so again it is returning invalid
16:05
PDF file
16:07
uh let me uh what I can do is that I
16:14
can
16:15
paste this script tag in the same
16:19
directory I
16:21
think right here add
16:26
this so you're making a post request you
16:28
will see that extract.php form data
16:31
false false okay I think we eliminated
16:34
the data part right here you will see
16:36
that guys this is actually the error I
16:39
forgot to add the data property so we
16:42
are not passing the data right here so
16:44
we need to pass the data as well we just
16:47
forgot to do that that's why it is
16:49
creating that problem so in the post
16:50
request whenever we are sending the post
16:52
request we do need to add this data
16:55
property as well so whatever data that
16:56
we are sending so in this case we are
16:58
sending the whole form data object put a
17:01
comma so if you just fix this uh
17:03
mistake you will see so if I now select
17:07
click on extract text and go to console
17:10
you will basically see all this data
17:12
will be returned to me in the console
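The corrected request described above could be sketched roughly as follows. This is only an illustration under assumptions: the form ID `pdfForm` is a guess at how "PDF form" was actually spelled in index.html, and `buildExtractRequest` is a hypothetical helper name that does not appear in the video. The `extract.php` URL, the POST type, the two `false` flags and the `data` property are the ones mentioned in the walkthrough.

```javascript
// Build the options object passed to jQuery's $.ajax.
// buildExtractRequest is a hypothetical helper, not from the video.
function buildExtractRequest(formData, onSuccess) {
  return {
    url: "extract.php",  // PHP script in the same directory
    type: "POST",
    data: formData,      // the property that was missing in the first attempt
    contentType: false,  // let the browser set the multipart boundary itself
    processData: false,  // do not turn the FormData into a query string
    success: onSuccess,
  };
}

// Wire up the form only when jQuery and a DOM exist (that is, in the browser).
if (typeof window !== "undefined" && typeof window.jQuery !== "undefined") {
  const $ = window.jQuery;
  $(document).ready(function () {
    // "#pdfForm" is an assumed ID; use whatever ID the form really has.
    $("#pdfForm").on("submit", function (event) {
      event.preventDefault(); // stop the normal submit that reloads the page
      const formData = new FormData(this);
      $.ajax(buildExtractRequest(formData, function (response) {
        console.log(response); // check the extracted text in the console
      }));
    });
  });
}
```

Without the `data` entry the request reaches extract.php with no file attached, which is why the script kept answering with the invalid PDF file error.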
17:13
now we just need to display this data
17:16
right inside our uh text area so now
17:21
inside our JavaScript it is very easy
17:23
for me simply after the statement in the
17:26
uh success callback we can simply
17:29
say targeting the ID extracted text we
17:33
have given to the text area and the
17:35
value will be equal to response that's
17:37
all so all the data will now be
17:40
displayed so if I select the PDF
17:44
file click on extract text you will
17:46
basically see all the data is displayed
17:48
in the text area and now we also need to
17:50
bind an onclick listener to this copy to
17:52
clipboard Button as well so whenever
17:54
someone clicks this copy to clipboard
17:56
button all the data should be copied to
17:58
clipboard we'll bind a basically
18:04
a on click
18:16
list so we are basically ex uh targeting
18:19
it by ID that we have given extracted
18:23
text then we will select all the text
18:26
which is there in the text
18:29
area by using this function
18:31
select and then we can simply execute
18:34
this command execCommand which is
18:37
copy the copy command and simply show an alert
18:40
statement text copied to clipboard
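The last two front-end steps above, writing the response into the text area and copying it on click, might look something like this. It is a sketch using plain DOM calls rather than the jQuery selectors used in the video, and the element IDs `extractedText` and `copyToClipboard` are assumptions about how the spoken IDs were spelled. The function names are hypothetical.

```javascript
// Put the text returned by the server into the read-only textarea.
function showExtractedText(textarea, response) {
  textarea.value = response;
}

// Select everything in the textarea and run the "copy" command.
// The document is passed in so the function has no hidden global dependency.
function copyTextareaToClipboard(textarea, doc) {
  textarea.select();
  // execCommand("copy") is the call described in the video; browsers now
  // mark it as deprecated, though it is still widely supported.
  return doc.execCommand("copy");
}

// Attach the click listener only when a DOM is available.
if (typeof document !== "undefined") {
  // Both IDs below are assumed spellings.
  const textarea = document.getElementById("extractedText");
  const button = document.getElementById("copyToClipboard");
  if (textarea && button) {
    button.addEventListener("click", function () {
      copyTextareaToClipboard(textarea, document);
      alert("Text copied to clipboard");
    });
  }
}
```

Inside the Ajax success callback, a call such as `showExtractedText(document.getElementById("extractedText"), response)` would take the place of the console log. On newer browsers, `navigator.clipboard.writeText(textarea.value)` is the usual replacement for `execCommand("copy")`, though it requires a secure context.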
18:44
that's all this completes the
18:46
application guys so I can simply
18:47
eliminate this alert statement right
18:49
here and refresh the application choose
18:51
your favorite PDF file click on extract
18:54
text you will see all the text has
18:57
been extracted from the PDF document and
18:59
displayed in the text area uh now we
19:01
can click this button all the text is
19:03
copied to clipboard so we can simply
19:05
copy paste uh in any location that
19:08
you want to Let's suppose I create this
19:10
file paste it you will see all
19:13
this text has been copy pasted so in
19:16
this way guys you can basically build
19:19
out this awesome little application in
19:21
the browser using PHP to actually
19:22
extract text from your PDF
19:25
document thank you very much for
19:26
watching this video please hit that like
19:28
button subscribe to the channel and I will
19:30
be seeing you in the next video
