Python 3 Streamlit Project to Merge Multiple PDF Documents Using the PyPDF2 Library in the Browser
Jan 9, 2025
Get the full source code of application here:
https://gist.github.com/gauti123456/b142cf4e030d6867339ab51681676db8
0:00
Hello friends, welcome to this video.
0:02
so in this video we will look at how to
0:04
merge multiple PDF documents inside
0:07
Python. We'll be using Streamlit, a solution
0:10
to build out web applications inside
0:12
python so this is actually this
0:15
application so if I want to start this
0:18
application, I run streamlit run followed by
0:20
the name of the file, merge pdf.py. This
0:24
is the command: streamlit run followed by
0:26
the name of the file so if I just enter
0:30
it this will start my application on the
0:32
port number, which is 8501. So
0:36
streamlit if you don't know it actually
0:38
provides a fast way of building web
0:40
applications inside Python. It has
0:43
pre-built components such as this you
0:46
can see on your screen drag and drop
0:47
component so here you can drag and drop
0:49
multiple PDF documents that you want to
0:52
merge into a single one so let me take
0:55
this PDF
0:58
document which actually
1:01
contains two pages and this one let me
1:05
take the second one which contains 23
1:08
pages so if I want to merge these two
1:11
documents into a single one so you can
1:13
drag and drop multiple PDF documents
1:21
So if I drag and drop these four PDF
1:24
documents and click on this Merge PDF button,
1:27
you will see that afterwards we get this
1:30
Download Merged PDF button. If you
1:34
open this all your PDF files have been
1:36
merged into a single one and it has been
1:39
downloaded as an attachment so if you
1:40
open this now you will see it contains
1:44
51 pages so all the PDF documents have
1:47
been merged into a single
1:49
one so we will build this web
1:53
application in this video so for doing
1:56
this we are using an open source python
1:59
library, which is PyPDF2. This is
2:02
the package, which is a pure
2:05
Python library for working with PDFs. The
2:09
command is really simple: pip install
2:11
PyPDF2. You need to install this library.
2:15
First of all, we need to install
2:18
Streamlit as
2:20
well, and then PyPDF2. These two
2:23
packages are required for this
2:24
application: Streamlit and PyPDF2.
2:30
so
2:31
streamlit this is actually a faster way
2:34
to build web applications. It's a very
2:36
popular solution. It's open source, so it's
2:39
completely free, and it contains
2:41
pre-built components that make web
2:44
applications easier to
2:46
build. Just create this simple Python
2:50
file, and we will import the streamlit
2:54
package as
2:56
st. Then we need to import the PyPDF2
2:59
package, and from that we need to
3:02
import the PdfMerger class,
3:05
which is the class that will be
3:08
responsible for merging multiple PDF
3:11
documents into a single one. Then
3:14
from io we need to import
3:17
BytesIO. io is a built-in
3:20
Python
3:22
module. Now we need to give the
3:24
title of the app. In this easy way you
3:27
can give the title, Merge Multiple PDF
3:30
into
3:32
One. If you now want to start this
3:35
application, Streamlit has a command
3:38
that you can simply write: streamlit
3:40
run followed by the name of the file,
3:42
which is merge pdf.py. This will
3:46
start your application on port number
3:49
8501 so you will get this title
3:52
appearing right here you will see on the
3:53
center of the screen so after this we
3:58
can just write a short description
4:00
about this app, which
4:02
is: upload
4:05
multiple PDF
4:12
files and this app will merge them into
4:16
a single
4:19
PDF. Streamlit automatically supports auto
4:22
refresh, so you don't need to restart the
4:24
application every time you make changes.
4:26
It will automatically reload and you
4:28
will see this live preview.
4:30
so after
4:32
this we will allow the user to have this
4:35
file uploader component drag and
4:40
drop. Streamlit has this
4:44
pre-built, so you can use this function, which
4:47
is file_uploader.
4:49
Here you can just provide a
4:53
label, Choose PDF files. The second one
4:56
will be the type of files that you want
4:58
to accept. We need to only accept PDF
5:00
files, so this will be the second
5:01
argument, type set to pdf. Then
5:05
the third one is accept_multiple_files.
5:07
It's a Boolean parameter, so if you
5:09
want to accept multiple files you will
5:11
set it to True. That's all. If you
5:14
just refresh now you will see this drag
5:16
and
5:17
drop files component so here the user
5:21
can select multiple files so this is
5:23
actually the advantage of using
5:25
streamlit because if you are developing
5:27
it from scratch it will take a long long
5:30
time to develop this component so
5:32
Streamlit has a built-in component, so
5:34
with a single line of code you can
5:37
see you have generated this
5:41
functionality so after we do this we can
5:44
have an if condition that
5:47
checks a button, which is Merge
5:51
PDF. We will have this dynamic button,
5:54
which will be Merge PDF, so we are
5:55
simply saying that if this button exists
5:58
and if this button has been clicked by
6:00
the user, then in that case we need to
6:02
simply check that files have been
6:06
uploaded, and then we will be
6:08
initializing the PDF merger. For doing
6:12
this we will be using the PdfMerger
6:14
class that we imported earlier from
6:18
PyPDF2.
6:20
Then we will simply need to
6:24
iterate
6:26
through the uploaded files and add each one
6:31
to the merger. For doing this we'll be
6:35
simply using a for loop over each PDF file in
6:38
the uploaded files list, and for each
6:42
PDF we will use the append method on
6:45
this object, merger.append,
6:48
and we'll append this PDF file.
6:52
Essentially, what we are doing right
6:54
here in this for loop is
6:55
looping through each PDF that the user
6:57
selected and merging them
7:00
into a single output file. After this
7:04
for loop we will create
7:08
an output file variable and you'll simply
7:11
say
7:12
BytesIO. We are simply saving this merged PDF
7:16
into an in-memory BytesIO buffer, and
7:19
then
7:21
the merger has a write method, and
7:24
we'll be writing this merged PDF like
7:27
this.
7:34
Now we can close the merger, so we
7:37
can simply call the close
7:41
method and then move back to the start
7:43
of the output buffer
7:46
with the seek
7:49
method. We seek to position zero, which is
7:52
a byte offset into the buffer rather than a
7:55
page index, so it will move to the
7:56
first byte. Now we need to allow
7:59
the user to download, so you need to have
8:02
a download button. Streamlit has
8:05
prebuilt functionality: you can use the
8:07
function download_button. Here this
8:10
function takes four arguments. First is
8:11
the label of the button, so you'll simply
8:13
say Download
8:16
Merged PDF. The second argument is the
8:20
actual data that you want to download, so
8:22
in this case we need to download the
8:23
merged
8:25
PDF. The third argument takes
8:28
the file name.
8:30
The file name can be any name that you
8:34
choose, so you'll simply say merged
8:37
pdf.pdf. The fourth one is the MIME
8:39
type. For a PDF document it's
8:44
application/pdf. That's all. This is
8:48
all that you need to
8:50
do. Just move it onto a single
8:53
line. This is the function
8:56
that takes four arguments, the download
8:58
button. If you refresh
9:00
now and select files here, you will see
9:05
this button appear automatically,
9:06
Merge
9:10
PDF. We select these four PDF
9:13
documents, and if I click this button,
9:16
Merge
9:17
PDF, it says that merged PDF is
9:21
not defined.
9:30
Sorry, this needs to be merged PDF,
9:36
because merged PDF is the variable that
9:39
we have declared, so just make this slight
9:41
modification.
9:42
Again, if you select your
9:46
four PDFs and click on this
9:50
button, it again says merged PDF
9:53
is not
9:56
defined. I think I...
10:01
Sorry, here again you can see this needs
10:03
to be merged
10:21
PDF. Sorry, this argument needs to be
10:24
file_name. I make
10:27
this mistake every time. This
10:29
needs to be file_name; this is
10:37
the argument. Now if you click the Merge
10:40
PDF button you will see this button
10:42
appearing, Download Merged PDF. If you
10:44
click this button, all the PDF files
10:48
selected have been merged into a single
10:51
output PDF, so all the content of all the
10:54
files have been merged into a single
10:56
file so in this easy way you can develop
10:59
this python web application using
11:01
Streamlit, which merges multiple PDF files
11:05
into a single PDF file using the PyPDF2
11:09
library. Thank you very much for
11:11
watching this video. Please hit that like
11:13
button, subscribe to the channel, and I will
11:16
be seeing you in the next video. Do
11:17
check out my website as well,
11:19
freemediatools.com, which contains thousands of
11:23
free audio, video, and image
11:27
tools. I will be seeing you in the next
11:29
video.
#Programming
#Open Source
