Python 3 Tabula Script to Extract Tables From PDF as Dataframes & Export as CSV File
Jun 3, 2025
Get the full source code of the application here: https://codingshiksha.com/python/python-3-tabula-script-to-extract-tables-from-pdf-as-dataframes-export-as-csv-file/
0:00
Hello guys, welcome to this video. In this video I will show you a Python script which will extract all the tables from a PDF document and return them as a list of DataFrames.
0:14
For this we are using an open-source library called tabula. We are taking a very simple example: we have this PDF file right in the same directory, and it contains this table, as you can see. So we have tabular data inside this PDF document. If you want to extract this table from the PDF, I've written a script for that here. If I execute the script, it will extract the data and save it as a CSV file.
0:49
So let me just execute this script here: python app.py. But first of all, you need to install the tabula-py module. This is the command: pip install tabula-py. Once this module is installed, you can use it inside your Python script.
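Before running the script, it can help to confirm the environment. The helper below is a hypothetical addition, not part of the video's script. It checks two things. First, that a module importable as `tabula` exists. Second, that a `java` executable is on PATH, on the assumption that tabula-py needs a Java runtime because it drives the Java-based tabula-java tool. Note that the PyPI package is named `tabula-py` but is imported as `tabula`. An unrelated package called `tabula` also exists, so `pip install tabula` is not the right command.

```python
import importlib.util
import shutil


def check_tabula_environment():
    """Report whether tabula-py's prerequisites appear to be available.

    Assumption: tabula-py needs the Python package (imported as `tabula`)
    plus a Java runtime on PATH, because it runs tabula-java under the hood.
    """
    return {
        # find_spec returns None if nothing importable as `tabula` is installed.
        # It cannot tell tabula-py apart from the unrelated `tabula` package.
        "tabula_importable": importlib.util.find_spec("tabula") is not None,
        # shutil.which returns None if no `java` executable is found on PATH.
        "java_on_path": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_tabula_environment().items():
        print(f"{check}: {'OK' if ok else 'missing'}")
```

If either entry reports "missing", install tabula-py with pip or install a Java runtime before running the extraction script.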
1:17
So now we can actually execute the script. You will see that the total number of tables extracted is one, and it has been saved to a CSV file. You can see it has created the CSV file in the left-hand corner of the screen, with all the data extracted from the PDF file. So in this easy way you can extract tables from a PDF and save them as a CSV file.
1:54
So now let me show you how to use this. The full script is given in the description of the video. First of all, you need to import the tabula package like this, and we also need the os (operating system) module. After that, we just need to read the PDF file using the tabula module, which contains a function called read_pdf that reads the actual PDF file.
2:27
Here you need to provide the name of the file, sample.pdf, and pages="all", so it will read all the pages one by one. The third option it takes is multiple_tables set to True. This means that if the PDF contains multiple tables, it will read all of them one by one and return them as DataFrames. So we can print this out.
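Here is a minimal sketch of the read step described above, assuming tabula-py and Java are installed. `read_pdf`, `pages` and `multiple_tables` are the tabula-py names used in the video. The helper names `read_all_tables` and `format_tables` are hypothetical. By default tabula-py reads only the first page, which is why `pages="all"` matters here.

```python
def read_all_tables(pdf_path):
    """Return every table tabula detects in `pdf_path` as a list of DataFrames."""
    # Imported inside the function so this helper can be defined even where
    # tabula-py is not installed (it comes from `pip install tabula-py`).
    import tabula

    return tabula.read_pdf(
        pdf_path,
        pages="all",           # scan every page; tabula-py defaults to page 1
        multiple_tables=True,  # keep each detected table as its own DataFrame
    )


def format_tables(tables):
    """Build a printable report with a header line above each table."""
    parts = []
    for index, df in enumerate(tables, start=1):
        parts.append(f"--- Table {index} ---")
        parts.append(str(df))
    return "\n".join(parts)
```

For example, `print(format_tables(read_all_tables("sample.pdf")))` would print each extracted table under its own header.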
2:58
If you just execute this here, it will read the actual table from the PDF file and return it inside these DataFrames. Here you can see it being printed out. Now we just need to export this to a CSV file, which is really easy to do.
3:25
Here we can print out how many tables were extracted by calling the len() function on the result, which returns how many tables there are.
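The count is just `len()` of the list that `read_pdf` returns. The sketch below uses a hypothetical helper name and produces a slightly richer summary that also reports each table's shape. That can help spot tables that tabula split or merged unexpectedly. The message wording is illustrative, not the script's exact output.

```python
import pandas as pd


def summarize_tables(tables):
    """Return summary lines: the table count, then rows x columns per table."""
    lines = [f"Total tables extracted: {len(tables)}"]
    for index, df in enumerate(tables, start=1):
        rows, cols = df.shape  # DataFrame.shape is (number of rows, number of columns)
        lines.append(f"Table {index}: {rows} rows x {cols} columns")
    return lines


if __name__ == "__main__":
    # Stand-in DataFrame so the sketch runs without a PDF.
    demo = [pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [90, 85]})]
    print("\n".join(summarize_tables(demo)))
```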
3:37
After that, we can create an output folder, which will be tables_csv, and then create this directory with the os.makedirs() function.
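This step has one detail worth getting right. Calling `os.makedirs(path)` on a folder that already exists raises `FileExistsError`, so the script would crash on its second run. Passing `exist_ok=True` avoids that. The sketch below assumes the folder name `tables_csv`, reconstructed from the audio. The helper name `ensure_output_dir` is hypothetical.

```python
import os
import tempfile


def ensure_output_dir(path="tables_csv"):
    """Create `path` (and any missing parent folders) unless it already exists."""
    # exist_ok=True: no FileExistsError when the folder is left over from a previous run.
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        out = ensure_output_dir(os.path.join(tmp, "tables_csv"))
        ensure_output_dir(out)  # a second call is a no-op, not an error
        print("folder exists:", os.path.isdir(out))
```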
3:59
After that, we loop through all the DataFrames using a simple for loop and export each one to a CSV file.
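Below is a sketch of the export loop, assuming `tables` is the list returned by `read_pdf`. The file names (table_1.csv, table_2.csv, and so on) follow a hypothetical naming scheme, since the video does not show the exact names. The helper name `export_tables_to_csv` is also hypothetical. `index=False` keeps the DataFrame's row numbers out of the CSV.

```python
import os
import tempfile

import pandas as pd


def export_tables_to_csv(tables, output_dir="tables_csv"):
    """Write each DataFrame to its own CSV file and return the written paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for index, df in enumerate(tables, start=1):
        # Hypothetical naming scheme: table_1.csv, table_2.csv, and so on.
        path = os.path.join(output_dir, f"table_{index}.csv")
        # index=False drops the DataFrame's row index from the output file.
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Stand-in DataFrames so the sketch runs without a PDF.
    demo = [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"y": ["a", "b"]})]
    with tempfile.TemporaryDirectory() as tmp:
        for written in export_tables_to_csv(demo, os.path.join(tmp, "tables_csv")):
            print("saved", os.path.basename(written))
```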
4:31
We just give it a file name and call the DataFrame's to_csv() function. As you can see, a DataFrame has a whole series of export functions, so you can export it to CSV or to an Excel file, for example. Here we export them to CSV, and this completes the script.
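The video mentions Excel as an alternative. This hedged sketch writes every table to a separate sheet of one workbook using pandas' `ExcelWriter` and `DataFrame.to_excel`. It assumes an Excel engine such as openpyxl or XlsxWriter is installed, because pandas needs one to write `.xlsx` files. The function, workbook and sheet names are hypothetical.

```python
import pandas as pd


def export_tables_to_excel(tables, path="tables.xlsx"):
    """Write each DataFrame to its own sheet in a single .xlsx workbook."""
    if not tables:
        # A workbook with no sheets is not useful, so reject an empty list up front.
        raise ValueError("no tables to export")
    # Requires an Excel writer engine (for example openpyxl) to be installed.
    with pd.ExcelWriter(path) as writer:
        for index, df in enumerate(tables, start=1):
            # Hypothetical sheet names; Excel caps sheet names at 31 characters.
            df.to_excel(writer, sheet_name=f"Table_{index}", index=False)
    return path
```

One workbook with several sheets keeps related tables together. The per-table CSV approach from the video is simpler to feed into other tools.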
5:03
So tabula is very simple. First of all you import it, then you simply call read_pdf, passing these three arguments: the name of the PDF file, pages="all", and multiple_tables=True. This will extract all the tables from the PDF. If I execute this, you will see that the number of tables extracted is one, and the tables have been saved as CSV files.
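Putting the steps together, here is a hedged end-to-end sketch of an `app.py` along the lines of the one in the video. It is not the author's script, which is linked in the description. Several details are assumptions added for illustration: the function name `main`, the `table_N.csv` naming, the printed messages, and the early exit when the PDF is missing.

```python
import os


def main(pdf_path="sample.pdf", output_dir="tables_csv"):
    """Extract every table from `pdf_path` and save each one as a CSV file."""
    if not os.path.exists(pdf_path):
        # Assumed behavior: report and stop instead of letting tabula fail.
        print(f"PDF not found: {pdf_path}")
        return []

    import tabula  # from `pip install tabula-py`; needs a Java runtime

    # Same three arguments as in the video: file name, all pages, multiple tables.
    tables = tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)
    print(f"Total tables extracted: {len(tables)}")

    os.makedirs(output_dir, exist_ok=True)  # safe to re-run
    paths = []
    for index, df in enumerate(tables, start=1):
        path = os.path.join(output_dir, f"table_{index}.csv")  # hypothetical names
        df.to_csv(path, index=False)
        paths.append(path)
    print(f"Saved {len(paths)} CSV file(s) to {output_dir}")
    return paths


if __name__ == "__main__":
    main()
```

Running `python app.py` next to `sample.pdf` should then print the table count and write one CSV per table into `tables_csv`.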
5:34
So in this way you can extract tables from a PDF inside Python using tabula. Also check out my website, freemediatoolsuh.com, which contains thousands of tools.