Python 3 OpenCV & Google Tesseract OCR Example to Extract Text From Image & Save it as TXT in CMD
Jun 3, 2025
Get the full source code of application here: https://codingshiksha.com/python/python-3-opencv-tesseract-ocr-example-to-extract-text-from-image-save-it-as-txt-in-terminal/
0:00
Hello guys, welcome to this video.
0:02
In this video I will show you a Python
0:04
script which will extract the
0:07
text from an image file. As you can see,
0:10
we have a sample image, which is
0:12
a screenshot that contains some
0:15
code. What this Python
0:18
script will do is automatically detect and
0:20
extract all the text from this image and
0:22
save it as a TXT file. For this we
0:25
are using Google's Tesseract OCR
0:28
(optical character recognition) library for automatically
0:31
detecting text in an image file. So
0:34
let me show you the script right here.
0:37
As soon as I run the script, you will
0:39
see on the left-hand side that all the text
0:41
is extracted, and as you can see we are
0:44
also using OpenCV to show a live
0:47
preview. We can see that a file has been
0:51
created, extracted text.txt. If I
0:54
open this file, you will see that all the
0:56
data has been successfully
0:58
extracted from this image file, as you
1:01
will
1:03
see. For this we need to
1:07
install one module, and the command
1:09
is simply pip install pytesseract.
1:14
This is a wrapper library for
1:17
Google's Tesseract OCR engine,
1:21
which is a very well-known library for
1:23
detecting text in an image file. You
1:26
need to have Tesseract OCR installed, so
1:29
first of all download this software
1:31
onto your
1:33
machine. I have already installed it.
1:37
There is a direct link available;
1:40
simply download it, and once it
1:43
downloads, it will be there on your C
1:47
drive. Once you run the setup, it will
1:49
be present on your C drive. I
1:52
have already installed it, so we have
1:53
this folder, Tesseract
1:56
OCR.
2:00
Now you can copy this whole path,
2:03
then go to your
2:05
environment variables and add it
2:07
to the Path variable
2:12
as a new entry. Simply add this path; I've
2:14
already added it. This makes
2:17
Tesseract available globally on your system.
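As a quick check of the PATH step described above, a short Python sketch like the one below can confirm whether the tesseract executable is reachable. It uses only the standard library; the helper name find_tesseract is made up for this illustration and is not part of the video's script.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None.

    find_tesseract is a hypothetical helper name, not from the video's script.
    """
    return shutil.which(executable)


location = find_tesseract()
if location is None:
    print("tesseract was not found on PATH; add its install folder to PATH")
else:
    print(f"tesseract found at: {location}")
```

If this reports that nothing was found right after editing the environment variables, opening a new terminal window is usually needed so the updated PATH is picked up.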
2:19
After this, let me just create the script.
2:23
The whole script will be given in the
2:25
description of the video. First of all,
2:27
we need to import the packages:
2:30
the OpenCV package and the pytesseract package
2:33
as well. After that we need to set
2:36
this path,
2:39
pytesseract.pytesseract.tesseract_cmd. Here you
2:43
need to set the full path where your
2:44
Tesseract executable is located. I have just copied and
2:49
pasted my full path, like this.
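Setting the path as described might look like the sketch below. The install location shown is only an assumed example (the video does not spell out the full path), and use_tesseract_at is a hypothetical helper name. The attribute pytesseract.pytesseract.tesseract_cmd is what the wrapper reads to find the executable.

```python
from pathlib import Path

# Assumed example location; replace it with wherever the setup placed tesseract.exe.
EXAMPLE_TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def use_tesseract_at(exe_path):
    """Tell pytesseract which tesseract executable to run (hypothetical helper)."""
    exe = Path(exe_path)
    if not exe.is_file():
        raise FileNotFoundError(f"No tesseract executable found at {exe}")
    # Imported here so the path check above works even before pytesseract is installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = str(exe)
    return str(exe)


if __name__ == "__main__":
    try:
        print("Using", use_tesseract_at(EXAMPLE_TESSERACT_EXE))
    except FileNotFoundError as err:
        print(err)
```

Checking that the file exists first gives a clearer error than the one pytesseract raises later, when it tries to launch a missing executable.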
2:53
After that, we can load the
2:57
image from which we'll be
3:00
extracting the text, using
3:04
OpenCV. This is code.png,
3:10
the code screenshot. Once you load this image, we
3:13
need to convert it
3:16
into a grayscale image. Then we
3:20
need to blur the image, and for
3:22
that we apply a Gaussian
3:30
blur, and then we can put our own values
3:33
right here to blur the image. After that
3:35
we need
3:37
to calculate the threshold. OpenCV
3:40
has a function, threshold, and
3:42
here you pass the blurred
3:44
image and then 0,
3:48
255, and then
3:54
cv2.THRESH_BINARY.
3:57
So you need to do this first of all.
4:00
All of this comes before extracting the text, using
4:03
OpenCV. Now we also need to remove the
4:07
noise as
4:09
well, and invert the image. Again we are using
4:13
the OpenCV package and its methods.
4:16
After we do this, we
4:19
can simply extract the data from the
4:22
image by using the pytesseract package.
4:25
It contains a function called
4:27
image_to_string. You will see it has a
4:32
bunch of methods, but we need to use this
4:33
one, image_to_string; it will
4:36
extract the text from the image. Here you
4:39
can select the language as well; I
4:41
just need English. Then
4:45
you can pass a configuration as well.
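A minimal sketch of the extraction call, under stated assumptions: the wrapper name extract_text is hypothetical, "eng" is Tesseract's code for English, and "--psm 6" (treat the image as a single uniform block of text) is only an example configuration, since the video does not say which one it passes.

```python
def extract_text(image, lang="eng", config="--psm 6"):
    """Run Tesseract on an image and return the recognized text as a string.

    extract_text is a hypothetical wrapper; the defaults are example values.
    """
    # Imported inside the function so this module can be loaded
    # before the OCR stack is installed.
    import pytesseract

    return pytesseract.image_to_string(image, lang=lang, config=config)
```

A call such as extract_text(processed_image), where processed_image is the cleaned-up image from the earlier steps, would need both the pytesseract package and the Tesseract program itself to be installed.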
4:48
So it's a very simple library for
4:51
extracting text from images. Then we can
4:53
print out the
4:55
data, and then we can save this data into
4:59
a text file. To save the data, we
5:02
open the output file, extracted text, in
5:06
write mode and then call f.write(data).
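The save step described here can be sketched as below. The helper name save_text, the underscore in the file name extracted_text.txt, and the explicit UTF-8 encoding are assumptions added for the illustration. The demo writes into a temporary folder so it leaves nothing behind.

```python
import os
import tempfile


def save_text(data, output_path):
    """Write the extracted text to a file in write mode (hypothetical helper)."""
    # "w" creates the file, or overwrites it if it already exists.
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(data)
    return output_path


# Demo: save a sample string to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "extracted_text.txt")
    save_text("sample extracted text\n", target)
    with open(target, encoding="utf-8") as f:
        print(f.read())
```

Using a with block closes the file automatically, even if the write fails partway through.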
5:08
That's all. So let me delete this and
5:11
rerun the application once
5:13
again. You will see it prints out
5:16
the data from this image file and creates
5:20
the text file, as you can see, and all
5:23
your data has been successfully
5:25
extracted. So we took this image file,
5:27
which has this text, and we
5:31
successfully extracted all the text and
5:33
saved it inside a TXT file using this
5:35
package. It's Google's Tesseract
5:38
OCR text recognition library. You just
5:42
need to install these two packages to
5:44
use it. Thank you very much
5:47
for watching this video, and also check
5:49
out my website,
5:51
freemediatools.com, which contains
5:53
thousands of tools.
#Programming
#Software