Python Script to Extract Text & Images From Word Document File .DOCX Using docx2txt Library in CMD
323 views
Jun 3, 2025
Get the full source code of application here: https://gist.github.com/gauti123456/99a62fd22ee609de5583d8fd60851250
0:00
hello guys welcome to this video so
0:02
in this video I'll show you how to
0:04
extract text and images from a word
0:07
document file, a .docx file, so as you can
0:12
see we have a word document file right
0:14
here and inside this word document file
0:17
we have a set of images we have also a
0:20
set of text as well so I will show you a
0:22
Python script which will actually
0:24
extract all the text which is there
0:26
inside the word document and also all
0:28
these images as well so for this we are
0:30
using a third-party package, docx2txt, so
0:34
just go to the terminal and just install
0:36
this package pip install
0:40
docx2txt so this is actually the
0:43
package, and just press Enter and it will
0:46
install this package. I've already
0:48
installed this package so now for using
0:50
it simply create a simple Python script
0:53
now I will show you
0:58
so let me delete everything so we have
1:01
this app.py file and we have this word
1:04
document file right here so now I will
1:08
simply import the package first of all
1:11
so we will import this
1:13
package by using the import line, import docx2txt, right
1:16
here so we import this after importing
1:19
it we simply actually extract the text
1:22
first of all so this module contains
1:24
this function, which is process(), and here
1:27
you specify the address of the document
1:29
file so it is present in the same
1:31
directory so we are using this function
1:33
and passing the address of the word
1:35
document file so after doing this we
1:37
simply save the text in a file here so
1:43
we open a file with the open function
1:47
and we save the txt and we pass the
1:51
write mode so we create a new file which
1:54
is output.txt, which will hold all
1:56
the data for us which is extracted and
1:59
then we simply use the encoding type
2:01
which will be UTF-8, and as f, and right
2:08
here we actually call the write
2:10
function
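The steps described so far (import the package, call process on the document, write the result to output.txt as UTF-8) can be sketched roughly as follows. This is a minimal sketch, not the exact script from the video: the function name extract_text_to_file and the file name sample.docx are hypothetical, and it assumes docx2txt has already been installed with pip.

```python
def extract_text_to_file(docx_path, out_path="output.txt"):
    """Extract the text of a .docx file and save it as a UTF-8 text file."""
    # Third-party package shown in the video (pip install docx2txt).
    # Imported inside the function so a missing install only fails when called.
    import docx2txt

    # process() returns the text of the document as a single string.
    text = docx2txt.process(docx_path)

    # Write mode creates the output file, or overwrites it if it already exists.
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)

    return text
```

For example, calling extract_text_to_file("sample.docx") from the folder that holds the document would create output.txt next to it.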
2:12
and so once I execute the script right
2:15
here you will see on the left hand side
2:17
a new file will be created which will be
2:19
output.txt, and all the text, as you can
2:23
see from the word document has been
2:24
extracted and it has been saved inside
2:27
this txt file and now for extracting the
2:30
images as well so as you can see we also
2:32
have the images, and the same function
2:35
can easily do this as well:
2:37
after the file path,
2:43
in the second argument right here
2:46
you can actually pass a folder name so
2:49
we can create a folder right
2:51
here which will be storing the images
2:53
which will be extracted so right here in
2:55
the second argument we simply pass the
2:58
folder path here, so I simply pass img
3:02
so now what happens if you run the
3:04
Python script it will also extract all
3:07
the images and store them inside the img
3:09
folder so as you can see it extracted
3:12
these two images from this word document
3:15
and if I try to open this this is the
3:17
first image and this is the second image
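Passing a folder as the second argument can be sketched like this. The names extract_text_and_images and sample.docx are hypothetical, and the sketch creates the img folder before calling process, on the assumption that the library writes images into an existing folder rather than creating one itself.

```python
import os


def extract_text_and_images(docx_path, img_dir="img"):
    """Return the document text and the file names found in img_dir."""
    # Third-party package shown in the video (pip install docx2txt).
    # Imported inside the function so a missing install only fails when called.
    import docx2txt

    # Assumption: the target folder must exist before extraction,
    # so create it if needed (no error if it is already there).
    os.makedirs(img_dir, exist_ok=True)

    # With a second argument, process() also saves the embedded images
    # into that folder while still returning the text.
    text = docx2txt.process(docx_path, img_dir)

    # List whatever is now in the folder, in a stable order.
    images = sorted(os.listdir(img_dir))
    return text, images
```

Calling extract_text_and_images("sample.docx") on a document like the one in the video would be expected to return the text along with the names of the two extracted image files.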
3:20
so this is a really useful Python
3:22
package guys which is actually used to
3:25
extract the text and the images from a
3:28
word document file and as soon as you
3:31
run this you will see it has
3:32
extracted all these images which are
3:36
present in the word document so if I
3:38
open it
3:40
inside so you can see it's an image here
3:43
this is the first image this is the
3:44
second image, and the rest is all
3:47
the text right here so it has
3:50
successfully extracted everything with this
3:52
Python script using this module so you
3:55
can actually use this module to extract
3:58
text and images from a word document
4:01
file. First of all, install the module,
4:05
then import this and then we simply pass
4:07
the word document file and then the
4:10
folder name where the images will be
4:12
stored and then it simply extracts the
4:16
text and the images so in this way you
4:18
can do this, and thank you very much
4:20
for watching this video and also check
4:22
out my website freemediatools.com
4:25
which contains thousands of tools
#Scripting Languages
#Software