How to ingest unstructured PDF files from AWS S3 into Salesforce Data 360?
Jan 12, 2026
Hello everyone. In this video, we are going to see how to ingest unstructured PDF files from an Amazon AWS S3 bucket into Salesforce Data 360, previously called Data Cloud. If you have a requirement to keep Data Cloud (Data 360) updated with the unstructured files stored in an S3 bucket, you can use this video as a reference. My use case was that whenever a file is created, updated, or deleted, those files should be available in Salesforce in a UDLO (unstructured data lake object), then in a UDMO (unstructured data model object), and then, with the help of a search index and a custom retriever, we can make use of them in Agentforce or in Prompt Builder. Let's see how to do this setup.
Please check the video description. There I have shared my blog post; from the blog post you should be able to get all the steps, sample commands, and a sample configuration file for your reference. In the blog post I have given detailed steps as well as high-level steps. In this video, let's go over the high-level steps. If you are fairly new to Amazon AWS and Salesforce Data 360, make use of the detailed steps; if you have a good understanding of Amazon AWS and Salesforce Data 360, make use of the high-level steps. If you scroll down the blog post, you should see the high-level steps.
The first step is to create an S3 bucket. I created an S3 bucket with the name enterprise-level-storage in US East (Ohio), so the region is us-east-2. This bucket should be created in your Amazon account's region: if your Amazon account region is in the west, create the bucket in the west region, and if it is in the east, create the bucket in the east region. If you create them in two different regions, you will get redirect errors, and the configuration needed to work around that is complex. To keep things simple, make sure the bucket is in the same region as your Amazon AWS account.
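If you prefer the CLI over the console, a minimal sketch of creating the bucket looks like this; the bucket name and region are just the ones used in this video, so substitute your own:

    aws s3api create-bucket \
        --bucket enterprise-level-storage \
        --region us-east-2 \
        --create-bucket-configuration LocationConstraint=us-east-2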
Next, go to IAM in the Amazon AWS console and create an IAM user. When you create this IAM user, assign the AmazonS3FullAccess permission policy to it. If you want to restrict this access and have better security around it, make use of the technical recommendations and best practices section of the blog post, in which I have covered how to provide secured access to this integration user; in particular, check the least-privilege principle described there.
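As a rough illustration of the least-privilege idea (not the exact policy from the blog post), an inline policy scoped to just the one bucket might look like the sketch below; the action list is an assumption and may need adjusting for what the connector actually requires:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ReadEnterpriseLevelStorageOnly",
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
          "Resource": [
            "arn:aws:s3:::enterprise-level-storage",
            "arn:aws:s3:::enterprise-level-storage/*"
          ]
        }
      ]
    }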
The next step is to create the AWS S3 connector in Salesforce. Go to Data Cloud Setup. Under External Integrations, select Other Connectors and click New, and for the source select Amazon S3. Then enter the access key and the secret key that you got while creating the user in IAM in AWS. You will see a Test Connection button; click it to test the connectivity. You should get a success prompt confirming that the connection works and that the AWS access key and secret key are accurate. Click the Save button. Now the AWS S3 connector setup is done.
The next step is to create the UDLO in Salesforce Data 360. Open the Data Cloud application, go to the Data Lake Objects tab, and click the New button. Select From External Files and click Next, then select Amazon S3. You will only see this option when the connector setup above was completed successfully, so also make sure the connector is active. Select it and you should be able to configure it. Here you have to select the connection you created in that connector; the connector name is AWS S3, so I select AWS S3 here.
Next, in the directory field: if you have folders inside the S3 bucket, you can mention the folder names here (you can add up to four or five folders, I believe). It also looks into the data of subdirectories, so if you have a parent directory with subdirectories inside it, it will also try to pull all the information from the subdirectories. In the file name pattern, use *.pdf, because my use case was to get all the PDF files. You just type this information without double quotes.
Then you have to give a name for the UDLO as well as for the UDMO. Once that is done and you click the Next button, you should see an option called Enable Semantic Search with System Defaults. You can use this to create a search index in Data 360, and this search index is required to create a custom retriever, so make sure you enable this option. Once this is done, click Save and make sure the status of the data lake object is Active. When you create this UDLO, Salesforce will create three UDLOs: one for the file names, one for the chunks, and one for the vector embeddings, which is the index.
At this point the connectivity from Salesforce to the Amazon S3 bucket is done, you have created the UDLO, and Salesforce would have automatically created the UDMO (since we gave the object API name and the UDMO name) and done the mapping for you. Now we have to push the files whenever they are created, updated, or deleted in the S3 bucket. To do that, we have to set up a file notification pipeline: we are going to create a Lambda function that consumes the created, updated, or deleted file notifications and pushes them to Salesforce Data 360. To prepare for that, install the AWS CLI, jq, and OpenSSL, all three applications.
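For example, on macOS with Homebrew or on Debian/Ubuntu (package names may differ slightly on your system):

    # macOS (Homebrew)
    brew install awscli jq openssl
    # Debian/Ubuntu
    sudo apt-get install awscli jq openssl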
The next step is to create a private/public key pair and a certificate. To do that, use the openssl commands from the blog post; these will create a keypair.pem file. I created a folder for the certificate files, opened my terminal inside that folder, and ran the commands: they created a PEM file, then a CRT file, which is the certificate, and then a private key. So make sure you create a folder, change your terminal's path to that folder, and run the three commands so that they create the PEM file, the certificate file, and the private key file.
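The exact commands are in the blog post; as a sketch, assuming the output file names keypair.pem, certificate.crt, and private_key.pem, the three commands look roughly like this:

    # 1. generate an RSA key pair
    openssl genrsa -out keypair.pem 2048
    # 2. create a self-signed certificate from that key (uploaded to the external client app later)
    openssl req -new -x509 -key keypair.pem -out certificate.crt -days 365
    # 3. export the private key (used by the Lambda function to sign the JWT)
    openssl pkcs8 -topk8 -nocrypt -in keypair.pem -out private_key.pem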
Once this is done, open the two GitHub links. The first is the installer script. We are going to execute it so that it creates the Lambda function, the role, an S3 bucket, and the event notification in AWS, so that whenever a file is created, updated, or deleted, that information is sent to Salesforce Data 360. The second link contains the Lambda function code itself. If you are an AWS expert, you can look into the code developed for this Lambda function and also make modifications to it.
The next step is to create an external client app in Salesforce. Previously we were making use of a connected app in Salesforce; now we have to make use of an external client app. Create the external client app and, for the callback URL, use your Salesforce My Domain URL. To authenticate the connectivity between Salesforce Data 360 and Amazon S3 we are going to use OAuth; once the OAuth authorization is done, the user is redirected to a particular URL. That redirect URI in OAuth is the callback URL, and you have to set it up here. Make sure you select these three OAuth scopes: one to manage user data, one to perform requests for refresh tokens and offline access, and the Data Cloud Ingestion API scope, so that the Lambda function can ingest the data from Amazon S3 into Salesforce Data 360. Make sure to select the JWT bearer flow option, and for the certificate, select the one we created, that is, the .crt file generated with the openssl command.
Once that is also done, make a note of the consumer key and secret. In Salesforce Setup, search for OAuth and OpenID Connect Settings. If you go to that setup, you should see Allow OAuth Username-Password Flows; you can toggle it on. Once that is toggled on, build a URL from your Salesforce My Domain URL plus /services/oauth2/authorize, passing multiple parameters: response_type=code; client_id equal to your Salesforce consumer key from the external client app; scope equal to api, refresh_token, and cdp_ingest_api; redirect_uri equal to the callback URL we used in the external client app; and then the code_challenge, where SHA-256 (S256) is the algorithm. Once this is done, you will see a prompt with three scopes.
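Assembled, the authorize URL looks roughly like the sketch below; yourdomain, the consumer key, the callback URL, and the code challenge are placeholders, and the exact parameter list is in the blog post:

    https://yourdomain.my.salesforce.com/services/oauth2/authorize
        ?response_type=code
        &client_id=<consumer key from the external client app>
        &scope=api%20refresh_token%20cdp_ingest_api
        &redirect_uri=<your callback URL, URL-encoded>
        &code_challenge=<SHA-256 code challenge>
        &code_challenge_method=S256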
You should see those three scopes on the prompt. Make sure you are getting only those three scopes and nothing more, and also make sure you are not adding any additional scopes here that are not needed. I have seen people adding full access; please do not do that, because it exposes far too much of your Salesforce information if someone compromises the consumer key and consumer secret. Once you click Allow, it will redirect to the callback URL that was configured (and used in the URL above), and the authorization completes.
Once that is done, unzip the S3 file notification installation script, which you would have downloaded from GitHub. Unzip it and move it to a folder; I created a folder called AWS, and if I go to Documents and then AWS, I see the generated files. You should see three files: the input_parameters_s3.conf file and the two setup_s3 script files. Now you have to update the input_parameters_s3.conf file. This is a very crucial step.
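In terminal terms, that step is roughly the following; the archive name and folder here are placeholders, so use wherever you actually downloaded and extracted the script:

    mkdir -p ~/Documents/AWS
    unzip ~/Downloads/s3-file-notification-installer.zip -d ~/Documents/AWS
    cd ~/Documents/AWS
    ls   # should show the input_parameters_s3.conf file and the two setup_s3 script files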
Here I used Visual Studio Code to update the input_parameters_s3.conf file. Enter your username and your Salesforce My Domain URL; since I connected it directly with my Developer Edition, I am using the login.salesforce.com URL here. Use your 12-digit AWS account ID and make sure the region is accurate. Use your S3 bucket name here. If you have a folder inside your S3 bucket, specify that particular folder; I don't have any folder inside mine, so I left it blank, which means the entire S3 bucket is used.
Next, for the Lambda code bucket I used a name like aws-s3-sf-lambda. The installer will create an S3 bucket with this name in your Amazon account if it does not already exist, and it will upload the lambda_function.zip file from GitHub, which we downloaded, into that bucket. You can leave the next key blank; it is not required. After that, you have to mention where the source AWS lambda_function.zip file is located; this is the file we downloaded for the file notification pipeline. Then there is the role name I have defined, and the Lambda function name; when the Lambda function is created, it will have this name. If you go to Lambda in the Amazon console and search for this function, you should see it once the script has executed successfully.
Next is the key that will be used in the Lambda function to store your Salesforce consumer key, and then the Salesforce consumer key itself from the external client app. Then, to store the private key, there is the private key name I set up; these are just naming conventions I used, something like a hyphen-separated salesforce-aws-s3-rsa-private-key, based on the example from Salesforce. And for the PEM file, you have to mention the exact location where the keypair.pem file is stored; this file was generated with the help of the openssl command.
So, make sure this input_parameters_s3.conf file is configured without any errors. If you configure it with errors, the installation script will fail, and then the Lambda function creation, the policy creation, the role creation, everything will fail. So make sure it is accurate.
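For orientation only, a parameters file of this kind tends to look something like the sketch below. Every key name and value here is a placeholder I made up to mirror the fields described above; use the actual keys from the file you downloaded and the sample shared in the blog post:

    # hypothetical key names -- the real ones come from the downloaded script
    SALESFORCE_USERNAME="you@example.com"
    SALESFORCE_LOGIN_URL="https://login.salesforce.com"
    AWS_ACCOUNT_ID="123456789012"
    AWS_REGION="us-east-2"
    SOURCE_BUCKET_NAME="enterprise-level-storage"
    SOURCE_FOLDER=""                          # blank = whole bucket
    LAMBDA_CODE_BUCKET="aws-s3-sf-lambda"     # created if it does not already exist
    LAMBDA_ZIP_SOURCE="/path/to/AWS_lambda_function.zip"
    LAMBDA_ROLE_NAME="salesforce-s3-notification-role"
    LAMBDA_FUNCTION_NAME="salesforce-s3-file-notification"
    CONSUMER_KEY_SECRET_NAME="salesforce-consumer-key"
    SALESFORCE_CONSUMER_KEY="<consumer key from the external client app>"
    PRIVATE_KEY_SECRET_NAME="salesforce-aws-s3-rsa-private-key"
    KEYPAIR_PEM_PATH="/path/to/cert/keypair.pem"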
Once that is done, in your terminal, change the path to the directory where those three files are. This is the folder where I have stored all three files; make sure you are in this particular folder in your terminal, using the cd (change directory) command. Once that is done, use the command that sets your AWS environment variables, populated with your account access key, secret key, session token, and AWS region, and run it in your terminal. Then run the installation command. This command will take a few minutes to complete; make sure all the steps are completed. Once it is completed, you should see an S3 bucket with the name that was mentioned in the configuration.
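The environment-variable command is essentially a set of exports like the following; the values come from your own AWS credentials, and the setup_s3 script name and invocation are assumptions, so follow the blog post for the exact command:

    export AWS_ACCESS_KEY_ID="<your access key>"
    export AWS_SECRET_ACCESS_KEY="<your secret key>"
    export AWS_SESSION_TOKEN="<your session token>"   # only needed for temporary credentials
    export AWS_DEFAULT_REGION="us-east-2"
    # then run the installer from the same folder, for example:
    sh setup_s3.sh   # script name / invocation per the blog post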
So this is the bucket where the files are stored. You should also see a Lambda function with the configured name, and the Lambda code bucket with the name I used; inside that bucket you should see the AWS lambda_function.zip file. This is the file that contains the Lambda function code; the installer script would have created this for you.
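If you prefer to verify from the CLI rather than the console, something like this works; the bucket name below is the one from this video's parameters file, so use your own:

    aws s3 ls                                      # the Lambda code bucket should be listed
    aws s3 ls s3://aws-s3-sf-lambda/               # should show the lambda_function.zip file
    aws lambda list-functions --query "Functions[].FunctionName"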
The next thing is to create a retriever. Go to the Einstein Studio tab, go to Retrievers, click New, and you should be able to create a retriever. Make use of the data model object and also the search index which we configured. Once that is done, you can create a simple prompt template; in order to verify it, I created a Flex prompt template and then I was able to verify it.
Here I am making use of the input string against the retriever to respond to the customer. In the input, the information I passed was "change the advertisement plan"; this is the information that is passed to search against this particular retriever. The retriever was able to fetch the relevant information from the file, and it was then able to generate a nice response back to the user.
I have shared the input_parameters_s3.conf file for your reference. I had a hard time configuring this because I'm not an AWS expert. So if you run into any issues or if you want to recreate things, go to the S3 bucket: under Properties you should see an event notification created for the S3 bucket. You can delete it and then rerun this particular command again and again.
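You can also inspect the bucket's event notification from the CLI before deleting it; the bucket name here is the one used in this video:

    aws s3api get-bucket-notification-configuration --bucket enterprise-level-storage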
Once this is done successfully, if you create a file, update a file, or delete a file, the chunking in Salesforce Data 360 will stay up to date. I tested both file creation (upload) and file deletion; it was able to sync properly with Data 360, and I was then able to create a prompt template and verify it.
If you go to Lambda in the AWS console, you should see a Lambda function with the name that was mentioned in the configuration. If you are an AWS expert, you can make use of the Monitor tab: click View CloudWatch Logs and you should be able to troubleshoot it. I had some errors myself, and with the help of the failed-invocation execution errors in the logs I was able to debug and fix them. So if you upload or delete a file and the data is not updated in Salesforce Data 360, please make use of View CloudWatch Logs, which will be very helpful.
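With AWS CLI v2 you can also tail the function's CloudWatch logs directly from the terminal; the log group name below assumes the default /aws/lambda/<function name> convention:

    aws logs tail /aws/lambda/<your-lambda-function-name> --follow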
I hope it was helpful. Thank you for watching.
