
How to ingest unstructured PDF files from AWS S3 into Salesforce Data 360?
Jan 12, 2026
Hello everyone. In this video, we are going to see how to ingest unstructured PDF files from an Amazon AWS S3 bucket into Salesforce Data 360, previously called Data Cloud. If you have a requirement where you want to keep Data Cloud, or Data 360, updated with the unstructured files stored in an S3 bucket, then you can use this video as a reference. My use case was: whenever a file is created, updated, or deleted, those files should be available in Salesforce Data Cloud as a UDLO (unstructured data lake object), then in a UDMO (unstructured data model object), and then, with the help of a search index and a custom retriever, we can make use of them in Agentforce or in Prompt Builder.
Let's see how to do this setup. Please check the video description: I have shared my blog post there, and from the blog post you should be able to get all the steps, sample commands, and a sample configuration file for your reference. In the blog post I have given both detailed steps and high-level steps. In this video, let's go over the high-level steps. If you are pretty new to Amazon AWS and Salesforce Data 360, then make use of the detailed steps; if you have a good understanding of Amazon AWS and Salesforce Data 360, then make use of the high-level steps. If you scroll down the blog post, you should see the high-level steps.
The first step is to create an S3 bucket. I created an S3 bucket with the name enterprise-level-storage in US East (Ohio), so the region is us-east-2. This bucket should be created in your Amazon account's region: if your Amazon account region is in the west, create the bucket in a west region; if it is in the east, create the bucket in an east region. If you create them in two different regions you will get redirect errors, and the configuration needed to work around that is fairly complex, so to keep it simple, make sure the bucket is in the same region as your Amazon AWS account.
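In the video the bucket is created from the AWS console; as a rough equivalent, assuming you have the AWS CLI configured, the same thing can be done from the terminal (the bucket name and region below just match my example):

  aws s3 mb s3://enterprise-level-storage --region us-east-2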
Next, go to IAM in the Amazon AWS console and create an IAM user. When you create this IAM user, assign the AmazonS3FullAccess permission policy to it. If you want to restrict this access and put better security around it, make use of the technical recommendations and best practices section of the blog post, in which I have covered how to provide secured access to this integration user; check the least-privilege principle in that section.
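Again, this is done in the console in the video; a rough CLI sketch of the same steps (the user name here is just an example, and the full-access policy is what the video uses, so tighten it per the least-privilege notes) would be:

  aws iam create-user --user-name s3-datacloud-integration
  aws iam attach-user-policy --user-name s3-datacloud-integration --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
  aws iam create-access-key --user-name s3-datacloud-integration   # note the access key and secret key it returns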
Okay. The next step is to create the AWS S3 connector in Salesforce. Go to Data Cloud Setup. Under External Integrations, select Other Connectors. Click New, and for the source select Amazon S3. Then enter the access key and the secret key which you got while creating the user in IAM in AWS. You will see a button called Test Connection; click it to test the connectivity. You should get a success prompt confirming that the connection worked and that the AWS access key and secret key are accurate. Click the Save button. Now the AWS connector setup is done.
The next thing is to create the UDLO in Salesforce Data 360. Open the Data Cloud application, go to the Data Lake Objects tab, and click the New button. Select From External Files and click Next. Select Amazon S3; you will be able to see this option only when the connector setup was completed successfully, and make sure the connector is also active. Select it, and then you should be able to configure it. Here you have to select the connection which you created in the connector; the connector name is AWS S3, so here I select AWS S3.

Next, in the Directory field: if you have folders inside the S3 bucket, you can mention the folder names here (you can add up to four or five folders, I believe). It also looks into the data of subdirectories; if you have a parent directory with subdirectories inside it, it will also try to pull all the information from the subdirectories. In the file name pattern, use *.pdf, because my use case was to get all the PDF files. You just type this information without double quotes.
Then you have to give a name for the UDLO as well as for the UDMO. Once that is done, when you click the Next button you should see an option called Enable semantic search with system defaults. You can make use of this to create a search index in Data 360; this search index is required to create a custom retriever, so make sure you enable this option. Once this is done, click Save and make sure the status of the data lake object is active. When you create this UDLO, Salesforce will actually create three UDLOs: one for the file names, one for the chunks, and one for the vector embeddings, which is the index.
Okay, next. The connectivity from Salesforce to the Amazon S3 bucket is done, you have created the UDLO, and it would have automatically created the UDMO, since we gave the object API name and the name for the UDMO, and the mapping would have been done automatically by Salesforce. Now we have to push the files to Data 360 whenever they are created, updated, or deleted in the S3 bucket. To do that, we have to set up a file notification pipeline: we are going to create a Lambda function which will consume the created, updated, or deleted file notifications and push them to Salesforce Data 360. To do that, install the AWS CLI, jq, and OpenSSL; install all three of these applications.
The next step is to create a private/public key pair and a certificate. To do that, make use of OpenSSL commands; this will create a keypair.pem file. I created a folder for the certificate files, and inside this particular folder, using my terminal, I ran the commands: they created a PEM file, a CRT file (which is the certificate), and a private key. So make sure you create a folder, make sure your terminal's path is that particular folder, and run the three commands so that they create the PEM file, the cert file, and the private key file.
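The exact commands are in the blog post; as a minimal sketch (the file names and the subject string here are placeholders, not necessarily what the blog post uses), the three commands look roughly like this:

  openssl genrsa -out private.key 2048                                               # private key
  openssl req -new -x509 -key private.key -out certificate.crt -days 365 -subj "/CN=datacloud-s3-integration"   # self-signed certificate
  cat certificate.crt private.key > keypair.pem                                      # combined PEM file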
Okay. Once this is done, open these two GitHub links. The first is the installer script; we are going to execute it so that it creates the Lambda function, the role, an S3 bucket, and also the event notification in AWS, so that whenever a file is created, updated, or deleted, the information is sent to Salesforce Data 360. The second, the AWS Lambda function zip, contains the Lambda function code; if you are an AWS expert you can look into the code that was developed for this Lambda function and also make modifications to it.
Okay, the next step is to create an external client app in Salesforce. Previously we were making use of a connected app in Salesforce; now we have to make use of an external client app. So create an external client app, and for the callback URL make use of your Salesforce My Domain URL. To authenticate the connectivity between Salesforce Data 360 and Amazon S3 we are going to use OAuth: once the OAuth authorization is done, the user is redirected to a particular URL, which is the redirect URI in OAuth, and that is the callback URL you set up here. Make sure you select these three OAuth scopes: one to manage user data via APIs (api), one to perform requests at any time (refresh_token, offline_access), and the Data Cloud Ingestion API scope (cdp_ingest_api), so that the Lambda function can ingest the data from Amazon S3 into Salesforce Data 360. Make sure to select the JWT bearer flow option, and for the certificate select the one we created: make use of the .crt file generated with the OpenSSL command.
Okay. Once that is also done, make a note of the consumer key and the consumer secret. In Salesforce Setup, search for OAuth and OpenID Connect Settings. On that setup page you should see Allow OAuth Username-Password Flows; toggle it on.
Once that is toggled on, make use of your Salesforce My Domain URL followed by /services/oauth2/authorize, passing multiple parameters: response_type=code; client_id equal to your Salesforce consumer key from the external client app; scope equal to api, refresh_token, and cdp_ingest_api; redirect_uri equal to the callback URL which we used in the external client app; and the code challenge, with S256 as the algorithm (code_challenge_method=S256).
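Putting that together, the authorize URL looks roughly like this (the My Domain, consumer key, callback URL, and code challenge values are placeholders, and the scopes are space separated and URL-encoded; check the blog post for the exact URL):

  https://YOUR_MY_DOMAIN.my.salesforce.com/services/oauth2/authorize?response_type=code&client_id=YOUR_CONSUMER_KEY&scope=api%20refresh_token%20cdp_ingest_api&redirect_uri=YOUR_CALLBACK_URL&code_challenge=YOUR_CODE_CHALLENGE&code_challenge_method=S256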
Once this is done, you will see a consent prompt listing the three scopes. Make sure you are getting only those three scopes and nothing additional, and also make sure you are not adding any additional scopes that are not needed. I have seen people adding full access; please do not do that. It would expose far too much of your Salesforce information if someone got hold of the consumer key and consumer secret, so make sure you are not doing that. Once you click Allow, it will redirect to the callback URL that was configured (and also used in the URL) and then it will authorize.
Once that is done, unzip the S3 file notification installation script, which you would have downloaded from GitHub. Unzip it and move it to a folder; I created a folder called AWS under Documents. If I go to Documents and then to AWS (the other files there are generated ones), you should see three files: the input parameters file and the two setup_s3 files.
Now you have to update the input_parameters_s3.conf file. This is a very crucial step. I made use of Visual Studio Code to update this input_parameters_s3.conf file. Here, enter your username and your Salesforce My Domain URL; since I have connected it directly with my Developer Edition, I'm making use of the login.salesforce.com URL here. Use your 12-digit AWS account ID, and make sure the region is accurate. Use the S3 bucket name here. If you have a folder inside your S3 bucket, then you should specify that particular folder; I don't have any folder inside it, so I left it blank and it uses the entire S3 bucket.
Next, here I used a name like aws-s3-sf-lambda. The installer script will create an S3 bucket with this name in your Amazon console if that particular bucket is not already available, and it will put the GitHub Lambda function zip file which we downloaded into it; the same file will be uploaded into this particular S3 bucket. You can leave this key blank; it is not required. Next, you have to mention the source location for the aws_lambda_function.zip file, which is the file we downloaded for the file notification pipeline. Then this is the role name I have defined, and this is the Lambda function name; when the Lambda function is created it will have this name, and if you go to Lambda in the Amazon console and search for it, you should see it once the script has executed successfully.

Next, this is the key that will be used in the Lambda function to store your Salesforce consumer key, and this is the Salesforce consumer key from the external client app. Next, to store the private key, this is the secret name I set up; these are just naming conventions I used, something like salesforce-aws-s3-rsa-private-key, following the example from Salesforce. And for the PEM file, you have to mention the exact location where the keypair.pem file is stored; this is the file that was generated with the OpenSSL command.
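The real configuration file is shared in the blog post; the parameter names below are hypothetical placeholders just to illustrate the kinds of values it collects:

  # hypothetical parameter names; use the actual file from the blog post
  SF_USERNAME="you@example.com"
  SF_LOGIN_URL="https://login.salesforce.com"
  AWS_ACCOUNT_ID="123456789012"
  AWS_REGION="us-east-2"
  SOURCE_BUCKET="enterprise-level-storage"
  SOURCE_PREFIX=""                               # folder inside the bucket; blank = whole bucket
  LAMBDA_CODE_BUCKET="aws-s3-sf-lambda"          # created by the script if it does not already exist
  LAMBDA_ZIP_PATH="/path/to/aws_lambda_function.zip"
  LAMBDA_ROLE_NAME="your-lambda-role-name"
  LAMBDA_FUNCTION_NAME="your-lambda-function-name"
  SF_CONSUMER_KEY="your-consumer-key"
  PRIVATE_KEY_SECRET_NAME="salesforce-aws-s3-rsa-private-key"
  KEYPAIR_PEM_PATH="/path/to/keypair.pem"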
Okay. So, make sure this input_parameters_s3.conf file is configured without any errors. If you configure it with errors, the installation script will fail, and then the Lambda function creation, the policy creation, the role creation, everything will fail. So make sure it is accurate. Okay, once that is done, in your terminal change the path to the directory where those three files are; this is the folder where I have stored all three files, so make sure you are in this particular folder in your terminal using the cd (change directory) command.
Once that is done, make use of this command: populate it with your account access key, secret key, session token, and AWS region, and run it in your terminal so that your AWS environment variables are set.
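The command from the blog post sets the standard AWS CLI environment variables; roughly (all values are placeholders):

  export AWS_ACCESS_KEY_ID="AKIA..."
  export AWS_SECRET_ACCESS_KEY="..."
  export AWS_SESSION_TOKEN="..."          # only needed if you use temporary credentials
  export AWS_DEFAULT_REGION="us-east-2"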
Once that is done, you have to run the installation command. This command will take a few minutes to complete; make sure all the steps are completed. Once it is completed, you should see an S3 bucket with the name that was mentioned in the configuration file.
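I'm assuming the setup script name below from the files listed earlier; check the blog post for the exact command and any arguments it takes:

  cd ~/Documents/AWS    # the folder containing the three files
  ./setup_s3.sh         # assumed script name; it may differ in the actual package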
So this is the bucket name where the files are stored; you should see it. You should also see a Lambda function with this name. And you should see the bucket I configured for the Lambda code; inside it you should see aws_lambda_function.zip, the file which contains the Lambda function code. The installer script would have created this for you.
Okay. The next thing is to create a retriever. Go to the Einstein Studio tab, go to Retrievers, click New, and then you should be able to create a retriever. Make use of the data model object and also the search index which we configured. Once that is done, you can create a simple prompt template; to verify it, I created a flex prompt template.
Here I'm making use of the input string against the retriever to respond to the customer. In the input, the information I passed was "change the advertisement plan"; this is what gets searched against this particular retriever. The retriever was able to fetch the relevant information from the file, and it generated a nice response back to the user.
I have shared the input_parameters_s3.conf file for your reference. I had a hard time configuring this because I'm not an AWS expert. If you run into any issues, or if you want to recreate things, go to the S3 bucket: you should see an event notification created for the S3 bucket. If you go to its properties, you should see the event notification; you can delete it and then rerun this particular command again and again (a CLI way to inspect it is sketched below).
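If you prefer the terminal, you can also inspect the bucket's event notification configuration before deleting and recreating it (the bucket name below is my example):

  aws s3api get-bucket-notification-configuration --bucket enterprise-level-storage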
Once this is done successfully, if you create a file, update a file, or delete a file, the chunked data in Salesforce Data 360 will stay up to date. I tested both file creation (upload) and file deletion; it synced properly with Data 360, and then I was able to create a prompt template and verify it.
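If you want a CLI alternative to uploading through the console, a quick way to exercise the pipeline looks like this (the file and bucket names are examples):

  aws s3 cp ./sample.pdf s3://enterprise-level-storage/    # triggers the create/update notification
  aws s3 rm s3://enterprise-level-storage/sample.pdf       # triggers the delete notification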
If you go to Lambda in AWS, you should see a Lambda function with the name that was mentioned here; this is the Lambda function name. If you are an AWS expert, you can make use of the Monitor tab: click View CloudWatch logs and you should be able to troubleshoot from there. I had some errors, and with the help of the failed invocation and execution errors in the logs I was able to debug and fix them. So if you upload or delete a file and the data is not updated in Salesforce Data 360, please make use of View CloudWatch logs, which will be very helpful.
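If you'd rather follow the logs from the terminal, AWS CLI v2 can tail the same CloudWatch log group (the function name is whatever you set in the configuration file):

  aws logs tail /aws/lambda/YOUR_LAMBDA_FUNCTION_NAME --follow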
I hope it was helpful. Thank you for watching.
