Create AWS S3 Bucket and Read Parquet File from Fabric Notebook
6K views
Jan 29, 2024
In this video, I covered how to get started with AWS S3: creating an S3 bucket, an access key and a secret key, and reading data in a Fabric Notebook and writing it to AWS S3.
0:00
Hello everyone and compliments of the season
0:12
In this video, I'm going to show you how to read a Lakehouse parquet file into Amazon
0:19
S3 Web Service using the Fabric Notebook. This is going to be a comprehensive end-to-end project because I'm going to walk you through
0:28
what is Amazon S3, how you can create buckets, create access keys, secret keys and of course
0:35
how you can write the data from the Fabric Notebook into the S3 service and of course how
0:41
you can query the data in the S3 query window. So let's get started
0:48
Now what is Amazon S3? Now the S3 simply means Simple Storage Service and it is a scalable object storage service
0:56
that is offered as part of the Amazon Web Services and it allows you to store and retrieve
1:01
any amount of data at any time from any location on the web
1:07
The S3 is designed to be a highly durable, available and scalable service to store data
1:13
or perform data backups, archive your data, distribute content and of course it serves as a data lake
1:20
Using the Amazon Web Service, we're going to create what are called buckets. Now buckets are basically containers for storing any kind of objects, like PDF,
1:28
parquet, CSV, PNG or whatever file you have. Enough of talking, let's get started
1:36
I am currently at console.aws.amazon.com and of course we can click on the services
1:44
to see all the services in the AWS or I can even search the S3 in the search bar and then
1:51
click on this S3 service. For the first time, we're going to see this "Amazon S3: store and retrieve any amount of
1:58
data from anywhere" landing page, because I do not have any buckets created
2:04
So I'm going to click on this Create bucket, which can store different kinds of objects, and
2:09
of course we can specify the AWS region in the general configuration
2:15
I'm going to scroll down to, there we go, eu-north-1, and of course we can choose the bucket
2:24
type, we can choose the general purpose or directory. I'm going to go with the general purpose which is the recommended and I'm going to scroll down
2:32
Now for the bucket name, you can see this kind of example, we can't use uppercase as
2:37
a bucket name. It's not going to allow that. So I'm going to type in salesdata1234 just to make it more unique and of course you can
2:48
even select a bucket in S3 if we have any existing bucket but I do not have any bucket
2:53
so I'm not going to choose this and let's scroll down. For the object ownership, now we can specify the access control lists, okay so ACLs are
3:01
going to stay disabled, and let's scroll down. Now for the block public access settings for this bucket, we're going to block all the
3:09
public access just for now. For the bucket versioning, I'm going to click on Enable, and this allows me to recover any objects
3:17
that have been deleted in a bucket, so we can see versioning is a means of keeping multiple
3:21
variants of an object in the same bucket. So enabled and of course we can optionally specify tags and for the default encryption
3:28
we want to use the default server-side encryption with Amazon S3 managed keys and let's go down
3:35
and of course for the advanced settings, we don't need to do anything here for now, let's
3:40
just go down and click on create bucket, okay I'm going to go up, okay I can see that bucket
3:48
with the same name already exists. Now I do not have any bucket with the same name but I don't know why
3:55
I'm just going to type in 12 and then salesdata12, oh okay I'm still seeing the same
4:00
error, so let's use 007, let me click out and this is accepted, salesdata007, and then
4:08
let's click on create bucket. Successfully created bucket salesdata007, and that is
4:15
super amazing. So we can see the name of the bucket, salesdata007, and we can see the AWS region, eu-north-1,
4:22
covering Stockholm and some other places. So for the access, we can see bucket and objects not public and that is fine for now
4:29
I can click on the bucket name and then we can see different kinds of tabs, like
4:35
Objects, Properties, Permissions, Metrics, Management and the Access Points
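For readers who prefer code, here is a minimal boto3 sketch of roughly the same console steps, assuming the bucket name salesdata007 and the eu-north-1 region; it is an illustration, not what is run in the video:

```python
import boto3

# Create a bucket in eu-north-1 and enable versioning on it,
# mirroring the console settings chosen above.
s3_client = boto3.client('s3', region_name='eu-north-1')

s3_client.create_bucket(
    Bucket='salesdata007',
    CreateBucketConfiguration={'LocationConstraint': 'eu-north-1'},
)
s3_client.put_bucket_versioning(
    Bucket='salesdata007',
    VersioningConfiguration={'Status': 'Enabled'},
)
```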
4:42
Now what I'm going to do is to go ahead and create what's called access key and secret
4:47
key in the Identity and Access Management service. So I'm going to come here, I'm going to type in IAM and then I'm going to click on this
4:56
Now in the Identity and Access Management (IAM) service, we can see the access management
5:02
and we can see different kinds of, the dashboard rather, the IAM dashboard, we can see the
5:07
security recommendations and so on. Now I want to click on users, I want to create a new user and click on create user
5:16
Now I'm going to call it my name, Abiola David, so this is going to be the username
5:21
Now I'm not going to provide user access to the AWS management console
5:26
So I'm going to go ahead and click on next. And then for the permissions options, under the set permissions, now we can add to a group
5:34
we can copy permissions from an existing user to this particular new user
5:38
We can even attach individual direct policies. So I'm going to choose this attach policies directly and then I'm going to apply these policies
5:48
I'm going to click on this AdministratorAccess-Amplify policy. And I think let me just grant, okay, device setup
5:54
I think that is fine for this user. I'm going to scroll down, then click on next
5:59
And then we can see the review and create. So this is the username, the console password type, none, require password reset, none
6:10
And of course we can see the permission summary. So scroll down and click on create user
6:17
And there we go, Abiola David user created, with a few errors. That is fine
6:21
There's no problem. Now I'm going to click on the user we just created
6:25
And then in the user, we're going to create what's called the access key
6:29
So I'm going to click on this create access key. And of course we can specify the access key
6:35
We can choose the use case: Command Line Interface (CLI), local code, application
6:39
running on an AWS compute service, third-party service. We can even click on or choose application running outside AWS
6:47
I'm going to choose this third party service. We plan to use this access key to enable access for a third party application or service that
6:53
monitors or manages your AWS resources. So I'm going to scroll down and I'm going to click on this
7:00
I understand above recommendation. I want to proceed to create an access key
7:04
So click on next. And then for the description tag, let's just call it, you know, access, go ahead and create
7:13
access key. Okay. So you can see this is the only time that the secret access key can be viewed or downloaded
7:21
So what I'm going to do is, I could click on this to show the secret key, but I'm not going to click on it anyway; and this is the access key
7:28
So what I'm going to do is click on this download.csv. So I'm going to download to my personal PC
7:32
There we go, AbiolaDavid_accessKeys.csv. That's fine. And go ahead and click done. Okay
7:39
So you can see the access key one created, and of course it is now active
7:44
What I'm going to do finally for now, I'm going to click on sales data to investigate
7:49
Now you can see in this case, we do not have any object
7:52
So let's head over to the Fabric Notebook. Now basically in this Fabric Notebook workspace, we can see we have this AWS S3 Lakehouse
8:04
I'm going to click on it. And of course we can see we have this file uploaded sales data.csv
8:09
I can click on it to investigate. So we can see the sales data
8:14
And of course, I'm going to click on this file. So what I did was just click on these three ellipses, and of course just load to the existing
8:23
table, this particular sales data. Okay, that's fine. I'm going to open a notebook
8:28
So click on new notebook. Now we need to install two libraries
8:38
The first one is the Boto3, which allows us to interact with AWS services and also S3FS
8:45
to interact specifically with the S3 service in AWS. So let's do pip install Boto3 and control enter to run the cell
8:57
While that is doing its job, I'm going to click to add a new cell
9:01
Now I'm going to do another pip install. This time I'm going to install the S3FS
9:10
And let's wait for this to finish its job. Okay, there we go
9:15
So it has been installed. And I'm going to come back to the first pip install
9:19
Okay, so you can see the command executed in a few seconds. So I'm going to scroll down
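For reference, the two install cells look roughly like this; Fabric notebooks accept the standard pip magic, though the exact invocation in the video may differ:

```python
# Run each install in its own notebook cell.
%pip install boto3   # AWS SDK for Python
%pip install s3fs    # filesystem-style access to S3
```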
9:25
First we need to import the Boto3 and of course we want to import pandas as pd
9:30
So import boto3. Okay, and then we'll import pandas as pd
9:40
So let's run this cell and let's see. There we go. So command executed in 312 milliseconds and that is amazing
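The import cell, as described, is simply:

```python
import boto3          # AWS SDK: client and resource interfaces
import pandas as pd   # DataFrames and parquet I/O
```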
9:49
So let's click on a new cell. Now we'll initialize an instance of Boto3, and we'll use the resource
9:55
So we'll specify the service name, the region name, the AWS access key ID, AWS secret access key
10:02
So I'm going to just give it the variable name, aws_s3
10:09
Okay, and then we'll initialize the Boto3.resource and then open the brackets
10:16
Now we'll specify the name of the service. So let's call this one service underscore name
10:24
And that's going to be S3, right? Amazon S3 and then put in a comma
10:28
And then we'll provide the region, region underscore name. And that has to be, let me just double check, that should be eu-north-1
10:39
So EU, hyphen, north, hyphen, 1, and then put in a comma
10:46
Now we need to provide the AWS underscore access, access key ID
10:54
So, and there we go. So we can see the AWS access key ID
10:59
Now specify the AWS underscore secret underscore access key. So AWS underscore secret underscore access key
11:10
And that has to be inside single quotes. Okay, so there we go
11:17
Now there we go. So we can see the service name, the region, the access key ID
11:23
and of course the secret access key. So let's go ahead and control enter to run the cell
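Assembled, that cell looks something like this; the variable name and the two key values are placeholders, with the real keys coming from the downloaded CSV:

```python
# Initialize the S3 resource. Never hard-code real keys in shared
# notebooks; the two key values below are placeholders.
aws_s3 = boto3.resource(
    service_name='s3',
    region_name='eu-north-1',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
)
```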
11:29
There we go. So command executed successfully. Now, the next thing we need to do is to read the parquet file
11:36
with pd.read underscore parquet. So I'm going to come to the sales data, okay
11:41
And click on this horizontal ellipses. And I want to copy the path
11:46
So copy that and let's add a new cell. Now I'm just going to do df equals to pd.read underscore parquet
11:54
And of course, inside double quotes, I'm going to control V the copied path
12:00
and then click enter. Now we need to go ahead and store the data frame into a file with the parquet extension
12:07
So I'm going to do df.to underscore parquet. And then inside open and close brackets, now single quotes
12:15
I'm going to call this one sales data dot parquet. Okay, so let's check it out
12:21
So df.to underscore parquet. Okay, this is fine. Control enter. And let's see
12:29
There we go. So command executed in just two seconds and 490 milliseconds
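Those two cells, roughly; the table path is a placeholder for whatever the copy-path option gives you, and the local file name is approximated from the video:

```python
# Read the Lakehouse table into a DataFrame. Paste the path copied
# from the table's ellipsis menu in place of this placeholder.
df = pd.read_parquet("<copied-lakehouse-table-path>")

# Write the DataFrame back out as a local parquet file.
df.to_parquet('sales_data.parquet')
```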
12:35
Now I'm going to scroll down. So the final thing we need to do is to go ahead and upload this particular
12:41
sales data dot parquet into AWS. So I'm going to call the aws_s3 that we defined
12:49
And if we want to access the bucket, so inside open and close brackets
12:54
I'm going to specify the name of the bucket, which is salesdata007
13:00
So salesdata007. Okay. And then we'll use the dot upload underscore file
13:11
And of course, we'll actually specify the file name. So Filename, that must be equal to this sales data dot parquet
13:20
Let me just copy it. And inside single quotes, control V. And of course, we want to specify the key
13:28
So this is Key. And that has to be equal to, inside single quotes
13:33
Let's just call it sales data. So let's go through it again
13:38
So first we specify the bucket. And then we specify the name of the bucket
13:43
And then we use the dot upload file. And of course, we specify the file name, which is sales data dot parquet
13:50
And then the key, which is sales data. So this is going to be what's going to be seen as the key in the S3
13:56
So let's go ahead and control enter to run the cell. There we go
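The upload cell, then, is approximately the following, reusing the hypothetical names from the earlier sketches:

```python
# Upload the local parquet file into the bucket. Key is the object
# name that will appear in the S3 console.
aws_s3.Bucket('salesdata007').upload_file(
    Filename='sales_data.parquet',
    Key='sales_data',
)
```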
14:02
So command executed in 879 milliseconds by Abiola David. And then we can go to the S3
14:10
Now, I'm just going to go ahead and refresh the page. Now, this is the moment of truth
14:16
I'm going to click on the sales data 007 bucket. And there we go
14:21
Sales data, last modified December 21st, 2023 at 18:38. And that's exactly the time we're at
14:29
And that is super cool. And of course, you can see the size of 81.7 kilobytes
14:35
And this is super amazing. And of course, you can see the storage class is standard
14:39
I'm going to click on the object, the sales data. And of course, you can see the properties, the permissions, the versions
14:46
So you can see the object overview, owner, AWS region, the last modified date, size
14:52
We can see the S3 URI and, of course, the object URL
14:56
And, of course, you can see the key. So this is exactly what we specified, the sales data
15:00
If you recall, we gave it this particular key. So you can see the key here
15:04
This is super amazing. And finally, I'm going to click on this object actions
15:09
And I'm going to click on this query with S3 select. So for the input settings, now we can see the path and, of course, the SAS
15:17
Now I'm going to choose, because this is actually a parquet file, I'm going to choose Apache Parquet
15:22
And for the compression, this is not supported. That's fine. I'm going to scroll down
15:28
And then I can select the output settings. So for the output, you can choose the CSV or JSON
15:33
I can even choose the CSV delimiter. So that is fine. Just go ahead and scroll down
15:39
Now this is going to be SELECT * FROM s3object. And this is s as an alias
15:46
This is going to limit it to the first 10 records. I'm going to scroll down
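As shown on screen, the default query is:

```sql
-- s is the alias for the selected object; return the first 10 rows.
SELECT * FROM s3object s LIMIT 10
```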
15:52
I can see in the results query there's nothing to display. I'm going to click on run SQL query
15:59
Let's see. Successfully returned 10 records in 2045 milliseconds. I'm going to scroll down
16:06
I can see the raw data. I'm going to choose the formatted. And there we go
16:10
So we can see the data. So this is super amazing. So this is the end-to-end project on how to create S3 buckets
16:19
how we can create an access key and secret key, and how we can read data from the Fabric Notebook
16:25
And of course, how we can query the data in the S3 query window
16:29
I trust you enjoyed this video. If you do, like, share, comment, and see you in the new year
16:36
Thank you and bye for now. Cheers
#Cloud Storage
#File Sharing & Hosting
#Web Services