Configure DVC 🖇️ Amazon S3 Bucket
Last Updated on April 5, 2024 by Editorial Team
Author(s): ronilpatil
Originally published on Towards AI.
Hello everyone, I hope you are doing well. If you’re interested in MLOps, this blog should be useful. Here I’ll demonstrate how to configure a DVC remote with an Amazon S3 bucket so that any data tracked by DVC can be pushed directly to S3.
I. Create IAM User
Step I. First, create an IAM (Identity and Access Management) user with Administrator Access. This gives the user full control over the AWS account: managing resources, creating and deleting users, and adjusting permissions.
Search for the IAM service and select it.
Step II. Click on Users.
Step III. Click on Create user.
Step IV. Enter a user name and click on Next.
Step V. Select Attach policies directly and scroll down.
Step VI. Select the AdministratorAccess policy, scroll down, and click on Next.
Step VII. Click on Create user.
Congrats 🎉 IAM user created successfully.
Step VIII. Now that the IAM user is created, let’s create an access key (we’ll need it to set up the AWS CLI). Go to IAM Users and click on the user name; here it’s “Admin”.
Step IX. Go to Security credentials and scroll down.
Step X. In the Access keys block, click on Create access key.
Step XI. Select Command Line Interface (CLI).
Step XII. Accept T&C, and click on Next.
Step XIII. The description is optional; click on Create access key.
Step XIV. Congrats 🎉 Access key created successfully. Download the .csv file and keep it safe and secure.
II. Create Amazon S3 Bucket
Step I. To create an S3 bucket, search for the S3 service and click on Create bucket. Give the bucket a name, keep the remaining settings at their defaults, and click on Create bucket. That's it!
Step II. Congrats 🎉 S3 bucket created successfully.
Step III. We can also create a folder inside the S3 bucket. I’m going to store my artifacts in the dvcArtifacts folder.
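For readers who prefer the terminal, the same bucket and folder can probably be created with the AWS CLI once it has been configured (section III). This is only a sketch: bucket_name is a placeholder, the function name is made up for illustration, and the commands are wrapped in a function so nothing runs until you call it.

```shell
# Placeholder names: replace with your own (bucket names must be globally unique).
BUCKET_NAME="bucket_name"
FOLDER_NAME="dvcArtifacts"

# Hypothetical helper; not part of the AWS CLI itself.
create_bucket_and_folder() {
  # Create the bucket in the CLI's default region
  aws s3 mb "s3://${BUCKET_NAME}" || return 1
  # S3 has no real folders: a zero-byte object whose key ends in "/"
  # is what the console displays as a folder
  aws s3api put-object --bucket "$BUCKET_NAME" --key "${FOLDER_NAME}/" || return 1
}
```

Call create_bucket_and_folder after editing the two variables. Creating the folder is optional, since DVC writes objects under the given prefix on the first push anyway.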
III. Configure AWS CLI
To configure DVC with an S3 bucket, we first need to configure the AWS CLI (Command Line Interface). This allows DVC to interact with our AWS account and S3 bucket securely, and enables functionality such as pushing and pulling data to and from the bucket. By configuring the AWS CLI, we provide the credentials and permissions DVC needs to access and manage data in our S3 bucket. You can download it from the AWS website.
Step I. Download and install the AWS CLI, then open the terminal and run the aws --version command to verify the installation.
Step II. Run the aws configure command in the terminal. Enter the Access Key ID and Secret Access Key (generated while creating the access key for the IAM user), the region name, and the output format of the AWS CLI.
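After these two steps it can be worth confirming that the stored credentials actually work before moving on to DVC. The sketch below assumes a bucket called bucket_name (a placeholder) and uses a made-up helper name, check_aws_setup; the aws subcommands themselves are standard ones.

```shell
# aws configure prompts for four values, in this order:
#   AWS Access Key ID [None]:
#   AWS Secret Access Key [None]:
#   Default region name [None]:
#   Default output format [None]:

# Placeholder: replace with the bucket created in section II.
BUCKET_NAME="bucket_name"

# Hypothetical helper that checks the CLI install and the credentials.
check_aws_setup() {
  # Confirm the CLI is on PATH
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found; install it first" >&2
    return 1
  fi
  aws --version
  # Show which IAM identity the stored credentials belong to
  aws sts get-caller-identity || return 1
  # List the bucket to confirm the credentials can read it
  aws s3 ls "s3://${BUCKET_NAME}" || return 1
}
```

Run check_aws_setup in the same terminal. If the identity call fails, the access key was most likely entered incorrectly in aws configure.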
IV. Configure DVC Remote
Once everything is set up, let's configure the DVC remote.
First install the dvc-s3 package (the S3 plugin for DVC), then run the dvc remote add -d remote_name s3://bucket_name/dir_name command. This command sets up the S3 bucket as the default remote for DVC.
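Put together, the install and remote setup might look like the sketch below. It assumes pip is the package manager in use and that the commands run inside a Git repository where dvc init has already been executed; remote_name, bucket_name, and dir_name are the same placeholders as in the command above, and setup_dvc_remote is a made-up helper name.

```shell
# Placeholders: replace with your own values.
REMOTE_NAME="remote_name"
BUCKET_NAME="bucket_name"
DIR_NAME="dir_name"

# Hypothetical helper; nothing runs until it is called.
setup_dvc_remote() {
  # Install the S3 plugin for DVC
  pip install dvc-s3 || return 1
  # Register the bucket path as the default (-d) remote
  dvc remote add -d "$REMOTE_NAME" "s3://${BUCKET_NAME}/${DIR_NAME}" || return 1
  # Print the configured remotes to confirm the entry was saved
  dvc remote list
}
```

After calling setup_dvc_remote, the remote is recorded in .dvc/config, which is usually committed to Git so that collaborators get the same remote.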
Push and pull the versioned data using dvc push and dvc pull. That’s it 😎
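As a rough end-to-end illustration, tracking a file and sending it to the remote could look like this. The path data/sample.csv and the helper name track_and_push are invented for the example, and the sketch assumes the repository already has Git and DVC initialized with the S3 remote set as default.

```shell
# Hypothetical data file: replace with a file from your project.
DATA_PATH="data/sample.csv"

# Hypothetical helper; nothing runs until it is called.
track_and_push() {
  # Start tracking the file; DVC writes data/sample.csv.dvc and
  # adds the data file to data/.gitignore
  dvc add "$DATA_PATH" || return 1
  # Version the small pointer file with Git, not the data itself
  git add "${DATA_PATH}.dvc" "$(dirname "$DATA_PATH")/.gitignore" || return 1
  git commit -m "Track ${DATA_PATH} with DVC" || return 1
  # Upload the tracked data to the default remote (the S3 bucket)
  dvc push
}
```

On another machine, after cloning the Git repository and configuring the AWS CLI, running dvc pull should download the same data from the bucket.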
Conclusion
If this blog has sparked your curiosity or ignited new ideas, follow me on Medium, GitHub & connect on LinkedIn, and let’s keep the curiosity alive.
Your questions, feedback, and perspectives are not just welcomed but celebrated. Feel free to reach out with any queries or share your thoughts.
Thank you 🙌 &
Keep pushing boundaries 🚀