How to Upload Large Files to Amazon S3 with AWS CLI

By Eric Ma | Nov 29, 2015 | Aug 30, 2020

Amazon S3 is a widely used public cloud storage system. S3 allows an object/file to be up to 5TB, which is enough for most applications. The AWS Management Console provides a Web-based interface for users to upload and manage files in S3 buckets. However, uploading a large file of hundreds of GB is not easy using the Web interface. In my experience, it fails frequently. There are various third-party commercial tools that claim to help people upload large files to Amazon S3, and Amazon also provides a Multipart Upload API, on which most of these tools are based.

While these tools are helpful, they are not free, and AWS already provides users a pretty good tool for uploading large files to S3: the open source aws s3 CLI tool from Amazon. In my test, the aws s3 command line tool achieved more than 7MB/s upload speed on a shared 100Mbps network, which should be good enough for many situations and network environments. In this post, I will give a tutorial on uploading large files to Amazon S3 with the aws command line tool.

Install aws CLI tool

Assume that you already have a Python environment set up on your computer. You can install the aws tools using the bundled installer:

$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

Try running aws after installation. If you see output like the following, it has been installed successfully.

$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: too few arguments
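The exact wording of this message may differ between versions of the tool, so a check that does not depend on it can be handy. Here is a minimal sketch, assuming a POSIX shell; the helper name have_cmd is made up for illustration:

```shell
#!/bin/sh
# have_cmd NAME
# Hypothetical helper: succeeds if NAME can be found as a command.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
    # Print the installed version of the aws tool.
    aws --version
else
    echo "aws not found in PATH" >&2
fi
```

If aws is not found, check that the directory given to the installer with -b (here /usr/local/bin) is in your PATH.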

Configure `aws` tool access

The quickest way to configure the AWS CLI is to run the aws configure command:

$ aws configure
AWS Access Key ID: foo
AWS Secret Access Key: bar
Default region name [us-west-2]: us-west-2
Default output format [None]: json

Here, your AWS Access Key ID and AWS Secret Access Key can be found in Your Security Credentials on the AWS Console.

Uploading large files

Lastly, the fun part. Here, assume we are uploading the large file ./150GB.data to s3://systut-data-test/store_dir/ (that is, directory store_dir under bucket systut-data-test) and that the bucket and directory have already been created on S3. The command is:

$ aws s3 cp ./150GB.data s3://systut-data-test/store_dir/

After it starts to upload the file, it will print a progress message like

Completed 1 part(s) with ... file(s) remaining

at the beginning, and a progress message like the following as it reaches the end.

Completed 9896 of 9896 part(s) with 1 file(s) remaining
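The number of parts in this message follows from the file size and the part size used for the multipart upload. S3 limits a multipart upload to 10,000 parts, so very large files need larger parts. A rough sketch of the arithmetic, assuming the file is exactly 150 GiB (the helper name parts_for is made up for illustration):

```shell
#!/bin/sh
# parts_for SIZE_BYTES PART_BYTES
# Hypothetical helper: prints how many parts a multipart upload needs,
# using ceiling division (the last part may be smaller than the others).
parts_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Assumed size of ./150GB.data: exactly 150 GiB.
size=$((150 * 1024 * 1024 * 1024))

parts_for "$size" $((8 * 1024 * 1024))     # with 8 MiB parts
parts_for "$size" $((16 * 1024 * 1024))    # with 16 MiB parts
```

With 8 MiB parts, such a file would need 19200 parts, which is over the limit; with 16 MiB parts it needs 9600. If you want to set the part size yourself, the aws tool has a setting for it, for example aws configure set default.s3.multipart_chunksize 16MB.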

After it successfully uploads the file, it will print a message like

upload: ./150GB.data to s3://systut-data-test/store_dir/150GB.data
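For a file this large, you may want to check that the object on S3 has the same size as the local file. Below is a minimal sketch of such a check, assuming the aws tool is configured as above; the function name verify_upload is made up for illustration, and it compares sizes only, not file contents:

```shell
#!/bin/sh
# verify_upload FILE BUCKET KEY
# Hypothetical helper: succeeds if the local file size equals the
# ContentLength that S3 reports for the object.
# Returns 2 if the object cannot be queried.
verify_upload() {
    file=$1
    bucket=$2
    key=$3
    # Local size in bytes (tr strips padding that some wc versions add).
    local_size=$(wc -c < "$file" | tr -d ' ')
    # Object size in bytes as reported by S3.
    remote_size=$(aws s3api head-object --bucket "$bucket" --key "$key" \
        --query ContentLength --output text) || return 2
    [ "$local_size" = "$remote_size" ]
}

# Example usage (not run here):
#   verify_upload ./150GB.data systut-data-test store_dir/150GB.data \
#       && echo "sizes match"
```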

The aws tool has more commands for working with files on S3. I hope this tutorial helps you get started with it. Check the manual for more details.

8 Comments

Eric Ma says:

Dec 16, 2015 at 4:57 pm

To upload a directory recursively, you may use `aws s3 sync`. For example, to upload current directory to my-bucket bucket under dir my-dir:

$ aws s3 sync . s3://my-bucket/my-dir/

Pedro says:

Jun 25, 2016 at 12:58 am

Hey Eric, is there a parameter available for the above command that would allow me to enforce TLS 1.2 encryption in-transit?

1. Eric Z Ma says:
  
  Jun 30, 2016 at 11:16 am
  
  I am not aware of such one. You may need to dig into the source code of aws-cli which is available at https://github.com/aws/aws-cli to investigate or make patch to enforce TLS 1.2.
  
Nhu says:

Aug 12, 2016 at 1:44 pm

how do I sync between an sftp location and s3 bucket directly?

1. Eric Z Ma says:
  
  Aug 19, 2016 at 4:25 pm
  
  You may consider a solution like this:
  
  1. Mount the sftp location by sshfs http://www.systutorials.com/1505/mounting-remote-folder-through-ssh/ to a local directory.
  
  2. Use the tool in this post to upload the file to sync the local directory (mounted the sftp location) with your S3 bucket.
  
sal says:

Nov 28, 2016 at 7:33 pm

What happens when a large file upload fails?? This is not covered.
I’ve been getting segfaults using the straight cp command, and re-running it will start again from the beginning. On large files this can mean days wasted.

1. Andy says:
  
  Mar 16, 2019 at 6:02 am
  
  Stumbled upon this while looking for solutions to upload large files.
  Check this link: https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/
  If your cp process keeps dying, you may want to implicitly break it apart with the lower level s3api command set.
  
Narendra says:

Apr 4, 2020 at 7:44 am

How do i upload a image file from my local folder to s3 bucket via command prompt.

Please help to provide CLI commands.
