
How to Upload Large Files to Amazon S3 with AWS CLI

Amazon S3 is a widely used public cloud storage system. S3 allows an object (file) to be up to 5TB, which is enough for most applications. The AWS Management Console provides a Web-based interface for users to upload and manage files in S3 buckets. However, uploading a large file of hundreds of GB is not easy through the Web interface; in my experience, it fails frequently. Various third-party commercial tools claim to help people upload large files to Amazon S3, and Amazon also provides a Multipart Upload API, on which most of these tools are based.

While these tools are helpful, they are not free, and AWS already provides a pretty good tool for uploading large files to S3: the open source aws s3 CLI tool from Amazon. In my test, the aws s3 command line tool achieved an upload speed of more than 7MB/s on a shared 100Mbps network, which should be good enough for many situations and network environments. In this post, I will give a tutorial on uploading large files to Amazon S3 with the aws command line tool.
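
To put the 7MB/s figure in perspective, here is a rough back-of-the-envelope estimate in shell arithmetic. The 150GB size matches the example file uploaded later in this post, estimate_seconds is a made-up helper name, and the calculation assumes the rate stays constant, which real networks rarely manage.

```shell
# estimate_seconds is a hypothetical helper: size in MB, rate in MB/s.
estimate_seconds() {
    local size_mb=$1 rate_mb_s=$2
    echo $(( size_mb / rate_mb_s ))   # integer division, rounds down
}

size_mb=$(( 150 * 1024 ))             # a 150GB file counted in MB
secs=$(estimate_seconds "$size_mb" 7)
echo "upload time: about $(( secs / 3600 ))h $(( secs % 3600 / 60 ))m"

# 7MB/s is 7 * 8 = 56Mbps; expressed as a percentage of a 100Mbps link:
echo "link usage: $(( 7 * 8 * 100 / 100 ))%"
```

So a file of that size would take roughly six hours at the measured rate, using a little over half of the link's nominal capacity.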

Install aws CLI tool

This assumes that you already have a Python environment set up on your computer. You can install the aws tools using pip or using the bundled installer:

$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
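
The commands above use the bundled installer. For the pip route, a minimal sketch might look like the following; it assumes pip is already on your PATH and installs for the current user only, so no sudo is needed.

```shell
# Install the AWS CLI (the awscli package on PyPI) for the current user.
pip install --user awscli

# Print the installed version to confirm the aws command can be found.
aws --version
```

With --user, pip typically places the aws script under ~/.local/bin on Linux, so that directory may need to be added to PATH before the second command works.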

Try to run aws after installation. If you see output as follows, you should have installed it successfully.

$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: too few arguments

Configure aws tool access

The quickest way to configure the AWS CLI is to run the aws configure command:

$ aws configure
AWS Access Key ID: foo
AWS Secret Access Key: bar
Default region name [us-west-2]: us-west-2
Default output format [None]: json

Here, your AWS Access Key ID and AWS Secret Access Key can be found in Your Security Credentials on the AWS Console.
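
For reference, aws configure saves these answers as plain text in two INI-style files, ~/.aws/credentials and ~/.aws/config. The sketch below writes equivalent files into a scratch directory so that nothing under ~/.aws is touched, then points the CLI at them through the AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE environment variables. The foo and bar values are the placeholders from the session above, not working keys.

```shell
# Scratch directory standing in for ~/.aws (an assumption for this sketch).
conf_dir=$(mktemp -d)

# Same content aws configure would store in ~/.aws/credentials.
cat > "$conf_dir/credentials" <<'EOF'
[default]
aws_access_key_id = foo
aws_secret_access_key = bar
EOF

# Same content aws configure would store in ~/.aws/config.
cat > "$conf_dir/config" <<'EOF'
[default]
region = us-west-2
output = json
EOF

# Tell the CLI to read the scratch files instead of the ones in ~/.aws.
export AWS_SHARED_CREDENTIALS_FILE="$conf_dir/credentials"
export AWS_CONFIG_FILE="$conf_dir/config"
```

Since the secret key sits in that file unencrypted, it is worth keeping the credentials file readable only by your own user.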

Uploading large files

Now comes the fun part. Assume we are uploading the large file ./150GB.data to s3://systut-data-test/store_dir/ (that is, directory store_dir under bucket systut-data-test), and that the bucket and directory have already been created on S3. The command is:

$ aws s3 cp ./150GB.data s3://systut-data-test/store_dir/

After it starts to upload the file, it will print a progress message like

Completed 1 part(s) with ... file(s) remaining

at the beginning, and a progress message like the following as it nears the end:

Completed 9896 of 9896 part(s) with 1 file(s) remaining

After it successfully uploads the file, it will print a message like

upload: ./150GB.data to s3://systut-data-test/store_dir/150GB.data
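
The "part(s)" in the progress messages come from multipart upload: the CLI splits a large file into parts and uploads them separately. S3 allows at most 10,000 parts per multipart upload, so for a very large file the part size has to grow with the file size. The sketch below works out the smallest part size that stays within that limit for a 150GB file; min_part_size is a made-up helper name, and the CLI's own choice of part size may differ.

```shell
# min_part_size is a hypothetical helper: smallest part size in bytes that
# keeps a file of the given size within S3's 10,000-part limit.
min_part_size() {
    local file_bytes=$1
    local max_parts=10000
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (file_bytes + max_parts - 1) / max_parts ))
}

file_bytes=$(( 150 * 1024 * 1024 * 1024 ))   # 150GB in bytes
part_bytes=$(min_part_size "$file_bytes")
echo "file size:     $file_bytes bytes"
echo "min part size: $part_bytes bytes (about $(( part_bytes / 1024 / 1024 ))MB)"
```

The 9896 parts reported in the progress output above sit under that ceiling, which is consistent with a part size a little larger than this minimum. If the part size needs tuning, the CLI's S3 configuration includes a multipart_chunksize setting.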

aws has more commands to operate files on S3. I hope this tutorial helps you start with it. Check the manual for more details.
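
As one example of those other commands, aws s3 ls lists what is stored under a prefix, and aws s3api head-object returns an object's metadata. The sketch below uses head-object to check that the uploaded object is the same size as the local file. The helper name same_size_on_s3 is made up for illustration, and a matching size is only a quick sanity check, not proof that the contents are identical.

```shell
# same_size_on_s3 is a hypothetical helper: succeeds when the local file and
# the S3 object report the same size in bytes.
same_size_on_s3() {
    local file=$1 bucket=$2 key=$3
    local local_size remote_size
    # Byte count of the local file (tr strips padding some wc versions add).
    local_size=$(wc -c < "$file" | tr -d ' ')
    # ContentLength is the object's size in bytes as reported by S3.
    remote_size=$(aws s3api head-object --bucket "$bucket" --key "$key" \
        --query ContentLength --output text) || return 1
    [ "$local_size" = "$remote_size" ]
}
```

For the upload in this post, it could be used like this:

$ aws s3 ls s3://systut-data-test/store_dir/
$ same_size_on_s3 ./150GB.data systut-data-test store_dir/150GB.data && echo "sizes match"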

8 Comments

  1. To upload a directory recursively, you may use `aws s3 sync`. For example, to upload current directory to my-bucket bucket under dir my-dir:

    $ aws s3 sync . s3://my-bucket/my-dir/

  2. Hey Eric, is there a parameter available for the above command that would allow me to enforce TLS 1.2 encryption in-transit?

  3. What happens when a large file upload fails?? This is not covered.
    I’ve been getting segfaults using the straight cp command, and re-running it will start again from the beginning. On large files this can mean days wasted.

  4. How do I upload an image file from my local folder to an S3 bucket via the command prompt?

    Please help to provide CLI commands.
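
On the failed-upload question raised in comment 3: as noted there, re-running aws s3 cp starts the transfer from the beginning. A minimal sketch of automating those re-runs is shown below. It does not resume a partial upload; it only repeats the command a limited number of times. The helper name upload_with_retry and the RETRY_DELAY variable are made up for illustration.

```shell
# upload_with_retry is a hypothetical helper: re-run aws s3 cp until it
# succeeds or max_tries attempts have been used. Every attempt starts the
# transfer over; nothing is resumed.
upload_with_retry() {
    local src=$1 dest=$2 max_tries=${3:-3}
    local attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if aws s3 cp "$src" "$dest"; then
            echo "upload succeeded on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt failed" >&2
        # Pause before the next try, but not after the final one.
        if [ "$attempt" -lt "$max_tries" ]; then
            sleep "${RETRY_DELAY:-30}"
        fi
        attempt=$(( attempt + 1 ))
    done
    return 1
}
```

For example, to allow up to five attempts for the file from this post:

$ upload_with_retry ./150GB.data s3://systut-data-test/store_dir/ 5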
