AWSUGBE #2: AWS Use cases and S3 best practices/upload performance
Published on November 15, 2013 by Frederik Denkens
Last week we hosted the second AWS User Group Belgium at our offices. Due to the terrible traffic around Antwerp we started a bit later than planned, but that didn’t make it any less interesting.
The first presentation was by Alex Sinner, AWS Solutions Architect. He explained the main use cases for choosing AWS and how to design a simple, fault-tolerant application. The main takeaways were:
- Design for failure (“everything fails all the time”)
- Use multiple Availability Zones (Multi AZ RDS, spread load, etc)
- Build for scale (monitoring, auto-scaling, etc)
- Decouple components (better scaling, better fault tolerance and how SQS can help)
An interesting discussion followed about how much capacity the remaining availability zones have when one zone goes down. In that context it is always better, if you have the budget, to spread your normal load and replication over as many zones as possible. The additional cost is marginal, but in case of problems your chances of being able to migrate your complete workload to the other zones are significantly better.
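To put a number on that, here is a minimal sketch of the arithmetic. It assumes the load is spread evenly over equally sized zones and that exactly one zone fails; the function name and the 60% figure are mine, purely for illustration.

```python
def utilisation_after_zone_failure(zones: int, normal_utilisation: float) -> float:
    """Utilisation of each surviving zone after one of `zones` equal zones fails.

    Assumes the load was spread evenly and is redistributed evenly
    over the remaining zones (an illustrative simplification).
    """
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone failure")
    return normal_utilisation * zones / (zones - 1)


# Example: every zone normally runs at 60% of its provisioned capacity.
for zones in (2, 3, 4):
    after = utilisation_after_zone_failure(zones, 0.6)
    print(f"{zones} zones: {after:.0%} per surviving zone")
# 2 zones: 120% per surviving zone
# 3 zones: 90% per surviving zone
# 4 zones: 80% per surviving zone
```

With two zones the survivor is overloaded, while with three or four the remaining zones can absorb the extra load without any additional capacity.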
For me the most interesting slide was this:
It illustrates that any cloud platform evolves. You start with a basic platform (constrained by time and resources) and iterate to add features (scalability, fault tolerance, etc.) to work towards a perfect environment. Just like your code.
After a short break, I presented “S3 Intro, tips and filling it up with data quickly” (or presentation with speaker notes). The first half was a general introduction to S3 and how to use it. The second half focused on how to get your data onto S3 as quickly as possible using standard tools.
After some theory on best practices, we progressed to do some tests and formulate conclusions. The tests started at around 18 megabytes per second of data transferred from an EC2 ramdisk to S3. However, through some simple optimisations we got up to 248 megabytes per second using just standard command line tools.
The two main contributors to this dramatic performance increase were:
- instance type and related IO performance class
- the use of multiple upload threads.
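The tests in the presentation used standard command line tools, but the idea behind multiple upload threads can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `upload_part` is a hypothetical callable standing in for whatever S3 client call you use to send one chunk, and the part size and thread count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_into_parts(data: bytes, part_size: int) -> List[Tuple[int, bytes]]:
    """Cut `data` into (part_number, chunk) pairs, numbered from 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (offset // part_size + 1, data[offset:offset + part_size])
        for offset in range(0, len(data), part_size)
    ]


def upload_in_parallel(
    data: bytes,
    part_size: int,
    upload_part: Callable[[int, bytes], int],  # hypothetical: sends one chunk
    threads: int = 8,
) -> List[int]:
    """Send all parts through a thread pool; results come back in part order."""
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda part: upload_part(part[0], part[1]), parts))


# Stand-in uploader that only reports how many bytes it was handed.
def fake_upload(part_number: int, chunk: bytes) -> int:
    return len(chunk)


print(upload_in_parallel(b"x" * 25, 10, fake_upload, threads=4))
# [10, 10, 5]
```

Threads suit this job because each upload spends most of its time waiting on the network, so several requests can be in flight at once.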
Theoretically a Very High I/O instance should go up to 10 Gbit/s, or about 1.1 gigabytes per second. Some people on the internet claim to have reached such speeds. Alex shed some light on how we might be able to reach that goal by taking into consideration how S3 indexing and partitioning works. Unfortunately I haven’t had the time to test that out yet (these two links might help a bit). Any takers? 🙂
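One way to act on the indexing and partitioning hint is to avoid long runs of keys that share the same leading characters, for example by putting a short hash in front of each key. The sketch below assumes that is the kind of technique meant; the function name, the 4-character prefix and the example key are my own choices for illustration, and I have not measured the effect.

```python
import hashlib


def spread_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short, deterministic hash so key names do not sort sequentially."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


# Sequential names such as dated log files get scattered across prefixes.
for n in range(3):
    print(spread_key(f"logs/2013-11-15/part-{n:04d}.gz"))
```

Because the prefix is derived from the key itself, you can always recompute it when you need to fetch an object by its original name.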
This was only the second AWS user group, but I feel it has everything going for it to become an interesting forum in Belgium where there is much to learn about AWS. Want to be part of the next one? Then sign up on the Meetup group and we’ll notify you once we have a date. It will probably be sometime in January in Brussels.
As always, if you want some help on figuring this stuff out, don’t forget to call the Skyscrapers!