Surviving the Storm
By Jeff Fedor, Terry Goertz on March 12, 2008
Operations and Architecture on Amazon EC2
Big press release coming out? Expecting your internet traffic to double? Are you ready? If you have deployed on Amazon EC2, you are.
Until AWS came along, you’d be frantically procuring hardware and additional rack space, recruiting on-call staff, and hoping that your air conditioner doesn’t crap out. All this to survive your peak load, and you’d better hope your estimates are right!
The press release comes and goes and you survived the peak—now what do you do with the excess hardware?
In Scaling with Clouds, we covered the basics of Amazon Web Services (AWS). In this article we cover operational and architectural considerations.
With AWS, your computing capacity is elastic: you simply rent what you need by the hour. As your press release hits the wire, bring up additional computing capacity to meet the demand. When demand drops, reduce capacity and your costs drop with it. Depending on your architecture, you have two options for scaling: up or out.
To scale up, move to a bigger instance. EC2 instances come in three flavours (small, large and extra-large), each roughly four times larger than the previous one. To scale out, bring up more instances. The architecture that supports this best is a Service Oriented Architecture (SOA). An SOA lets you scale each service to meet its own demand, maximizing utilization while minimizing costs. Other architectures will work on EC2, but they will be less effective.
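To make the scale-out decision concrete, here is a minimal sketch of the arithmetic. The throughput figures and the function name `instances_needed` are our own assumptions for illustration; measure your own per-instance capacity before relying on numbers like these.

```python
def instances_needed(peak_rps, rps_per_instance, headroom_pct=20):
    """Return how many instances are needed to serve peak_rps.

    peak_rps         -- expected peak requests per second (an estimate)
    rps_per_instance -- measured capacity of one instance (assumed known)
    headroom_pct     -- extra capacity, in percent, kept in reserve
    """
    if peak_rps < 0 or rps_per_instance <= 0 or headroom_pct < 0:
        raise ValueError("inputs must be non-negative and capacity positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    required = peak_rps * (100 + headroom_pct)
    capacity = 100 * rps_per_instance
    # Ceiling division: a partly used instance still has to be rented.
    return -(-required // capacity)


# Hypothetical numbers: normal peak of 1000 req/s, doubling after the release.
normal = instances_needed(1000, 150)
press_day = instances_needed(2000, 150)
print(normal, press_day)
```

The difference between the two results is the number of extra instances to rent for the duration of the spike, and to release again once traffic returns to normal.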
With EC2 and an SOA, service interruption is minimized. If a service fails, the effects are limited to specific functionality. There is an additional benefit when it comes to shipping your next release. Promoting your release is similar to scaling: bring up new instances running your latest features, then retire the old ones. To your customers, the new features appear without service interruption, which helps you work toward 99.999% uptime.
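The release process described above can be sketched as a small function. This is an illustration under our own assumptions, not an AWS API: `launch`, `healthy` and `retire` are hypothetical callables you would implement with whatever tooling you use to start, check and terminate instances.

```python
def replace_release(active, launch, healthy, retire, count):
    """Bring up `count` instances of the new release, then retire the old ones.

    active  -- identifiers of the instances currently serving traffic
    launch  -- hypothetical callable that starts one new instance, returns its id
    healthy -- hypothetical callable that reports whether an instance is ready
    retire  -- hypothetical callable that shuts an instance down

    Returns the list of instances that should serve traffic afterwards.
    """
    fresh = [launch() for _ in range(count)]

    # If any new instance is unhealthy, back out and keep the old release.
    if not all(healthy(instance) for instance in fresh):
        for instance in fresh:
            retire(instance)
        return list(active)

    # The new release is ready, so the old instances can go.
    for instance in active:
        retire(instance)
    return fresh
```

Because the old instances keep serving until every new one passes its health check, a bad build costs you a few instance-hours rather than an outage.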
So what is the catch? Most notably, EC2 has no SLA; however, in our experience downtime has not been a concern. Service levels have been high because Amazon is serious about delivering utility computing.
Elastic scaling, rip-and-replace releases, and the lack of an SLA require a shift in mindset when developing on AWS. You need to plan for failure, design services to function as independently as possible, and consider how your data is persisted.
Worried about mission-critical data? Don’t be. Amazon’s Simple Storage Service (S3) guarantees persistence, and that guarantee is backed by an SLA. Your data will be stored with the same care and concern as all of Amazon’s most valuable data. In our next article we will discuss S3 and other Amazon storage options.


Comments
mar 12 2008 11:47
2 Reputation Points
I agree, but it is still a good idea to have a graceful degradation plan available on a separate network if availability is critical. This is made easier by abstracting the core services into libraries/classes that handle these failures in a manner amenable to your application (e.g. using berkdb for queue puts during an SQS outage, to be synced when the service resumes).
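A minimal sketch of that fallback idea follows. It swaps in Python's built-in sqlite3 module for berkdb as the local store, and `send` is a hypothetical callable standing in for whatever puts a message on the remote queue; neither choice comes from the comment itself.

```python
import sqlite3


class SpoolingQueue:
    """Send to a remote queue, spooling messages locally while it is down."""

    def __init__(self, send, spool_path=":memory:"):
        # `send` is a hypothetical callable that raises when the queue is down.
        self._send = send
        # Pass a file path instead of ":memory:" to survive a process restart.
        self._db = sqlite3.connect(spool_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS spool "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, body):
        """Try the remote queue first; on failure keep the message locally."""
        try:
            self._send(body)
            return True
        except Exception:
            self._db.execute("INSERT INTO spool (body) VALUES (?)", (body,))
            self._db.commit()
            return False

    def pending(self):
        """Number of messages waiting in the local spool."""
        return self._db.execute("SELECT COUNT(*) FROM spool").fetchone()[0]

    def sync(self):
        """Replay spooled messages oldest first; stop at the first failure."""
        sent = 0
        rows = self._db.execute("SELECT id, body FROM spool ORDER BY id").fetchall()
        for row_id, body in rows:
            try:
                self._send(body)
            except Exception:
                break
            self._db.execute("DELETE FROM spool WHERE id = ?", (row_id,))
            self._db.commit()
            sent += 1
        return sent
```

Call `sync()` from a periodic job once the service resumes. Note that this sketch does not preserve ordering between spooled messages and puts that succeed before the spool is drained; if order matters, route every put through the spool while it is non-empty.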
mar 12 2008 20:53
2 Reputation Points
Having a graceful degradation plan is always important. The likelihood of all your Amazon instances going down simultaneously is very slim at worst. Keep in mind, Amazon’s services are the same services used to keep the multi-billion-dollar organization afloat.
If you have taken an SOA approach and a design-for-failure mindset, delivering high availability comes naturally because you anticipate that failure will happen and have isolated points of failure. With EC2 you have an easy disaster recovery mechanism. If a disaster does happen, new instances can be brought online quickly to reduce downtime. You can also set up automated recovery procedures as part of your monitoring and disaster recovery systems. For example, Nagios allows you to take specific actions in the event of a catastrophe. One action could be to start new instances automagically.
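As a rough sketch of the Nagios idea, the script below could be wired up as a service event handler. It assumes the Nagios command definition passes the $SERVICESTATE$ and $SERVICESTATETYPE$ macros as the first two arguments; `launch_replacement` is a hypothetical placeholder for whatever you use to start an instance, not a real API call.

```python
import sys


def should_launch(state, state_type):
    """Act only on a confirmed (HARD) CRITICAL state, not on a first failed check."""
    return state == "CRITICAL" and state_type == "HARD"


def handle_event(state, state_type, launch_instance):
    """Run launch_instance() when the service state warrants it.

    Returns True if a launch was triggered, False otherwise.
    """
    if should_launch(state, state_type):
        launch_instance()
        return True
    return False


def launch_replacement():
    # Hypothetical placeholder: call your own instance-start tooling here.
    print("starting a replacement instance")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Expected invocation (assumed): handler.py $SERVICESTATE$ $SERVICESTATETYPE$
    handle_event(sys.argv[1], sys.argv[2], launch_replacement)
```

Waiting for a HARD state means Nagios has already retried the check, so a single dropped packet does not cost you an extra instance-hour.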