Over the last few months I have spent a fair amount of time getting up to speed with cloud computing, and how to apply it in practice in a cost-effective yet scalable way.
Auto-scaling was one of the concepts that occurred to me early on, as it probably has to many others.

However, in thinking about these issues, I realized that unbounded auto-scaling of computing resources in an environment such as EC2 is very susceptible to undesirable behaviour. Imagine that a less scrupulous competitor (or just hackers) decides to launch a DoS attack on your infrastructure and has acquired a big network of zombie computers to do it. If your system automatically scales upwards, the attack feeds on itself: every extra wave of requests triggers more capacity, and you pay for all of it. It would scale up and up to a point where, instead of crashing your infrastructure, the DoS attack would literally bankrupt your business.

Given the choice between going bankrupt and having a service outage of a couple of hours, the latter is surely more desirable. This means that if you are going to do auto-scaling, you need at least one of two things, and most likely both:

  • An upper limit for how far you will allow your infrastructure to scale.
  • An early warning system that notifies you of sudden, extreme and prolonged spikes in traffic (and shows where the traffic originates).
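To make the two items a bit more concrete, here is a minimal sketch in Python. It is not tied to any EC2 API; `MAX_INSTANCES`, `capped_capacity` and `SpikeDetector` are names I made up for illustration, and the baseline, the factor of 5 and the three-sample window are arbitrary assumptions you would tune for your own traffic.

```python
from collections import deque

MAX_INSTANCES = 20  # hypothetical hard ceiling, chosen for illustration


def capped_capacity(desired: int, max_instances: int = MAX_INSTANCES) -> int:
    """Clamp the requested instance count to the range 1..max_instances."""
    return max(1, min(desired, max_instances))


class SpikeDetector:
    """Flags traffic that stays far above a baseline for several samples in a row."""

    def __init__(self, baseline_rps: float, factor: float = 5.0,
                 sustained_samples: int = 3):
        # Anything above baseline * factor counts as "extreme" (assumed rule).
        self.threshold = baseline_rps * factor
        # Remember only the most recent samples, so short blips do not alert.
        self.recent = deque(maxlen=sustained_samples)

    def observe(self, requests_per_second: float) -> bool:
        """Record one sample; return True once the spike has been sustained."""
        self.recent.append(requests_per_second > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

A monitoring loop would call `observe` once per sampling interval and send a notification the first time it returns `True`, while every scaling decision passes through `capped_capacity` before any instances are started.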

I’m leaning towards thinking the best solution would be a combination: an early warning system that lets you take action, for instance by allowing the infrastructure to scale beyond a preset limit. If you do not take action within a set timeframe, it pushes the proverbial kill switch to stop your infrastructure from crashing or scaling up too high.
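The combined idea could be sketched roughly like this. Everything here is an assumption on my part rather than a finished design: `ScalingGuard` and its fields are hypothetical names, the kill switch is interpreted as "freeze capacity at the preset limit", and the current time is passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingGuard:
    soft_limit: int           # preset ceiling while nobody has approved more
    hard_limit: int           # absolute ceiling, even after approval
    response_window_s: float  # how long the operator has to react
    alert_started_at: Optional[float] = None
    approved: bool = False
    killed: bool = False

    def approve(self) -> None:
        """Operator confirms the traffic is legitimate (ignored after a kill)."""
        if not self.killed:
            self.approved = True

    def allowed_capacity(self, desired: int, now: float) -> int:
        """Return how many instances the auto-scaler may run right now."""
        if desired <= self.soft_limit:
            # Demand is back under the preset limit: clear any pending alert.
            if not self.killed:
                self.alert_started_at = None
            return desired
        if self.approved:
            # Operator took action in time: scale past the preset limit.
            return min(desired, self.hard_limit)
        if self.alert_started_at is None:
            # First time over the limit: this is where a warning would be sent.
            self.alert_started_at = now
        elif now - self.alert_started_at >= self.response_window_s:
            # Nobody reacted within the window: trip the kill switch.
            self.killed = True
        return self.soft_limit
```

With `ScalingGuard(soft_limit=10, hard_limit=40, response_window_s=900)`, a request for 30 instances is held at 10 and starts the clock; calling `approve()` within 15 minutes lets it grow up to 40, otherwise the guard stays pinned at 10 until someone resets it by hand.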