Saving money on AWS Part 1: Choose the right filling for your compute pie.

When it comes to cutting costs on AWS, the biggest chunk to aim for is compute cost. In our experience, compute services (think anything EC2-backed) make up almost 70% of an organization’s AWS spend, which makes compute the obvious place to start when looking to reduce costs. The pie chart below illustrates the point: you normally choose your pie based on your desired filling (think server specifications), but it’s the filling that costs you the most.

 

Breaking down the data even further reveals that most compute costs sit in EC2 (almost 50% of overall costs), with RDS and Redshift adding a further 17%. That is not surprising, considering they both run on EC2 anyway. Making sure these resources are correctly sized, a process called rightsizing, is therefore going to yield significant results (unless you’ve done this already).

Rightsizing

If you’re looking for something hearty, you want big chunks of meat. Small bits of veg are more for the health conscious.  Well, not really, but I’m battling to extend the pie analogy, so let me get straight to the point:  Rightsize your instances before you even think about making reservations.

I’ve watched companies go straight for savings via reservations and then find they’re stuck with mismatches further down the line, or unable to rightsize once they spot the opportunity because doing so would cost them their RI (Reserved Instance) savings*. Worse still, they sometimes never find out, because savings from these commitments are not easily tracked, leaving the cycle to repeat itself.

So how do you rightsize? Carefully. For one, more data generally means more accuracy. For example, run the data collection period for long enough to cover any seasonality, or make sure you collect metrics under the maximum load you have designed the system to handle. Also remember that the default metrics reported by CloudWatch are measured at the hypervisor layer and exclude memory and some other stats, so it’s preferable to install an agent in the instance OS and get the stats from the application’s point of view. The CloudHealth agent we use is free and of course integrates directly into the analysis we do using the CloudHealth platform, but you can use metrics from almost any agent, such as Datadog, New Relic or Zabbix, and many management tools include some level of server monitoring too.
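To make the data-collection point concrete, here is a minimal Python sketch of how agent-reported samples might be summarized before any sizing decision. Everything in it is an assumption for illustration: the Sample record, the 30-day minimum window and the choice of the 95th percentile are placeholders of mine, not something CloudHealth or CloudWatch prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One agent-reported reading (hypothetical record layout)."""
    day: int          # day index within the collection period
    cpu_pct: float    # CPU utilization as seen inside the OS
    mem_pct: float    # memory utilization as seen inside the OS


def nearest_rank(values, pct):
    """Nearest-rank percentile; pct is a whole number between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def summarize(samples, min_days=30):
    """Summarize a collection period and flag whether it is long enough.

    min_days is an assumed threshold; pick one that covers your own
    seasonality (month-end runs, sales peaks and so on).
    """
    cpu = [s.cpu_pct for s in samples]
    mem = [s.mem_pct for s in samples]
    return {
        "enough_data": len({s.day for s in samples}) >= min_days,
        "cpu_p95": nearest_rank(cpu, 95),
        "cpu_max": max(cpu),
        "mem_p95": nearest_rank(mem, 95),
        "mem_max": max(mem),
    }
```

Looking at both the 95th percentile and the maximum is a design choice: the percentile smooths out one-off spikes, while the maximum shows what the instance had to cope with under peak load.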

If you’re feeling like an expert pieman (or woman), you can also use CloudHealth to dive deeper and specify your own rightsizing policy to include some of your own best practices. For example, you may want to provide more headroom for CPU utilization, or ensure any instance with a high-speed network interface is never replaced by an instance type without the same interface.
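As a rough illustration of what such a policy boils down to, the Python sketch below checks whether a smaller candidate instance type still fits a workload. It is not how CloudHealth expresses policies; the InstanceType and Policy classes, the 20% headroom defaults and the instance specifications in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical instance type description (not real AWS specs)."""
    name: str
    vcpus: int
    memory_gib: float
    high_speed_network: bool


@dataclass(frozen=True)
class Policy:
    """Assumed policy knobs: headroom to keep free on the new instance."""
    cpu_headroom_pct: float = 20.0
    mem_headroom_pct: float = 20.0
    keep_high_speed_network: bool = True


def fits(current, candidate, peak_cpu_pct, peak_mem_pct, policy):
    """Return True if candidate can carry the observed peak load."""
    # Never trade away a high-speed network interface if the policy says so.
    if (policy.keep_high_speed_network
            and current.high_speed_network
            and not candidate.high_speed_network):
        return False
    # Translate observed peak percentages into absolute resource needs.
    needed_vcpus = current.vcpus * peak_cpu_pct / 100
    needed_mem = current.memory_gib * peak_mem_pct / 100
    # Usable capacity is what remains after reserving the headroom.
    usable_vcpus = candidate.vcpus * (1 - policy.cpu_headroom_pct / 100)
    usable_mem = candidate.memory_gib * (1 - policy.mem_headroom_pct / 100)
    return needed_vcpus <= usable_vcpus and needed_mem <= usable_mem


current = InstanceType("current", 8, 32.0, True)
smaller = InstanceType("smaller", 4, 16.0, True)
print(fits(current, smaller, peak_cpu_pct=30, peak_mem_pct=35, policy=Policy()))
```

Raising cpu_headroom_pct makes the policy more conservative: the same candidate that fits with 20% headroom may be rejected with 40%.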

 

You can also consider the efficiency of instances to prioritize which of them to rightsize first. If an instance is moderately underutilized and very expensive, it could be considered more inefficient than an instance that is severely underutilized but cheap as…umm…chips (no gravy). You’ll want to ensure you’re dealing with the least cost-efficient cases first, just in case you have to scoff your pie in a hurry.
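One simple way to put a number on this, sketched below in Python, is to rank instances by the spend that is going unused each month. The formula (cost multiplied by the unused share of capacity) is my own simplification for illustration and not necessarily the efficiency measure CloudHealth uses; the instance names and prices are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical cost and utilization record for one instance."""
    name: str
    monthly_cost: float     # in whatever currency you are billed in
    utilization_pct: float  # representative utilization over the period


def wasted_spend(inst):
    """Assumed inefficiency measure: cost of the unused capacity."""
    return inst.monthly_cost * (100 - inst.utilization_pct) / 100


def prioritize(instances):
    """Order instances so the largest unused spend comes first."""
    return sorted(instances, key=wasted_spend, reverse=True)


fleet = [
    Instance("cheap-but-idle", monthly_cost=50, utilization_pct=10),
    Instance("pricey-half-used", monthly_cost=2000, utilization_pct=60),
]
for inst in prioritize(fleet):
    print(f"{inst.name}: about {wasted_spend(inst):.0f} per month unused")
```

With these made-up figures the expensive, moderately underutilized instance ranks first (about 800 of unused spend per month against about 45), even though the cheap one is far more idle in percentage terms.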

And lastly, always test changes before settling on new instance sizes or going ahead with an RI analysis.  This is the cloud, you can do it quite easily, so do it.

Next week we’ll take a good look at AWS reserved instances, what criteria you must consider in the process, and how to get out of jail if you make bad decisions.

Never stop optimizing!

Russell Warne is Chief Customer Officer for Kaskade.cloud. He is a Certified AWS Solutions Architect – Associate and a Certified CloudHealth Platform Administrator Associate for Cost Optimisation.

*AWS continue to make it easier for clients to benefit from RI purchases, including the automatic application of size flexibility i.e. the RI will be applied in a proportionate manner to instance sizes in the same family (previously you had to manually adjust RIs to get this benefit).  However, this does not work across all variables such as instance families, tenancy, OS etc., so it pays to make the best choices up-front.
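For the curious, the proportionate application works through normalization factors per instance size. The Python sketch below uses the factors as I understand them from the AWS documentation (small = 1, doubling with each size step); treat the table as an assumption and check it against the current AWS docs before relying on it. The function name is mine.

```python
# Normalization factors per instance size, as I recall them from the AWS
# documentation; verify against the current docs before relying on them.
NORMALIZATION_FACTOR = {
    "nano": 0.25,
    "micro": 0.5,
    "small": 1,
    "medium": 2,
    "large": 4,
    "xlarge": 8,
    "2xlarge": 16,
    "4xlarge": 32,
}


def instances_covered(ri_size, ri_count, running_size):
    """How many running instances of running_size the RIs cover.

    Only meaningful within one instance family; as the footnote says,
    flexibility does not stretch across families, tenancy or OS.
    """
    ri_units = NORMALIZATION_FACTOR[ri_size] * ri_count
    return ri_units / NORMALIZATION_FACTOR[running_size]


# One xlarge RI spread over large instances of the same family.
print(instances_covered("xlarge", 1, "large"))
# One large RI applied to a 2xlarge covers only part of it.
print(instances_covered("large", 1, "2xlarge"))
```

A result below 1 means the RI discount covers only that fraction of the running instance, with the remainder billed at the on-demand rate.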
