So this story begins one fine afternoon when I was just minding my own business, roaming around in a project’s production AWS console.
I was testing out some new ideas to change the system over to a more serverless architecture, the main goal being to lower some costs, but also just because serverless is all the rage, right?
One of our biggest costs is EC2, which accounts for nearly 70% of the total bill:
However, when I had a look at these EC2 costs in more detail, there was an extra EC2-Other service which accounted for a hefty amount of the total EC2 costs:
EC2-Other - sorry, what?
After some research into this EC2-Other service, I found that AWS made it sound like an expected cost, just part and parcel of using EC2:
The EC2-Other category includes multiple service-related usage types, tracking costs associated [with] Amazon EBS volumes and snapshots, elastic IP addresses, NAT gateways, data transfer, and more.
So, for a moment I stopped looking into these costs and just assumed they were normal.
EC2-Other - sorry, no.
A few days had passed, and I was still roaming around in that same console. This time I was looking specifically at a monthly bill breakdown. This was when I noticed something odd:
Did you spot it? 52,664 GB (~53 TB) of data being processed through our NAT gateway - that is suspicious. We would expect to see some data being transferred through the NAT gateway but that amount is absurd.
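To put a rough number on that, here is a minimal sketch of the arithmetic. The per-GB rate is an assumption (it is not stated on the bill line above and it varies by Region), so treat the result as a ballpark only. With the assumed $0.045 per GB, 52,664 GB works out to about $2,370 for the month.

```kotlin
// ASSUMPTION: NAT gateway data processing rate in USD per GB.
// This value is not from the bill; rates differ by Region, so check current pricing.
const val ASSUMED_NAT_RATE_PER_GB = 0.045

/** Estimated NAT gateway data processing charge (USD) for a given volume in GB. */
fun natProcessingCost(
    gbProcessed: Double,
    ratePerGb: Double = ASSUMED_NAT_RATE_PER_GB
): Double = gbProcessed * ratePerGb

fun main() {
    val monthlyGb = 52_664.0 // the figure from the monthly bill breakdown
    val cost = natProcessingCost(monthlyGb)
    println("Estimated NAT data processing cost: USD %.2f".format(cost))
}
```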
My investigations began and, well, it did not take long to find an answer.
I determined that every time a container in EC2 pulled an image from ECR (Elastic Container Registry), it was being transferred through our NAT gateway. Every file transferred from S3 to a container or vice versa, which in our system is a lot of GBs, was going through the gateway. Even our logs being sent from any container to CloudWatch went through the gateway…
First off, you would have thought that AWS would have managed all these transfers because they start and end within AWS - but of course ECR, S3 and CloudWatch sit outside your VPC (Virtual Private Cloud) - so by default the only way for your containers to reach them is out through your NAT gateway.
EC2-Other - sorry, bye
You can use AWS PrivateLink to connect the resources in your VPC to services using private IP addresses, as if those services were hosted directly in your VPC.
If you use your console for changes to infrastructure then follow this tutorial on how to set these endpoints up for S3, ECR and logs: Create Private Links via Console or also this one provided by AWS: New VPC Endpoint for S3.
However, if you use CDK to deploy your infrastructure, like a hero, then here is how to set up your 3 new VPC endpoints. I am using Kotlin but if you use TypeScript or another language it shouldn’t be too hard to adjust.
CDK (In Kotlin)
Here are your imports for this setup:
import software.amazon.awscdk.services.ec2.GatewayVpcEndpoint
import software.amazon.awscdk.services.ec2.GatewayVpcEndpointProps
import software.amazon.awscdk.services.ec2.GatewayVpcEndpointAwsService
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpoint
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpointProps
import software.amazon.awscdk.services.ec2.InterfaceVpcEndpointService
You’ll need to have your VPC available in this stack.
Firstly here is your Gateway Endpoint for S3:
private val s3VpcEndpoint = GatewayVpcEndpoint(
    scope,
    "your-s3-endpoint",
    GatewayVpcEndpointProps.builder()
        .vpc(vpc)
        .service(GatewayVpcEndpointAwsService.S3)
        .build()
)
Gateway endpoints are only currently available for S3 and DynamoDB - otherwise you must use an Interface Endpoint. Gateway endpoints for S3 are offered at no cost and the routes are managed through route tables.
Interface endpoints are priced at around $0.01 per AZ per hour, and data transferred through an interface endpoint is charged at around $0.01 per GB. Both rates depend on the Region, so check current pricing.
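As a rough illustration of how those two interface endpoint charges add up, here is a small sketch using the $0.01 rates quoted above. The endpoint count, AZ count, data volume and the 730-hour month in the example are hypothetical inputs, not figures from our bill.

```kotlin
// Rates as quoted in the text; both vary by Region, so check current pricing.
const val ENDPOINT_RATE_PER_AZ_HOUR = 0.01
const val ENDPOINT_RATE_PER_GB = 0.01

/**
 * Estimated monthly cost (USD) of interface endpoints:
 * an hourly charge per endpoint per AZ, plus a per-GB data charge.
 */
fun interfaceEndpointMonthlyCost(
    endpoints: Int,
    availabilityZones: Int,
    gbProcessed: Double,
    hoursInMonth: Int = 730 // ASSUMPTION: average hours in a month
): Double {
    val hourlyCharge = endpoints * availabilityZones * hoursInMonth * ENDPOINT_RATE_PER_AZ_HOUR
    val dataCharge = gbProcessed * ENDPOINT_RATE_PER_GB
    return hourlyCharge + dataCharge
}

fun main() {
    // Hypothetical example: 2 interface endpoints in 2 AZs, 1,000 GB per month.
    val cost = interfaceEndpointMonthlyCost(endpoints = 2, availabilityZones = 2, gbProcessed = 1_000.0)
    println("Estimated interface endpoint cost: USD %.2f".format(cost))
}
```

The S3 gateway endpoint is left out of the calculation on purpose, since gateway endpoints for S3 are offered at no cost.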
For the Interface Endpoint for ECR you need to have your security groups available.
Here I have used my own security groups to give you an idea of what you might need. It is important to get these right, or else your containers may not be able to pull from ECR:
private val ecrVpcEndpoint = InterfaceVpcEndpoint(
    scope,
    "your-ecr-endpoint",
    InterfaceVpcEndpointProps.builder()
        .service(InterfaceVpcEndpointService("com.amazonaws.$yourRegion.ecr.dkr"))
        .vpc(vpc)
        .privateDnsEnabled(true)
        .securityGroups(
            listOf(loadBalancerSecurityGroup, ecsHostSecurityGroup, efsSecurityGroup)
        )
        .build()
)
Then for the Interface Endpoint for Logs:
private val logsVpcEndpoint = InterfaceVpcEndpoint(
    scope,
    "your-logs-endpoint",
    InterfaceVpcEndpointProps.builder()
        .service(InterfaceVpcEndpointService("com.amazonaws.$yourRegion.logs"))
        .vpc(vpc)
        .privateDnsEnabled(true)
        .securityGroups(
            listOf(loadBalancerSecurityGroup, ecsHostSecurityGroup, efsSecurityGroup)
        )
        .build()
)
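The two interface endpoints only differ in the suffix of the service name, so if you prefer you could build that string in one place. This is just a sketch: `endpointServiceName` is a hypothetical helper of my own, not part of CDK, and the Region in the example is made up.

```kotlin
/**
 * Builds a VPC endpoint service name in the "com.amazonaws.<region>.<service>"
 * form used in the snippets above. Hypothetical helper, not a CDK API.
 */
fun endpointServiceName(region: String, service: String): String {
    require(region.isNotBlank()) { "region must not be blank" }
    require(service.isNotBlank()) { "service must not be blank" }
    return "com.amazonaws.$region.$service"
}

fun main() {
    val yourRegion = "eu-west-1" // hypothetical Region, use your own
    println(endpointServiceName(yourRegion, "ecr.dkr"))
    println(endpointServiceName(yourRegion, "logs"))
}
```

You would then pass the result to `InterfaceVpcEndpointService(...)` exactly as in the snippets above.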
Sadly with CDK you cannot add a name tag to these endpoints (see issue here) so when you deploy you will see something that looks like this:
So I pushed these changes to our production environment halfway through the day on 23rd June, which I think you can see quite clearly here:
So after leaving your system to run for a little while with these endpoints in place, you should start to see the EC2-Other costs drop significantly. In our case, costs dropped from over $2000 per month to just over $200!
There is an increase of course in the costs for your VPC for these endpoints, but I am not too worried about the $1 increase there:
Thanks for coming on this journey with me, and I hope this can help you save some precious dolla bills.