Fearless deployments with CodeDeploy
24 July 2024
CodeDeploy is a fully managed AWS code deployment service for AWS Lambda, EC2, ECS and on-premises servers. In this post, I will show how we used it to reduce overall deployment time by ~12x and application downtime by ~36x on a recent EC2 project, all whilst enabling automatic rollbacks for failed deployments.
Firstly, some context. Prior to using CodeDeploy, our application was deployed as a JAR file to EC2 using a GitHub Action. Here's the GitHub Action's action.yml file:
name: "Deploy to environment"
description: "Installs dependencies, tears down stack and deploys project"
runs:
using: "composite"
steps:
- name: Setup Node
uses: actions/setup-node@v3
with:
node-version: 18.13.0
- name: Install AWS CDK
run: npm install -g aws-cdk
shell: sh
- name: Tear down stack and deploy
run: |
cdk destroy --force --execute --ci --require-approval=never $STACK -c BUILD_NUMBER=2.1.$GITHUB_RUN_NUMBER -c ENVIRONMENT=$ENVIRONMENT
cdk deploy --execute --ci --require-approval=never $STACK -c BUILD_NUMBER=2.1.$GITHUB_RUN_NUMBER -c ENVIRONMENT=$ENVIRONMENT
working-directory: infrastructure
shell: sh
Crucially, the project has several database migration scripts that are executed as part of the deployment. These scripts can update the underlying database schema, invalidating the old version of the application - a deployment therefore had to update the schema and deploy the latest version of the application as a single unit. Consequently, the action executed a cdk destroy followed by a cdk deploy to first tear down and then re-deploy the new version of the application.
A slight disclaimer: the original authors of the action clearly felt the cdk destroy step was warranted because, by itself, a cdk deploy only updates the EC2 instance's 'user data script'. It does not stop the running application and restart it with the new version. Therefore, without the destroy step, we might update the database schema but leave the old version of the application running. The problem, however, with cdk destroy is that it is extremely time-consuming - it would have been better and faster if the action had avoided this step altogether and instead stopped, deployed and then restarted the server. But even with this, the action would have been too slow for our needs (and still much slower than using CodeDeploy).
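For illustration only, here's a rough sketch of what that 'stop, deploy, restart' approach might have looked like using SSM Run Command - the instance ID, S3 path and service name are placeholders rather than our real setup:
#!/bin/bash
# Hypothetical sketch only - the instance ID, S3 path and service name are placeholders.
# Stop the running application, copy the new JAR from S3, then start it again, via SSM Run Command.
aws ssm send-command \
  --instance-ids "$INSTANCE_ID" \
  --document-name "AWS-RunShellScript" \
  --parameters '{"commands":[
    "systemctl stop project",
    "aws s3 cp s3://project-s3-bucket/project-builds/project.jar /opt/project/project.jar",
    "systemctl start project"
  ]}'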
Once the new application is deployed and the EC2 instance restarted, its user data script is executed. Here's how our original script looked before the updates to use CodeDeploy:
#!/bin/bash
# Associate the elastic IP to the EC2 instance
aws ec2 associate-address --instance-id ${'$'}(curl -s http://169.254.169.254/latest/meta-data/instance-id) --allocation-id ${context.elasticIP} --allow-reassociation
yum update -y
yum remove java-1.7.0-openjdk -y
yum install java-1.8.0 -y
yum install java-devel -y
mkdir /opt/project
echo 'export environmentName="${yamlEnvironment.environmentName}"' >> /etc/profile
The BUILD_NUMBER environment variable is used to pull the correct JAR file from S3. Note that we had set things up to use an AWS-provided AMI, ensuring that every deployment used the latest Amazon Linux 2 version.
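That JAR-fetching step isn't shown in the script above. As a rough sketch only - the bucket layout and start command here are hypothetical, not our exact setup - it might look something like this:
#!/bin/bash
# Hypothetical sketch - the bucket name, key layout and start command are placeholders.
# Pull the JAR matching this deployment's BUILD_NUMBER from S3 and start it in the background.
aws s3 cp "s3://project-s3-bucket-${ENVIRONMENT}/project-builds/project-${BUILD_NUMBER}.jar" /opt/project/project.jar
nohup java -jar /opt/project/project.jar > /var/log/project.log 2>&1 &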
Extended downtime
Our application was a vital cog in our client's infrastructure, but it was mostly used during US business hours - we had the luxury of being able to take it offline for short periods before the US working day started. Still, it was important to minimise these windows: a few minutes of downtime was acceptable; tens of minutes was not.
Using cdk to destroy and then deploy the new application resulted in 3 fundamental problems that affected our deployment windows:
Slow deployments: a single deployment took at least 12 minutes to complete.
Extended downtime: destroying the environment meant that the website was down until the deployment had completed i.e. 12+ minutes.
No automatic rollback: If a deployment failed, the website would be down for at least 24 minutes, until the previous working version was redeployed. This time included 12+ minutes for the initial failed deployment, 12+ minutes for the new deployment, plus whatever time it took to react and fix the problem.
Switching to CodeDeploy
CodeDeploy is a fully managed AWS code deployment service for AWS Lambda, EC2, ECS and on-premises servers. For EC2 deployments, CodeDeploy supports 2 deployment types: in place and blue/green.
With an in place deployment, the old version of the application is stopped, then the latest version is installed, started and validated. A blue/green deployment starts the application on new instances and then switches traffic from the old instances to the new instances. For more information, see the deployment types user guide.
Here are the 4 steps we followed to switch from a GitHub Action/CDK based deployment to using CodeDeploy.
1. Create a CodeDeploy application
To create a CodeDeploy application, you will first need to create and configure a ServerApplication and a deployment group. Ours was written in Kotlin:
class Codedeploy(
    private val stack: Stack,
    brand: Brand
) {
    val deploymentGroupName = "IN_PLACE_ALL_AT_ONCE"
    val codeDeployApplicationName = "projectCodeDeploy"
    val autoScalingGroupName = "AutoScalingGroup"

    init {
        createApplication(brand)
        createCfnDeploymentGroup(brand)
    }

    private fun createApplication(brand: Brand) {
        ServerApplication.Builder.create(stack, "CodeDeployApplication")
            .applicationName(codeDeployApplicationName)
            .build()
    }

    private fun createCfnDeploymentGroup(brand: Brand): CfnDeploymentGroup {
        return CfnDeploymentGroup.Builder.create(stack, "CfnDeploymentGroup")
            .applicationName(codeDeployApplicationName)
            .autoScalingGroups(listOf(autoScalingGroupName))
            .deploymentGroupName(deploymentGroupName)
            .serviceRoleArn(buildCodeDeployIamRole(brand).roleArn)
            .deploymentConfigName("CodeDeployDefault.AllAtOnce")
            .autoRollbackConfiguration(
                CfnDeploymentGroup.AutoRollbackConfigurationProperty.builder()
                    .enabled(true)
                    .events(listOf("DEPLOYMENT_FAILURE"))
                    .build()
            )
            .build()
    }

    private fun buildCodeDeployIamRole(brand: Brand): Role {
        return Role.Builder.create(stack, "CodeDeployIamRole")
            .assumedBy(ServicePrincipal("codedeploy.amazonaws.com"))
            .managedPolicies(
                listOf(
                    ManagedPolicy.fromManagedPolicyArn(
                        stack,
                        "CodeDeployRolePolicy",
                        "arn:aws:iam::aws:policy/service-role/AWSCodeDeployRole"
                    )
                )
            )
            .build()
    }
}
A few things to note:
An application can have many deployment groups, allowing for different deployment strategies. In our case, we specified a single deployment group.
deploymentConfigName specifies how many instances are deployed to at once. See the deployment configuration guide for more details. As our project needed to be deployed as a single unit, the entire deployment was run in place and all at once.
Enabling autoRollbackConfiguration ensures that CodeDeploy will automatically deploy the last working version on a failed deployment.
An IAM role is required for CodeDeploy's deployment group to have the necessary permissions. Following AWS's principle of least privilege, the role's managed policies are created from an AWS-managed policy that contains only the permissions needed for CodeDeploy to work.
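With the deployment group in place, a quick way to sanity-check it (using the application and group names from the Kotlin above) is to query it with the AWS CLI:
# Confirm the deployment group exists, and that its config and auto-rollback settings are what we expect.
aws deploy get-deployment-group \
  --application-name projectCodeDeploy \
  --deployment-group-name IN_PLACE_ALL_AT_ONCE \
  --query "deploymentGroupInfo.{config:deploymentConfigName,rollback:autoRollbackConfiguration}"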
2. Update the EC2 user data script
The next step is to update the EC2 instance's user data script to install and run the CodeDeploy agent. The agent requires ruby and wget to be installed.
#!/bin/bash
# Associate the elastic IP to the EC2 instance
aws ec2 associate-address --instance-id ${'$'}(curl -s http://169.254.169.254/latest/meta-data/instance-id) --allocation-id ${context.elasticIP} --allow-reassociation
yum update -y
yum remove java-1.7.0-openjdk -y
yum install java-1.8.0 -y
yum install java-devel -y
yum install ruby -y
yum install wget -y
mkdir /opt/project
echo 'export environmentName="${yamlEnvironment.environmentName}"' >> /etc/profile
wget https://aws-codedeploy-${context.region}.s3.${context.region}.amazonaws.com/latest/install
chmod +x ./install
./install auto
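If a deployment ever hangs or fails, it's worth confirming the agent actually came up on the instance. Running something like the following on the instance itself should tell you:
# Check the CodeDeploy agent is installed and running.
sudo service codedeploy-agent status
# The agent's log is a good first stop when debugging a failed lifecycle hook.
sudo tail -n 50 /var/log/aws/codedeploy-agent/codedeploy-agent.log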
3. Configure the CodeDeploy agent
Next, configure CodeDeploy's agent. Do this by creating an AppSpec file (appspec.yml) in the root of your source code. This file tells CodeDeploy what to install from your GitHub repository and which scripts to run in response to deployment lifecycle events. AppSpec hooks allow you to configure aspects of the deployment; the full list of available hooks is documented in the AppSpec hooks guide.
For EC2 deployments, each hook provides the location of a script (relative to the root of the source code being deployed) which the CodeDeploy agent executes. The timeout is optional, but cannot be greater than 3600 seconds (1 hour). For example:
version: 0.0
os: linux
files:
  - source: /
    destination: /opt/project
hooks:
  ApplicationStop:
    - location: stop_application.sh
      timeout: 180
      runas: root
  ApplicationStart:
    - location: start_application.sh
      timeout: 10
      runas: root
  ValidateService:
    - location: validate_service.sh
      timeout: 60
      runas: root
In the above example, the ValidateService hook runs a script that checks the project's health check endpoint to confirm the deployment has succeeded.
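We haven't included the hook scripts themselves in this post, but as a rough sketch - the port and health check path below are hypothetical - validate_service.sh could look something like this:
#!/bin/bash
# Hypothetical sketch of validate_service.sh - the port and health check path are placeholders.
# Poll the application's health check endpoint until it responds with 200, or fail the hook.
for attempt in $(seq 1 10); do
  status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/healthcheck)
  if [ "$status" -eq 200 ]; then
    echo "Application is healthy"
    exit 0
  fi
  echo "Attempt $attempt: health check returned $status, retrying..."
  sleep 5
done
echo "Application failed to become healthy in time"
exit 1
Exiting non-zero here is what tells CodeDeploy the deployment failed, which in turn triggers the automatic rollback configured earlier.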
4. Update the GitHub Action
The final step is to update the GitHub action for deploying the project, using the aws cli to create a new deployment. The command-line JSON processor jq is used to extract the deployment ID, which is then used to check whether the deployment succeeded:
- name: Deploy app with AWS CodeDeploy
  run: |
    DEPLOYMENT_OUTPUT=$(aws deploy create-deployment \
      --application-name projectCodeDeploy \
      --deployment-group-name IN_PLACE_ALL_AT_ONCE \
      --revision "{\"revisionType\":\"S3\",\"s3Location\":{\"bucket\":\"project-s3-bucket-${ENVIRONMENT}\",\"key\":\"project-builds/deployment-package-2.1.${GITHUB_RUN_NUMBER}.zip\",\"bundleType\":\"zip\"}}" \
      --description "${GITHUB_SHA}" \
      --region=${AWS_DEFAULT_REGION} \
      --output json)
    DEPLOYMENT_ID=$(echo $DEPLOYMENT_OUTPUT | jq -r .deploymentId)
    echo "DEPLOYMENT_ID=$DEPLOYMENT_ID" >> $GITHUB_ENV
  shell: sh
- name: Check Deployment succeeded
  run: |
    aws deploy wait deployment-successful --deployment-id=${DEPLOYMENT_ID}
  shell: sh
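If the wait step fails, the first question is why. A follow-up step along these lines (not part of our original action, just a sketch) pulls the status, error details and rollback information out of CodeDeploy:
# Sketch of a debugging step: print the failed deployment's status, error details and rollback info.
aws deploy get-deployment \
  --deployment-id "${DEPLOYMENT_ID}" \
  --query "deploymentInfo.{status:status,error:errorInformation,rollback:rollbackInfo}" \
  --output json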
The Result
Wiring all the above together resulted in significant improvements to deploy times:
Reduced deployment time: a deployment now takes 50-60 seconds, roughly 12x faster!
Reduced downtime: downtime was reduced to 15-20 seconds, down from ~12 minutes. This time is limited only by how long it takes to stop the old version and serve the new version. For us, this was a ~36x reduction in downtime compared to using CDK.
Automatic rollback on a failed deployment: If a deployment fails, CodeDeploy will roll back to the latest working version. Made possible with a single line of code!
Great, but how much does it cost?
The good news is that if you are deploying to EC2, Lambda or ECS, there are no additional charges. Indeed, your overall costs should drop, because the much shorter deployment job consumes considerably fewer GitHub Actions minutes. Here's how AWS frames CodeDeploy pricing:
For CodeDeploy on EC2, Lambda, ECS: There is no additional charge for code deployments to Amazon EC2, AWS Lambda or Amazon ECS through AWS CodeDeploy.
For CodeDeploy On-Premises: You pay $0.02 per on-premises instance update using AWS CodeDeploy. There are no minimum fees and no upfront commitments. For example, a deployment to three instances equals three instance updates. You will only be charged if CodeDeploy performs an update to an instance. You will not be charged for any instances skipped during the deployment.
You pay for any other AWS resources (e.g. S3 buckets) you may use in conjunction with CodeDeploy to store and run your application. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments.
Article By
Oliver Looney
Software Engineer