Published on: Tue Dec 21 2021
In this module, we will start by adding some safety to our deployment with blue-green deployment, so that when errors occur or things go wrong, we can reduce the impact and recover quickly.
By the end of this module, you should have an infrastructure that performs a blue-green canary deployment with automatic rollback.
Here is what we will be building:
In order to turn our existing infrastructure into a blue-green deployment, we need to do the following:

- Create an appspec.json template for CodeDeploy
- Update IAM permissions (the CI/CD role and a new CodeDeploy service role)
- Create "blue" and "green" target groups on the load balancer
- Switch the ECS service to the CodeDeploy deployment controller
- Define the CodeDeploy application, deployment configuration, and deployment group
- Add CloudWatch alarms and automatic rollback
- Update the CI/CD pipeline to kick off deployments through CodeDeploy

Let’s dive right in!
Now that we have a fully functional application with a CI/CD pipeline, we can start to gradually enhance it.
It is great that we have automatic and continuous deployment. However, it comes with a drawback: if an error occurs, we would observe some downtime while we re-deploy the previous version of our application.
That is not a great process, and it requires many manual steps. I already mentioned that blue-green deployment with a canary release is one way to solve this problem.
Let’s review some of the concepts to see how this strategy helps and what role CodeDeploy plays in the whole process before we start building it out.
Why blue-green deployment?
Blue-green deployment allows us to achieve zero-downtime deployments by keeping two versions of the application running at the same time (blue and green).
The main benefit is that we can leverage a load balancer to route traffic to the "green" infrastructure (the new version) while keeping the "blue" infrastructure (the existing version) on standby in case of unexpected issues.
This allows us to quickly switch between the two versions, which is especially useful in a rollback scenario. The obvious downside of this approach is the extra cost of running two sets of infrastructure during the transition.
Typically, you would want to reserve this pattern for infrastructure that requires zero downtime.
In addition, this pattern is more suitable for incremental changes rather than breaking changes, such as a database schema change between versions.
Adding a database change further complicates the strategy, and it would require an in-depth analysis to get it right.
Canary deployment is similar to blue-green in that both strategies serve to minimize downtime during deployments. However, rather than shifting all traffic to the new infrastructure at once, it takes a phased approach.
Canary deployment performs the infrastructure updates in phases. For example, it might update 10% of the infrastructure over a period of 15 minutes, then update the rest (90%) to the new version.
With AWS CodeDeploy, as you will see very soon, the canary deployment is done via traffic shifting rather than by replacing the infrastructure.
So, we are still running a blue-green deployment, but the traffic is shifted gradually rather than all at once.
Just keep in mind that when we refer to the term "canary deployment", it is similar to the traditional definition, but CodeDeploy does it a little differently.
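If you want a concrete feel for these deployment configurations, the AWS CLI can list the ones CodeDeploy ships with, including the predefined ECS canary configs (a quick check, assuming your AWS CLI is already configured):

```sh
# List all deployment configurations available to CodeDeploy
aws deploy list-deployment-configs

# Inspect a predefined ECS canary config: shift 10% of traffic first,
# then the remaining 90% after 15 minutes
aws deploy get-deployment-config \
  --deployment-config-name CodeDeployDefault.ECSCanary10Percent15Minutes
```

Later in this module we will define our own custom configuration instead of using a predefined one.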
AWS CodeDeploy supports various options. I won’t go into all of them, but let’s review some basic components and terms.
The deployment group is the core of our blue-green deployment process. We will refer to it via our appspec.json file while preparing a new version of our application, before we start the deployment process.
Once this is set up, CodeDeploy handles all the heavy lifting of shifting traffic between target groups, updating the ECS service, and rolling back based on metric thresholds.
So, all we really need to do is to define the configuration for this process, and we should be all set!
Under the root directory, run this command:

```sh
touch appspec.json
```

This is just a template file. We will update it dynamically in our CI/CD workflow so CodeDeploy has the right information each time.
```json
{
  "Resources": [
    {
      "TargetService": {
        "Type": "AWS::ECS::Service",
        "Properties": {
          "TaskDefinition": "<TASK_ARN>",
          "LoadBalancerInfo": {
            "ContainerName": "<CONTAINER_NAME>",
            "ContainerPort": "<CONTAINER_PORT>"
          }
        }
      }
    }
  ]
}
```
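For context, here is a rough sketch of what our CI/CD pipeline will eventually do with this file: once the placeholders are filled in and the CodeDeploy resources below exist, a deployment can be created straight from the appspec content. The application and deployment group names here assume a project_id of node-app and an env of prod, matching the workflow later in this module:

```sh
# JSON-encode the appspec so it can be embedded in the revision payload
APPSPEC=$(jq -Rs . < appspec.json)

# Kick off a blue-green deployment from the appspec content
aws deploy create-deployment \
  --application-name deployment-app-node-app-prod \
  --deployment-group-name deployment-group-node-app-prod \
  --revision "{\"revisionType\":\"AppSpecContent\",\"appSpecContent\":{\"content\":$APPSPEC}}"
```

We won’t run this by hand; the GitHub Actions deploy step does the equivalent for us.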
Since we will be integrating a new AWS component into our infrastructure, we need to ensure the right components have the right access and permissions.

There are two parts that require IAM permission updates:

- The CI/CD user role, which needs permissions to manage AWS CodeDeploy
- A new IAM service role for CodeDeploy itself

First, add the required permissions for managing AWS CodeDeploy to our CI/CD role. The custom Terraform module allows appending additional IAM statements to the role via the other_iam_statements field.
```hcl
## CI/CD user role for managing pipeline for AWS ECR resources
module "ecr_ecs_ci_user" {
  source         = "github.com/Jareechang/tf-modules//iam/ecr?ref=v1.0.7"
  env            = var.env
  project_id     = var.project_id
  create_ci_user = true

  ecr_resource_arns = [
    "arn:aws:ecr:${var.aws_region}:${data.aws_caller_identity.current.account_id}:repository/web/${var.project_id}",
    "arn:aws:ecr:${var.aws_region}:${data.aws_caller_identity.current.account_id}:repository/web/${var.project_id}/*"
  ]

  other_iam_statements = {
    codedeploy = {
      actions = [
        "codedeploy:GetDeploymentGroup",
        "codedeploy:CreateDeployment",
        "codedeploy:GetDeployment",
        "codedeploy:GetDeploymentConfig",
        "codedeploy:RegisterApplicationRevision"
      ]
      effect    = "Allow"
      resources = ["*"]
    }
  }
}
```
data "aws_iam_policy_document" "codedeploy_assume_role" {
version = "2012-10-17"
statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = [
"codedeploy.amazonaws.com"
]
}
}
}
resource "aws_iam_role" "codedeploy_role" {
name = "CodeDeployRole${var.project_id}"
description = "CodeDeployRole for ${var.project_id} in ${var.env}"
assume_role_policy = data.aws_iam_policy_document.codedeploy_assume_role.json
lifecycle {
create_before_destroy = true
}
}
resource "aws_iam_role_policy_attachment" "codedeploy_policy_attachment" {
policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
role = aws_iam_role.codedeploy_role.name
}
Since we will be performing traffic shifting, we need two target groups for our application load balancer.
💡 Remember:
- Blue = existing infrastructure
- Green = new infrastructure
```hcl
# Target group for existing infrastructure
module "ecs_tg_blue" {
  source              = "github.com/Jareechang/tf-modules//alb?ref=v1.0.2"
  project_id          = "${var.project_id}-blue"
  create_target_group = true
  port                = local.target_port
  protocol            = "HTTP"
  target_type         = "ip"
  vpc_id              = module.networking.vpc_id
}

# Target group for new infrastructure
module "ecs_tg_green" {
  source              = "github.com/Jareechang/tf-modules//alb?ref=v1.0.2"
  project_id          = "${var.project_id}-green"
  create_target_group = true
  port                = local.target_port
  protocol            = "HTTP"
  target_type         = "ip"
  vpc_id              = module.networking.vpc_id
}

module "alb" {
  source             = "github.com/Jareechang/tf-modules//alb?ref=v1.0.2"
  create_alb         = true
  enable_https       = false
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_ecs_sg.id]
  subnets            = module.networking.public_subnets[*].id
  target_group       = module.ecs_tg_blue.tg.arn
}
```
After our change above, we need to update our load balancer ARN reference and configure our ECS service to use CodeDeploy. Keep in mind that changing the deployment controller forces Terraform to replace the existing ECS service.
resource "aws_ecs_service" "web_service" {
name = "web-service-${var.project_id}-${var.env}"
cluster = aws_ecs_cluster.web_cluster.id
task_definition = aws_ecs_task_definition.nextjs.arn
desired_count = local.ecs_desired_count
launch_type = local.ecs_launch_type
load_balancer {
target_group_arn = module.ecs_tg_blue.tg.arn
container_name = local.ecs_container_name
container_port = local.target_port
}
network_configuration {
subnets = module.networking.private_subnets[*].id
security_groups = [aws_security_group.ecs_sg.id]
}
deployment_controller {
type = "CODE_DEPLOY"
}
tags = {
Name = "web-service-${var.project_id}-${var.env}"
}
depends_on = [
module.alb.lb,
module.ecs_tg_blue.tg
]
}
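Once applied, a quick way to confirm the service is now managed by CodeDeploy is to inspect its deployment controller. The cluster and service names below assume a project_id of node-app and an env of prod, matching the CI/CD workflow later in this module:

```sh
# Should print "CODE_DEPLOY" once the service has been recreated
aws ecs describe-services \
  --cluster web-cluster-node-app-prod \
  --services web-service-node-app-prod \
  --query "services[0].deploymentController.type" \
  --output text
```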
```hcl
locals {
  # Target port to expose
  target_port = 3000

  ## ECS Service config
  ecs_launch_type    = "FARGATE"
  ecs_desired_count  = 2
  ecs_network_mode   = "awsvpc"
  ecs_cpu            = 512
  ecs_memory         = 1024
  ecs_container_name = "nextjs-image"
  ecs_log_group      = "/aws/ecs/${var.project_id}-${var.env}"

  # Retention in days
  ecs_log_retention = 1

  # Deployment Configuration
  ecs_deployment_type = "TimeBasedCanary"

  ## In minutes
  ecs_deployment_config_interval = 5

  ## In percentage
  ecs_deployment_config_pct = 25
}
```
resource "aws_codedeploy_deployment_config" "custom_canary" {
deployment_config_name = "EcsCanary25Percent20Minutes"
compute_platform = "ECS"
traffic_routing_config {
type = local.ecs_deployment_type
time_based_canary {
interval = local.ecs_deployment_config_interval
percentage = local.ecs_deployment_config_pct
}
}
}
Add the basic scaffold of the CodeDeploy deployment group and CodeDeploy application infrastructure. We also have to specify "BLUE_GREEN" for the deployment_type. With the TimeBasedCanary configuration above, CodeDeploy shifts the first 25% of traffic to the new task set, waits 5 minutes, then shifts the remaining 75%.
resource "aws_codedeploy_app" "node_app" {
compute_platform = "ECS"
name = "deployment-app-${var.project_id}-${var.env}"
}
resource "aws_codedeploy_deployment_group" "this" {
app_name = aws_codedeploy_app.node_app.name
deployment_config_name = aws_codedeploy_deployment_config.custom_canary.id
deployment_group_name = "deployment-group-${var.project_id}-${var.env}"
service_role_arn = aws_iam_role.codedeploy_role.arn
blue_green_deployment_config {
deployment_ready_option {
action_on_timeout = "CONTINUE_DEPLOYMENT"
}
terminate_blue_instances_on_deployment_success {
action = "TERMINATE"
termination_wait_time_in_minutes = 0
}
}
deployment_style {
deployment_option = "WITH_TRAFFIC_CONTROL"
deployment_type = "BLUE_GREEN"
}
ecs_service {
cluster_name = aws_ecs_cluster.web_cluster_node.name
service_name = aws_ecs_service.web_service.name
}
}
We need to specify the load balancer configuration for our blue-green deployment. Remember when we created the "blue" and "green" target groups? This is where we specify them.
resource "aws_codedeploy_deployment_group" "node_app" {
app_name = aws_codedeploy_app.node_app.name
deployment_config_name = aws_codedeploy_deployment_config.custom_canary.id
deployment_group_name = "deployment-group-${var.project_id}-${var.env}"
service_role_arn = aws_iam_role.codedeploy_role.arn
blue_green_deployment_config {
deployment_ready_option {
action_on_timeout = "CONTINUE_DEPLOYMENT"
}
terminate_blue_instances_on_deployment_success {
action = "TERMINATE"
termination_wait_time_in_minutes = 0
}
}
deployment_style {
deployment_option = "WITH_TRAFFIC_CONTROL"
deployment_type = "BLUE_GREEN"
}
ecs_service {
cluster_name = aws_ecs_cluster.web_cluster_node.name
service_name = aws_ecs_service.web_service.name
}
load_balancer_info {
target_group_pair_info {
prod_traffic_route {
listener_arns = [module.alb.http_listener.arn]
}
target_group {
name = module.ecs_tg_blue.tg.name
}
target_group {
name = module.ecs_tg_green.tg.name
}
}
}
}
As you can see from the diagram of our infrastructure, we will roll back on failures. These include the following events:

- The deployment itself fails (DEPLOYMENT_FAILURE)
- A CloudWatch alarm fires during the deployment (DEPLOYMENT_STOP_ON_ALARM)

I have created two Terraform modules with alarms, one for ALB 5xx errors and one for application error logs. These are the alarms we will be using, but additional alarms can be added too!
```hcl
## Cloudwatch log errors
module "application_error_alarm" {
  source             = "github.com/Jareechang/tf-modules//cloudwatch/alarms/application-log-errors?ref=v1.0.12"
  evaluation_periods = "2"
  threshold          = "10"
  arn_suffix         = module.alb.lb.arn_suffix
  project_id         = var.project_id
  env                = var.env

  # Keyword to match for - this can be changed
  pattern          = "Error"
  log_group_name   = aws_cloudwatch_log_group.ecs.name
  metric_name      = "ApplicationErrorCount"
  metric_namespace = "ECS/${var.project_id}-${var.env}"
}

## ALB errors (5xx)
module "http_error_alarm" {
  source             = "github.com/Jareechang/tf-modules//cloudwatch/alarms/alb-http-errors?ref=v1.0.8"
  evaluation_periods = "2"
  threshold          = "10"
  arn_suffix         = module.alb.lb.arn_suffix
  project_id         = var.project_id
}
```
resource "aws_codedeploy_deployment_group" "node_app" {
app_name = aws_codedeploy_app.node_app.name
deployment_config_name = aws_codedeploy_deployment_config.custom_canary.id
deployment_group_name = "deployment-group-${var.project_id}-${var.env}"
service_role_arn = aws_iam_role.codedeploy_role.arn
auto_rollback_configuration {
enabled = true
events = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
}
alarm_configuration {
alarms = [
module.http_error_alarm.name,
module.application_error_alarm.name
]
enabled = true
}
blue_green_deployment_config {
deployment_ready_option {
action_on_timeout = "CONTINUE_DEPLOYMENT"
}
terminate_blue_instances_on_deployment_success {
action = "TERMINATE"
termination_wait_time_in_minutes = 0
}
}
deployment_style {
deployment_option = "WITH_TRAFFIC_CONTROL"
deployment_type = "BLUE_GREEN"
}
ecs_service {
cluster_name = aws_ecs_cluster.web_cluster_node.name
service_name = aws_ecs_service.web_service.name
}
load_balancer_info {
target_group_pair_info {
prod_traffic_route {
listener_arns = [module.alb.http_listener.arn]
}
target_group {
name = module.ecs_tg_blue.tg.name
}
target_group {
name = module.ecs_tg_green.tg.name
}
}
}
}
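If you want to see the automatic rollback in action, one option is to force one of the alarms into the ALARM state while a deployment is in progress, which triggers DEPLOYMENT_STOP_ON_ALARM. This is a sketch; substitute the actual alarm name created by the module:

```sh
# Manually push an alarm into ALARM state mid-deployment to
# exercise the automatic rollback path
aws cloudwatch set-alarm-state \
  --alarm-name "<your-alb-5xx-alarm-name>" \
  --state-value ALARM \
  --state-reason "Testing CodeDeploy automatic rollback"
```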
```sh
export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
export AWS_DEFAULT_REGION=us-east-1

terraform init
terraform plan
terraform apply -auto-approve
```
⚠️ Note: Remember to run terraform destroy -auto-approve after you are done with the module, unless you wish to keep the infrastructure for personal use.
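To verify the CodeDeploy pieces were created, you can query the deployment group (again assuming a project_id of node-app and an env of prod):

```sh
# Confirm the CodeDeploy application and deployment group exist
aws deploy get-deployment-group \
  --application-name deployment-app-node-app-prod \
  --deployment-group-name deployment-group-node-app-prod
```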
Since we will now be leveraging CodeDeploy for our deployments, we need to remove the previous step where we updated the AWS ECS service directly.
```yaml
# More code...

# Remove this step
- name: Deploy Amazon ECS task definition
  uses: aws-actions/amazon-ecs-deploy-task-definition@v1
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: web-service-node-app-prod
    cluster: web-cluster-node-app-prod
    wait-for-service-stability: true
```
```yaml
name: deploy

on:
  push:
    branches:
      - master
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Install & Build
        uses: actions/checkout@v2
      - run: yarn install --frozen-lockfile
      - run: yarn build && yarn install --production --ignore-scripts --prefer-offline

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build, tag, and push image to Amazon ECR
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: web/node-app/nextjs
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          echo "::set-output name=image::$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"

      - name: Fill in the new image ID in the Amazon ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: infra/task-definitions/service.latest.json
          container-name: nextjs-image
          image: ${{ steps.build-image.outputs.image }}

      - name: Update App Spec File
        run: |
          sed -ie "s/<CONTAINER_NAME>/$CONTAINER_NAME/" ./appspec.json
          sed -ie "s/<CONTAINER_PORT>/$CONTAINER_PORT/" ./appspec.json
        env:
          CONTAINER_NAME: nextjs-image
          CONTAINER_PORT: 3000
```
With this deploy action configured for CodeDeploy, our pipeline will start a deployment as soon as that step runs. Note that while we substitute the container name and port ourselves, the action registers the new task definition and inserts its ARN into the appspec's TaskDefinition field for us.
```yaml
name: deploy

on:
  push:
    branches:
      - master
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Install & Build
        uses: actions/checkout@v2
      - run: yarn install --frozen-lockfile
      - run: yarn build && yarn install --production --ignore-scripts --prefer-offline

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build, tag, and push image to Amazon ECR
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: web/node-app/nextjs
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          echo "::set-output name=image::$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"

      - name: Fill in the new image ID in the Amazon ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: infra/task-definitions/service.latest.json
          container-name: nextjs-image
          image: ${{ steps.build-image.outputs.image }}

      - name: Update App Spec File
        run: |
          sed -ie "s/<CONTAINER_NAME>/$CONTAINER_NAME/" ./appspec.json
          sed -ie "s/<CONTAINER_PORT>/$CONTAINER_PORT/" ./appspec.json
        env:
          CONTAINER_NAME: nextjs-image
          CONTAINER_PORT: 3000

      - name: Deploy Amazon ECS task definition
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: web-service-node-app-prod
          cluster: web-cluster-node-app-prod
          wait-for-service-stability: true
          codedeploy-appspec: appspec.json
          codedeploy-application: deployment-app-node-app-prod
          codedeploy-deployment-group: deployment-group-node-app-prod
```
⚠️ Important: Ensure the following names match what you have defined in your Terraform files; if you have changed any of them, be sure to update the workflow to match:
- (ecs) task-definition
- (ecs) service
- (ecs) cluster
- (codedeploy) codedeploy-application
- (codedeploy) codedeploy-deployment-group
Like the previous section, let’s add the updated AWS access key ID and secret access key to the GitHub "Secrets" vault on your repository.
Within the GitHub repository hosting your code, go to "Settings" > "Secrets" and click "New repository secret".
Add the following:

- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
Now we are ready to make a change and test out our blue-green deployment pipeline.
After you have pushed your changes, verify that the GitHub Actions run has no errors and that the deployment has started in your AWS console.
If the CodeDeploy trigger was successful, you should see the following:
As defined, it should shift 25% of the traffic to the new version of our application right away, then the remaining 75% after the 5-minute interval.
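If you prefer to watch the deployment from the command line instead of the console, you can poll the latest deployment for our group (same name assumptions as before):

```sh
# Grab the most recent deployment ID for our deployment group
DEPLOYMENT_ID=$(aws deploy list-deployments \
  --application-name deployment-app-node-app-prod \
  --deployment-group-name deployment-group-node-app-prod \
  --query "deployments[0]" --output text)

# Check its status (e.g. InProgress, Succeeded, Failed)
aws deploy get-deployment \
  --deployment-id "$DEPLOYMENT_ID" \
  --query "deploymentInfo.status" --output text
```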
If you’d like a reference for the final result, it is available at building-with-aws-ecs-part-5.
That’s it! We have successfully created a CI/CD pipeline with blue-green deployment and automatic rollback for our Next.js application.
Congratulations on making it through this long series. I hope you learned a thing or two ;)!
Now we have a fully functional CI/CD pipeline and the infrastructure available. I hope to write other posts that build on top of this infrastructure, things like:
I can’t wait! I am already working on the next one. Be sure to subscribe to stay in the loop and get notified when it is ready!