AWS ECS technical series part V

Published on: Tue Dec 21 2021

Series

  1. Introducing the AWS ECS technical series
  2. AWS ECS technical series part I
  3. AWS ECS technical series part II
  4. AWS ECS technical series part III
  5. AWS ECS technical series part IV

Goals

In this module, we will add some safety to our deployment process with blue-green deployment, so that when errors occur or things go wrong, we can reduce the impact and quickly recover.

By the end of this module, you should have an infrastructure that performs a blue-green canary deployment with automatic rollback.

Here is what we will be building:

blue green deployment

In order to convert our existing infrastructure to a blue-green deployment, we need to do the following:

  1. Refine our infrastructure to support blue-green deployment and add in CodeDeploy
  2. Add a CodeDeploy service role
  3. Add additional permissions to our CI/CD role for CodeDeploy
  4. Update GitHub Actions to trigger CodeDeploy

Let’s dive right in!

Content

Introduction

Now that we have a fully functional application with CI/CD pipeline, we can start to gradually enhance it.

It is great that we have automatic and continuous deployment. However, it has a drawback: if an error occurs, we would observe downtime, and to fix it we would need to re-deploy the previous version of our application.

That is not a great process, and one that requires many manual steps. I already mentioned that blue-green deployment with canary release is one way to solve this problem.

Let’s review some of the concepts to see how this strategy helps, and what role CodeDeploy plays in the whole process, before we start building it out.

Review & Theory

Blue-green deployment

Why blue-green deployment?

Blue-green deployment allows us to achieve zero-downtime deployments by keeping two versions of the application running at the same time (blue & green).

  • Blue = Infrastructure running the existing version of the application
  • Green = Infrastructure running the new version of the application

The main benefit is that we can leverage a load balancer to route traffic to the “green” infrastructure (new version) while keeping the “blue” infrastructure (existing version) on stand-by in the event of unexpected issues.

This allows us to quickly switch between the two versions, which is especially useful in a rollback scenario. The obvious downside of this approach is the extra cost of running two copies of the infrastructure during the deployment window.

Typically, you would want to reserve this pattern for applications that require zero-downtime deployments.

In addition, this pattern is more suitable if you are making an incremental change rather than a breaking change, like a database schema change between versions.

Adding a database change further complicates the strategy, and it would require an in-depth analysis to get it right.

Canary deployment

Canary deployment is similar to blue-green in that both strategies serve to minimize downtime during deployments. However, rather than shifting all traffic to the new infrastructure at once, it takes a phased approach.

Canary deployment performs the infrastructure update in phases. For example, it may update 10% of the infrastructure over a period of 15 minutes, then update the rest (90%) to the new version.

With AWS CodeDeploy, as you will see very soon, the canary deployment is done via traffic shifting rather than by replacing the infrastructure.

So, we are still running a blue-green deployment, but the traffic is shifted gradually rather than all at once.

Keep that in mind when we refer to the term “canary deployment”: it is similar to the traditional definition, but CodeDeploy does it a little differently.
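To make the traffic-shifting behaviour concrete, here is a small shell sketch (purely illustrative, not an AWS tool) of the two-step schedule a TimeBasedCanary configuration produces: the canary percentage shifts immediately, and the remainder follows once the interval elapses.

```shell
# Illustrative sketch only: prints the two-step traffic schedule that a
# CodeDeploy TimeBasedCanary configuration produces for a given canary
# percentage and interval (in minutes).
canary_schedule() {
  local pct=$1 interval=$2
  echo "t=0min: ${pct}% of traffic -> green, $((100 - pct))% stays on blue"
  echo "t=${interval}min: 100% of traffic -> green"
}

canary_schedule 10 15
```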


AWS CodeDeploy

AWS CodeDeploy supports various options. I won’t go into all of them, but let’s review some basic components and terms.

  1. CodeDeploy deployment group - contains the deployment settings (i.e. type, load balancer settings, alarm settings)
  2. CodeDeploy deployment configuration - defines how the deployment is carried out
    • Can be a pre-defined setting (one of the AWS ECS deployment configurations)
    • Can be a custom setting (if none of the pre-defined settings fit your needs, you can create your own)

The deployment group is the core of our blue-green deployment process. We will refer to it, along with our appspec.json file, when preparing a new version of our application and starting the deployment process.

Once this is set up, CodeDeploy handles all the heavy lifting of shifting traffic between target groups, updating the AWS ECS service, and rolling back based on metric thresholds.

So, all we really need to do is to define the configuration for this process, and we should be all set!

Adding an appspec.json

1. Create new file

Under the root directory, run this command:

touch appspec.json

2. Add the appspec definition template

This is just a template file. We will update it dynamically in our CI/CD workflow so that CodeDeploy has the right information each time.

{
  "Resources": [
    {
      "TargetService": {
        "Type": "AWS::ECS::Service",
        "Properties": {
          "TaskDefinition": "<TASK_ARN>",
          "LoadBalancerInfo": {
            "ContainerName": "<CONTAINER_NAME>",
            "ContainerPort": "<CONTAINER_PORT>"
          }
        }
      }
    }
  ]
}
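Before wiring it into the pipeline, it can be worth sanity-checking the template locally: it parses as valid JSON (the placeholders sit inside string values), and we can list the placeholders that still need substitution. A quick check, assuming python3 and standard Unix tools are available:

```shell
# Write a copy of the template and verify it parses as JSON, then list
# the remaining <PLACEHOLDER> tokens that the CI/CD workflow must replace.
cat > /tmp/appspec.json <<'EOF'
{
  "Resources": [
    {
      "TargetService": {
        "Type": "AWS::ECS::Service",
        "Properties": {
          "TaskDefinition": "<TASK_ARN>",
          "LoadBalancerInfo": {
            "ContainerName": "<CONTAINER_NAME>",
            "ContainerPort": "<CONTAINER_PORT>"
          }
        }
      }
    }
  ]
}
EOF
python3 -m json.tool /tmp/appspec.json > /dev/null && echo "valid JSON"
grep -o '<[A-Z_]*>' /tmp/appspec.json | sort -u
```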

IAM for AWS CodeDeploy

Since we will be integrating a new AWS component into our infrastructure, we need to ensure the right components have the right access and permissions.

There are two parts that require IAM permission updates:

  1. The CI/CD role - for starting the deployment
  2. The service role - for managing the whole deployment process (i.e. updating the ALB, reading alarm metrics, updating ECS)

1. Update CI/CD role permission

Add the required permissions for managing AWS CodeDeploy to our CI/CD role.

The custom Terraform module allows appending additional IAM statements to the role via the other_iam_statements field.

## CI/CD user role for managing pipeline for AWS ECR resources
module "ecr_ecs_ci_user" {
  source            = "github.com/Jareechang/tf-modules//iam/ecr?ref=v1.0.7"
  env               = var.env
  project_id        = var.project_id
  create_ci_user    = true
  ecr_resource_arns = [
    "arn:aws:ecr:${var.aws_region}:${data.aws_caller_identity.current.account_id}:repository/web/${var.project_id}",
    "arn:aws:ecr:${var.aws_region}:${data.aws_caller_identity.current.account_id}:repository/web/${var.project_id}/*"
  ]

  other_iam_statements = {
    codedeploy = {
      actions = [
        "codedeploy:GetDeploymentGroup",
        "codedeploy:CreateDeployment",
        "codedeploy:GetDeployment",
        "codedeploy:GetDeploymentConfig",
        "codedeploy:RegisterApplicationRevision"
      ]
      effect = "Allow"
      resources = [
        "*"
      ]
    }
  }
}

2. Add the service role


data "aws_iam_policy_document" "codedeploy_assume_role" {
  version = "2012-10-17"
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = [
        "codedeploy.amazonaws.com"
      ]
    }
  }
}

resource "aws_iam_role" "codedeploy_role" {
  name               = "CodeDeployRole${var.project_id}"
  description        = "CodeDeployRole for ${var.project_id} in ${var.env}"
  assume_role_policy = data.aws_iam_policy_document.codedeploy_assume_role.json
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_iam_role_policy_attachment" "codedeploy_policy_attachment" {
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
  role       = aws_iam_role.codedeploy_role.name
}


Configure target groups

1. Update and create target groups

Since we will be performing traffic shifting, we will need two target groups for our application load balancer.

💡 Remember:
  • Blue = existing infrastructure
  • Green = new infrastructure

# Target group for existing infrastructure
module "ecs_tg_blue" {
  project_id          = "${var.project_id}-blue"
  source              = "github.com/Jareechang/tf-modules//alb?ref=v1.0.2"
  create_target_group = true
  port                = local.target_port
  protocol            = "HTTP"
  target_type         = "ip"
  vpc_id              = module.networking.vpc_id
}

# Target group for new infrastructure
module "ecs_tg_green" {
  project_id          = "${var.project_id}-green"
  source              = "github.com/Jareechang/tf-modules//alb?ref=v1.0.2"
  create_target_group = true
  port                = local.target_port
  protocol            = "HTTP"
  target_type         = "ip"
  vpc_id              = module.networking.vpc_id
}

module "alb" {
  source             = "github.com/Jareechang/tf-modules//alb?ref=v1.0.2"
  create_alb         = true
  enable_https       = false
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_ecs_sg.id]
  subnets            = module.networking.public_subnets[*].id
  target_group       = module.ecs_tg_blue.tg.arn
}

2. Configure AWS ECS service

After the change above, we need to update our target group ARN reference and configure our ECS service to use CodeDeploy.

resource "aws_ecs_service" "web_service" {
  name            = "web-service-${var.project_id}-${var.env}"
  cluster         = aws_ecs_cluster.web_cluster.id
  task_definition = aws_ecs_task_definition.nextjs.arn
  desired_count   = local.ecs_desired_count
  launch_type     = local.ecs_launch_type

  load_balancer {
    target_group_arn = module.ecs_tg_blue.tg.arn
    container_name   = local.ecs_container_name
    container_port   = local.target_port
  }

  network_configuration {
    subnets         = module.networking.private_subnets[*].id
    security_groups = [aws_security_group.ecs_sg.id]
  }

  deployment_controller {
    type = "CODE_DEPLOY"
  }

  tags = {
    Name = "web-service-${var.project_id}-${var.env}"
  }

  depends_on = [
    module.alb.lb,
    module.ecs_tg_blue.tg
  ]
}

Add the AWS CodeDeploy infrastructure

1. Create the custom deployment configuration

locals {
  # Target port to expose
  target_port = 3000

  ## ECS Service config
  ecs_launch_type = "FARGATE"
  ecs_desired_count = 2
  ecs_network_mode = "awsvpc"
  ecs_cpu = 512
  ecs_memory = 1024
  ecs_container_name = "nextjs-image"
  ecs_log_group = "/aws/ecs/${var.project_id}-${var.env}"
  # Retention in days
  ecs_log_retention = 1

  # Deployment Configuration
  ecs_deployment_type = "TimeBasedCanary"
  ## In minutes
  ecs_deployment_config_interval = 5
  ## In percentage 
  ecs_deployment_config_pct = 25
}

resource "aws_codedeploy_deployment_config" "custom_canary" {
  deployment_config_name = "EcsCanary25Percent5Minutes"
  compute_platform       = "ECS"
  traffic_routing_config {
    type = local.ecs_deployment_type 
    time_based_canary {
      interval   = local.ecs_deployment_config_interval
      percentage = local.ecs_deployment_config_pct
    }
  }
}


2. Create the deployment group scaffold

Add the basic scaffold of the CodeDeploy application and deployment group infrastructure.

We also have to specify "BLUE_GREEN" for the deployment_type.

resource "aws_codedeploy_app" "node_app" {
  compute_platform = "ECS"
  name             = "deployment-app-${var.project_id}-${var.env}"
}

resource "aws_codedeploy_deployment_group" "node_app" {
  app_name               = aws_codedeploy_app.node_app.name
  deployment_config_name = aws_codedeploy_deployment_config.custom_canary.id
  deployment_group_name  = "deployment-group-${var.project_id}-${var.env}"
  service_role_arn       = aws_iam_role.codedeploy_role.arn

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 0
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.web_cluster.name
    service_name = aws_ecs_service.web_service.name
  }
}


3. Add Load balancer information

We need to specify the load balancer configuration for our blue-green deployment. Remember when we created a “blue” and a “green” target group? This is where we specify them.

resource "aws_codedeploy_deployment_group" "node_app" {
  app_name               = aws_codedeploy_app.node_app.name
  deployment_config_name = aws_codedeploy_deployment_config.custom_canary.id
  deployment_group_name  = "deployment-group-${var.project_id}-${var.env}"
  service_role_arn       = aws_iam_role.codedeploy_role.arn

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 0
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.web_cluster.name
    service_name = aws_ecs_service.web_service.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [module.alb.http_listener.arn]
      }

      target_group {
        name = module.ecs_tg_blue.tg.name
      }

      target_group {
        name = module.ecs_tg_green.tg.name
      }
    }
  }
}

4. Add automatic rollback configuration

As you can see from the diagram of our infrastructure, we will roll back on failures.

These include the following events:

  • Deployment failure ("DEPLOYMENT_FAILURE")
  • Alarm threshold crossed ("DEPLOYMENT_STOP_ON_ALARM")

I have created two Terraform modules with alarms for ALB 5xx errors and application error logs.

These are the alarms we will be using, but additional alarms can be added too!

## Cloudwatch log errors
module "application_error_alarm" {
  source             = "github.com/Jareechang/tf-modules//cloudwatch/alarms/application-log-errors?ref=v1.0.12"
  evaluation_periods = "2"
  threshold          = "10"
  arn_suffix         = module.alb.lb.arn_suffix
  project_id         = var.project_id
  env                = var.env
  # Keyword to match for this can be changed
  pattern            = "Error"
  log_group_name     = aws_cloudwatch_log_group.ecs.name
  metric_name        = "ApplicationErrorCount"
  metric_namespace   = "ECS/${var.project_id}-${var.env}"
}

## ALB errors (5xx)
module "http_error_alarm" {
  source             = "github.com/Jareechang/tf-modules//cloudwatch/alarms/alb-http-errors?ref=v1.0.8"
  evaluation_periods = "2"
  threshold          = "10"
  arn_suffix         = module.alb.lb.arn_suffix
  project_id         = var.project_id
}

resource "aws_codedeploy_deployment_group" "node_app" {
  app_name               = aws_codedeploy_app.node_app.name
  deployment_config_name = aws_codedeploy_deployment_config.custom_canary.id
  deployment_group_name  = "deployment-group-${var.project_id}-${var.env}"
  service_role_arn       = aws_iam_role.codedeploy_role.arn

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
  }

  alarm_configuration {
    alarms  = [
      module.http_error_alarm.name,
      module.application_error_alarm.name
    ]
    enabled = true
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 0
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.web_cluster.name
    service_name = aws_ecs_service.web_service.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [module.alb.http_listener.arn]
      }

      target_group {
        name = module.ecs_tg_blue.tg.name
      }

      target_group {
        name = module.ecs_tg_green.tg.name
      }
    }
  }
}

5. Apply the infrastructure

export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
export AWS_DEFAULT_REGION=us-east-1

terraform init
terraform plan
terraform apply -auto-approve

⚠️ Note: Remember to run terraform destroy -auto-approve after you are done with the module, unless you wish to keep the infrastructure for personal use.

Update GitHub Actions

1. Remove previous ECS update step

Since we will now be leveraging CodeDeploy for our deployment, we need to remove the previous step where we updated the AWS ECS service directly.

  # More code...

  # Remove this step
  - name: Deploy Amazon ECS task definition
    uses: aws-actions/amazon-ecs-deploy-task-definition@v1
    with:
        task-definition: ${{ steps.task-def.outputs.task-definition }}
        service: web-service-node-app-prod
        cluster: web-cluster-node-app-prod
        wait-for-service-stability: true

2. Add step to update appspec.json file

name: deploy

on:
  push:
    branches:
      - master
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Install & Build
        uses: actions/checkout@v2
      - run: yarn install --frozen-lockfile
      - run: yarn build && yarn install --production --ignore-scripts --prefer-offline

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
            aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
            aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
            aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build, tag, and push image to Amazon ECR
        id: build-image
        env:
            ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
            ECR_REPOSITORY: web/node-app/nextjs
            IMAGE_TAG: ${{ github.sha }}
        run: |
            docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
            docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
            echo "::set-output name=image::$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"

      - name: Fill in the new image ID in the Amazon ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
            task-definition: infra/task-definitions/service.latest.json
            container-name: nextjs-image
            image: ${{ steps.build-image.outputs.image }}

      - name: Update App Spec File
        run: |
          sed -ie "s/<CONTAINER_NAME>/$CONTAINER_NAME/" ./appspec.json
          sed -ie "s/<CONTAINER_PORT>/$CONTAINER_PORT/" ./appspec.json
        env:
          CONTAINER_NAME: nextjs-image
          CONTAINER_PORT: 3000
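You can dry-run these substitutions locally before trusting them in CI. This sketch applies the same replacements to a minimal stand-in file; it writes to a separate output file rather than editing in place, since the workflow’s `sed -ie` form is GNU sed’s in-place flag with an `e` backup suffix and behaves differently across sed implementations.

```shell
# Stand-in for appspec.json containing just the placeholder lines.
printf '%s\n' \
  '"ContainerName": "<CONTAINER_NAME>",' \
  '"ContainerPort": "<CONTAINER_PORT>"' > /tmp/appspec-check.json

CONTAINER_NAME=nextjs-image
CONTAINER_PORT=3000

# The same substitutions the workflow step performs, written to a new
# file so the check is portable across sed implementations.
sed -e "s/<CONTAINER_NAME>/$CONTAINER_NAME/" \
    -e "s/<CONTAINER_PORT>/$CONTAINER_PORT/" \
    /tmp/appspec-check.json > /tmp/appspec-resolved.json

cat /tmp/appspec-resolved.json
```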

3. Add CodeDeploy trigger step

With the same aws-actions/amazon-ecs-deploy-task-definition action, we can add our CodeDeploy configuration, and it will start the deployment as soon as the CI/CD pipeline runs that step.

name: deploy

on:
  push:
    branches:
      - master
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Install & Build
        uses: actions/checkout@v2
      - run: yarn install --frozen-lockfile
      - run: yarn build && yarn install --production --ignore-scripts --prefer-offline

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
            aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
            aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
            aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build, tag, and push image to Amazon ECR
        id: build-image
        env:
            ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
            ECR_REPOSITORY: web/node-app/nextjs
            IMAGE_TAG: ${{ github.sha }}
        run: |
            docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
            docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
            echo "::set-output name=image::$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"

      - name: Fill in the new image ID in the Amazon ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
            task-definition: infra/task-definitions/service.latest.json
            container-name: nextjs-image
            image: ${{ steps.build-image.outputs.image }}

      - name: Update App Spec File
        run: |
          sed -ie "s/<CONTAINER_NAME>/$CONTAINER_NAME/" ./appspec.json
          sed -ie "s/<CONTAINER_PORT>/$CONTAINER_PORT/" ./appspec.json
        env:
          CONTAINER_NAME: nextjs-image
          CONTAINER_PORT: 3000

      - name: Deploy Amazon ECS task definition
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: web-service-node-app-prod
          cluster: web-cluster-node-app-prod
          wait-for-service-stability: true
          codedeploy-appspec: appspec.json
          codedeploy-application: deployment-app-node-app-prod
          codedeploy-deployment-group: deployment-group-node-app-prod

⚠️ Important: Ensure the following names match what you have defined in your Terraform files:

  • (ecs) task-definition
  • (ecs) service
  • (ecs) cluster
  • (codedeploy) codedeploy-application
  • (codedeploy) codedeploy-deployment-group

If you have changed any of them, be sure to update them!


Final steps

Like the previous section, let’s add the updated AWS access key ID and secret access key to the GitHub “Secrets” vault on your repository.

Within the GitHub repository hosting your code, go to “Settings” > “Secrets” and click “New repository secret”.

Add the following:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

github actions secrets tab

Now we are ready to make a change and test out our blue-green deployment pipeline.

After you have pushed your changes, verify that the GitHub Actions run has no errors and that the deployment has started in your AWS console.

If the CodeDeploy trigger was successful, you should see the following:

image in aws console showing codedeploy traffic shifting start

As defined, it should shift 25% of the traffic to the new version of our application first, then shift the remainder after the 5-minute interval.

If you’d like a reference for the final result, it is available at building-with-aws-ecs-part-5.

Conclusion

That’s it! We succeeded in creating our CI/CD pipeline with blue-green deployment and automatic rollback for our Next.js application.

Congratulations on making it through this long series. I hope you learned a thing or two ;)!

Now that we have a fully functional CI/CD pipeline and infrastructure, I hope to write other posts that build on top of it, covering things like:

  • Adding a database for our Next.js application
  • Security hardening with WAF
  • CloudFront in front of the ALB
  • Background tasks integration with Lambda and SQS
  • Strategies for distributed tracing and logging

I can’t wait! I am already working on the next one. Be sure to subscribe to stay in the loop and be notified when it is ready!


Enjoyed the content?

Then consider signing up to get notified when new content arrives!

Jerry Chang 2022. All rights reserved.