Practical Guide: Tracing and Logging for Node.js using AWS Lambda

Published on: Sat Aug 06 2022


Goals

In this module, we will use the starter template - aws-webhook-series-part-4 - as our starting point.

By the end of this module, you should:

  • ✅ Become familiar with the AWS tools used for tracing and logging

  • ✅ Be able to set up tracing using AWS X-Ray in a Node.js environment

  • ✅ Understand and know how to handle logging in a Node.js environment

Content

Introduction

This is a follow-up guide to the post - A pocket guide to service health and performance for AWS API gateway, Lambda, SQS.

While the other guide is more theory-based, this one will be more hands-on!

In this guide, we will set up tracing using AWS X-Ray, then move on to setting up our logger so we can relate each request to its logs using Node.js’s AsyncLocalStorage.

Even though the implementations here are very specific to our use case (for the webhook technical series), they can be re-worked to fit other scenarios too!

Nevertheless, let’s dive right in to set this up!

Tracing

AWS offers a tool called AWS X-Ray specifically for instrumenting tracing within your code.

It is provided via an SDK that you can use and configure based on your needs.

The benefit of having tracing information available is that it gives us insight into specific requests, and into the performance of the sub-systems involved in their lifecycle.

Having that information available allows us to identify bottlenecks and performance limits within our infrastructure.

Let’s take a look at how to set that up.

1. Update Lambda infrastructure configurations

The first step is to configure our infrastructure: the IAM permissions for AWS X-Ray, the runtime version, and (later on) logging.

// infra/main.tf

module "lambda_ingestion" {
  source               = "./modules/lambda"
  code_src             = "../functions/ingestion/main.zip"
  bucket_id            = aws_s3_bucket.lambda_bucket.id
  timeout              = local.default_lambda_timeout
  function_name        = "Ingestion-function"
  runtime              = "nodejs16.x"
  handler              = "dist/index.handler"
  publish              = true
  alias_name           = "ingestion-dev"
  alias_description    = "Alias for ingestion function"
  iam_statements       = {
    sqs = {
      actions = [
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:SendMessage",
        "sqs:ReceiveMessage",
      ]
      effect = "Allow"
      resources = [
        aws_sqs_queue.ingest_queue.arn
      ]
    }
    xray = {
      actions = [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords",
        "xray:GetSamplingRules",
        "xray:GetSamplingTargets",
        "xray:GetSamplingStatisticSummaries"
      ]
      effect = "Allow"
      resources = [
        "*"
      ]
    }
  }
  environment_vars = {
    QueueUrl                 = aws_sqs_queue.ingest_queue.id
    DefaultRegion            = var.aws_region
    WebhookSignatureSecret   = var.webhook_secret
  }
}

module "lambda_process_queue" {
  source               = "./modules/lambda"
  code_src             = "../functions/process-queue/main.zip"
  bucket_id            = aws_s3_bucket.lambda_bucket.id
  timeout              = local.default_lambda_timeout
  function_name        = "Process-Queue-function"
  runtime              = "nodejs16.x"
  handler              = "dist/index.handler"
  publish              = true
  alias_name           = "process-queue-dev"
  alias_description    = "Alias for process-queue function"
  iam_statements       = {
    sqs = {
      actions = [
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
      ]
      effect = "Allow"
      resources = [
        aws_sqs_queue.ingest_queue.arn
      ]
    }
    dlq_sqs = {
      actions = [
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:SendMessage",
        "sqs:ReceiveMessage",
      ]
      effect = "Allow"
      resources = [
        aws_sqs_queue.ingest_dlq.arn
      ]
    }
    xray = {
      actions = [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords",
        "xray:GetSamplingRules",
        "xray:GetSamplingTargets",
        "xray:GetSamplingStatisticSummaries"
      ]
      effect = "Allow"
      resources = [
        "*"
      ]
    }
  }

  environment_vars = {
    DefaultRegion   = var.aws_region
  }
}

2. Enable Active tracing for AWS Lambda

Update tracing configuration for our common lambda module.

// infra/module/lambda/main.tf

resource "aws_lambda_function" "this" {
  function_name    = var.function_name
  s3_bucket        = var.bucket_id
  s3_key           = aws_s3_bucket_object.lambda_default.key
  runtime          = var.runtime
  handler          = var.handler
  publish          = var.publish
  source_code_hash = "${filebase64sha256(var.code_src)}"
  role             = aws_iam_role.lambda_exec.arn
  dynamic "environment" {
    for_each = length(keys(var.environment_vars)) == 0 ? [] : [true]
    content {
      variables = var.environment_vars
    }
  }
  tracing_config {
    mode = "Active"
  }
}

3. Install the tracing sdk - aws-xray-sdk

Under functions/ingestion, run the following:

pnpm add aws-xray-sdk

Note: This will automatically be added into the package as part of the bundling process.

4. Capture tracing using the request client

// src/services/sqs-service.ts

import AWS from 'aws-sdk';
import { APIGatewayProxyEvent } from 'aws-lambda';
import { getSqsMessage } from '@app/utils';
import config from '@app/config';
import { AwsSqsServiceError } from '@app/errors';
import AWSXRay from 'aws-xray-sdk';

export const sqs = new AWS.SQS({
  apiVersion: '2012-11-05',
  region: config.queue.sqs.defaultRegion,
});

// Capture the client
AWSXRay.captureAWSClient(sqs);

// -- code omitted --

5. Update Tests

Add Mock for the aws-xray-sdk client:

  • src/index.test.ts
  • src/services/__tests__/sqs-service.test.ts

jest.mock('aws-xray-sdk', () => ({
  __esModule: true,
  default: {
    captureAWSClient: jest.fn(),
  }
}));

Logging

For our infrastructure in the webhook technical series, the logs available are:

  • API gateway logs
  • AWS Lambda logs (Ingestion & Process-Queue functions)

These logs give insights into events of the sub-systems in each request.

Logging is an invaluable resource in debugging issues and error tracking.

However, we must ensure that we set it up in a way that makes it easy to relate the requests to the logs.

This is where a transaction ID (also called a correlation ID) comes into play.

Node.js actually offers a class built on async_hooks for exactly this use case, called AsyncLocalStorage.

It is very similar to the ThreadLocal construct provided in other programming languages. Under the hood, async_hooks defines lifecycle hooks that can be used to track your async resources.

This makes AsyncLocalStorage well suited for propagating context in a distributed environment.
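Outside of Lambda, the propagation mechanism can be seen in a minimal standalone sketch (the store, keys, and function names here are illustrative, not part of the series code):

```typescript
import { AsyncLocalStorage } from 'async_hooks';

// One store instance shared across the module.
const als = new AsyncLocalStorage<Map<string, string>>();

// A deeply nested async function -- no requestId parameter needed,
// it reads the id straight from the current async context.
async function innerWork(): Promise<string | undefined> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return als.getStore()?.get('requestId');
}

async function handleRequest(requestId: string): Promise<string | undefined> {
  // Everything started inside this callback sees the same store,
  // even across awaits and timers.
  return als.run(new Map([['requestId', requestId]]), () => innerWork());
}

// Two concurrent "requests" keep their own contexts.
Promise.all([handleRequest('req-1'), handleRequest('req-2')]).then((ids) =>
  console.log(ids.join(',')) // req-1,req-2
);
```

Note how the two concurrent calls never leak into each other's context, which is exactly the property we need to stamp a per-request ID onto logs.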

Let’s see how to set this up in our AWS Lambda function.

Node.js’s async resource lifecycle:

[Diagram: Node.js async resource lifecycle]

Node.js’s available async hooks:

[Diagram: Node.js available async hooks]

Using Node.js’s AsyncLocalStorage for context propagation:

[Diagram: Node.js AsyncLocalStorage used for logging]
Note: AsyncLocalStorage is only available from Node.js v13.10.0 and v12.17.0 onward, so make sure the AWS Lambda runtime is using a compatible version.

Also, be mindful that the underlying async_hooks API is still experimental.

Let’s take a look at how to set this up!

1. Setup util for AsyncLocalStorage

Using Node.js’s AsyncLocalStorage we can capture contextual information, and propagate that across the lifecycle of our async resource.

Let’s create a utility for doing that.

Create a new file:

touch utils/async-local-storage.ts

Add the utility:

// utils/async-local-storage.ts

import { AsyncLocalStorage } from 'async_hooks';
import { APIGatewayProxyEvent } from 'aws-lambda';

const asyncLocalStorage = new AsyncLocalStorage<Map<string, string>>();

export default asyncLocalStorage;

export function captureRequestContext(event: APIGatewayProxyEvent) {
  const context = new Map<string, string>();
  context.set('awsRequestId', event?.requestContext?.requestId ?? '');
  asyncLocalStorage.enterWith(context);
}
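As a quick sanity check, the same pattern can be exercised in isolation with a stubbed event (the stub shape and names below are illustrative):

```typescript
import { AsyncLocalStorage } from 'async_hooks';

const storage = new AsyncLocalStorage<Map<string, string>>();

// Same shape as the util above: enterWith binds the store to the
// current async execution context and everything spawned from it.
function captureRequestContext(event: { requestContext?: { requestId?: string } }) {
  const context = new Map<string, string>();
  context.set('awsRequestId', event?.requestContext?.requestId ?? '');
  storage.enterWith(context);
}

// Minimal stub of an API Gateway event -- only the field the util reads.
captureRequestContext({ requestContext: { requestId: 'abc-123' } });

// Anywhere later in the same async flow:
console.log(storage.getStore()?.get('awsRequestId')); // abc-123
```

Unlike `run()`, `enterWith()` does not take a callback; it is convenient at the very top of a Lambda handler where there is no natural wrapping function.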

2. Use the captureRequestContext util

We will comment out verifySignature for now just so we can test this out without needing to verify the signature.

// src/index.ts

import {
  APIGatewayProxyEvent,
  APIGatewayProxyResult
} from 'aws-lambda';

import { sendMessage } from '@app/services/sqs-service';
// import { verifySignature } from '@app/utils';

import { captureRequestContext } from '@app/utils/async-local-storage';

export const handler = async(
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  captureRequestContext(event);
  console.log('Event : ', JSON.stringify({
    event,
  }, null, 4));
  // 1. Verify Signature
  // verifySignature(event);

  // 2. Add to Queue
  const messageId: string = await sendMessage(event);

  // 3. Error handling (final touch)

  // 4. Response
  return {
    statusCode: 200,
    body: JSON.stringify({
      messageId,
      message: 'success'
    }),
  }
}

3. Add awsRequestId to pino logger

With this context information available, we can configure our logger to add the request ID to every log entry.

import pino from 'pino';

import config from '@app/config';

import asyncLocalStorage from '@app/utils/async-local-storage';

const logger = pino({
  enabled: config.log.enabled,
  level: config.log.level,
  redact: ['key', 'password', 'secret'].concat(config.log.redact ?? []),
  mixin() {
    // Optional chaining: the store is undefined when the logger
    // runs outside of a captured request context.
    const requestId = asyncLocalStorage.getStore()?.get('awsRequestId');
    return {
      requestId,
    };
  }
});

export default logger;
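The `mixin` mechanism can be illustrated without pino: conceptually, the logger calls `mixin()` at write time and merges the result into each log object. This mini-logger is a sketch of that behavior (not pino’s actual implementation):

```typescript
type LogObject = Record<string, unknown>;

// Sketch: a logger that, like pino's `mixin` option, merges
// extra context into every log entry when it is written.
function createLogger(mixin: () => LogObject) {
  return {
    info(msg: string): LogObject {
      const entry = { level: 'info', msg, ...mixin() };
      console.log(JSON.stringify(entry));
      return entry;
    },
  };
}

// The mixin closure is evaluated per log call, so it always
// reflects the current async context.
const logger = createLogger(() => ({ requestId: 'req-42' }));
logger.info('hello'); // {"level":"info","msg":"hello","requestId":"req-42"}
```

Because the mixin runs on every call, pairing it with `asyncLocalStorage.getStore()` (as above) stamps each log line with the ID of whichever request is currently executing.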

4. Add log points to sqs-service

// services/sqs-service.ts

import AWS from 'aws-sdk';
import { APIGatewayProxyEvent } from 'aws-lambda';
import { getSqsMessage } from '@app/utils';
import config from '@app/config';
import logger from '@app/services/logger';
import { AwsSqsServiceError } from '@app/errors';
import AWSXRay from 'aws-xray-sdk';

export const sqs = new AWS.SQS({
  apiVersion: '2012-11-05',
  region: config.queue.sqs.defaultRegion,
});

// Capture the client
AWSXRay.captureAWSClient(sqs);

/**
 * Adds an event message to the queue.
 */
export async function sendMessage(
  event: APIGatewayProxyEvent
): Promise<string> {
  const message: AWS.SQS.Types.SendMessageRequest = getSqsMessage(event);
  logger.info('From service/sqs-service#sendMessage');
  try {
    const result: AWS.SQS.Types.SendMessageResult = (
      await sqs.sendMessage(message).promise());
    return result?.MessageId ?? '';
  } catch (error: any) {
    throw new AwsSqsServiceError('failed to sendMessage')
      .setOperation('services/sqs-service:sendMessage')
      .setContext({
        error,
        message,
      });
  }
}

5. Add log point to get-sqs-message util

To see the context propagation in action, let’s add another log point into the code path.

import config from '@app/config';
import logger from '@app/services/logger';

export default function getSqsMessage(
  event: APIGatewayProxyEvent,
): AWS.SQS.Types.SendMessageRequest {
  logger.info('From utils/get-sqs-message');
  return {
    MessageAttributes: {
      requestId: {
        DataType: 'String',
        StringValue: event?.requestContext?.requestId ?? '',
      }
    },
    MessageBody: event?.body ?? '',
    QueueUrl: config.queue.sqs.url ?? '',
  };
}

6. Update tests

Making these changes will break some tests because we are missing the mock function for the logger.

Add the logger mocks to the following files:

  • src/index.test.ts
  • src/utils/__tests__/get-sqs-message.test.ts
  • src/services/__tests__/sqs-service.test.ts

jest.mock('@app/services/logger', () => ({
  __esModule: true,
  default: {
    info: jest.fn(),
  }
}));

7. Add Environment variables to the AWS Lambda - Ingestion

Let’s enable our logger and set the log level through environment variables.

For more configurations, see src/config.ts .

module "lambda_ingestion" {
  source               = "./modules/lambda"
  code_src             = "../functions/ingestion/main.zip"
  bucket_id            = aws_s3_bucket.lambda_bucket.id
  timeout              = local.default_lambda_timeout
  function_name        = "Ingestion-function"
  runtime              = "nodejs16.x"
  handler              = "dist/index.handler"
  publish              = true
  alias_name           = "ingestion-dev"
  alias_description    = "Alias for ingestion function"
  iam_statements       = {
    sqs = {
      actions = [
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:SendMessage",
        "sqs:ReceiveMessage",
      ]
      effect = "Allow"
      resources = [
        aws_sqs_queue.ingest_queue.arn
      ]
    }
    xray = {
      actions = [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords",
        "xray:GetSamplingRules",
        "xray:GetSamplingTargets",
        "xray:GetSamplingStatisticSummaries"
      ]
      effect = "Allow"
      resources = [
        "*"
      ]
    }
  }
  environment_vars = {
    QueueUrl                 = aws_sqs_queue.ingest_queue.id
    DefaultRegion            = var.aws_region
    WebhookSignatureSecret   = var.webhook_secret
    LOG_LEVEL                = "info"
    LOG_ENABLED              = true
  }
}
⚠️ Note: All environment variables are converted to strings, so be careful when dealing with other data types.
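For example, a boolean like LOG_ENABLED arrives as the string "true", which would be truthy even when set to "false" if used directly. A small parsing helper avoids that pitfall (the helper name and defaults here are an illustrative sketch, not the actual src/config.ts code):

```typescript
// Environment variables always arrive as strings, so the string
// 'false' would still be truthy in a boolean position.
function parseBoolEnv(value: string | undefined, fallback = false): boolean {
  if (value === undefined) return fallback;
  return value.toLowerCase() === 'true';
}

console.log(parseBoolEnv('true'));      // true
console.log(parseBoolEnv('false'));     // false
console.log(parseBoolEnv(undefined));   // false
```

In config code this would be used as `parseBoolEnv(process.env.LOG_ENABLED)`.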

8. Apply the infrastructure

Let’s use Terraform to apply the infrastructure changes.

# This will re-generate the assets
pnpm run generate-assets --filter @function/*

export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
export AWS_DEFAULT_REGION=us-east-1

terraform init
terraform plan
terraform apply -auto-approve

Test it out

When the infrastructure is ready, make an API call to the endpoint.

In the response headers, you should see a value under apigw-requestid. This is the transaction ID that will be propagated throughout the code path and added as context to our logs.

Let’s search for it in CloudWatch Logs Insights to see the events of a request.

Example - Searching log events by a specific apigw-requestid:

[Screenshot: Webhook service transaction ID]


Conclusion

That’s it!

Now we have tracing and logging available for our infrastructure!

A recap:

  • Use tracing when you want to optimize performance (see the specific performance traces of each sub-system)

  • Use logging when you want to debug issues or errors in each request by using the logged events

  • Node.js’s AsyncLocalStorage allows us to propagate transaction IDs across async resources, so we can relate each request to its logs

I hope this was helpful! If it was, please share this article with a friend or co-worker 🙏!

If you’d like a reference of the finished result, it is available at aws-webhook-series-tracing-and-logging.



Jerry Chang 2022. All rights reserved.