Published on: Sat Aug 06 2022
In this module, we will use the starter template aws-webhook-series-part-4 as our starting point.
By the end of this module, you should:
✅ Become familiar with the AWS tools used for tracing and logging
✅ Be able to set up tracing using AWS X-Ray in a Node.js environment
✅ Understand and know how to handle logging in a Node.js environment
This is a follow-up guide to the post - A pocket guide to service health and performance for AWS API gateway, Lambda, SQS.
While the other guide is more theory-based, this one will be more hands-on!
In this guide, we will set up tracing using AWS X-Ray, then move on to setting up our logger so we can relate each request to its logs using Node.js’s AsyncLocalStorage.
Even though the implementations here are very specific to our use case (for the webhook technical series), they can be re-worked to fit other scenarios too!
Nevertheless, let’s dive right in to set this up!
AWS offers a tool called AWS X-Ray specifically for instrumenting tracing within your code.
It is provided via an SDK that you can use and configure based on your needs.
The benefit of having tracing information available is that it gives us insights into specific requests and the performance of the sub-systems involved in their lifecycle.
Having that information available allows us to identify bottlenecks and performance limits within our infrastructure.
Let’s take a look at how to set that up.
The first step in setting up our infrastructure is to configure the permissions for AWS X-Ray and the runtime version (and, later on, for logging).
// infra/main.tf
module "lambda_ingestion" {
  source        = "./modules/lambda"
  code_src      = "../functions/ingestion/main.zip"
  bucket_id     = aws_s3_bucket.lambda_bucket.id
  timeout       = local.default_lambda_timeout
  function_name = "Ingestion-function"
  runtime       = "nodejs16.x"
  handler       = "dist/index.handler"
  publish       = true

  alias_name        = "ingestion-dev"
  alias_description = "Alias for ingestion function"

  iam_statements = {
    sqs = {
      actions = [
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:SendMessage",
        "sqs:ReceiveMessage",
      ]
      effect = "Allow"
      resources = [
        aws_sqs_queue.ingest_queue.arn
      ]
    }
    xray = {
      actions = [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords",
        "xray:GetSamplingRules",
        "xray:GetSamplingTargets",
        "xray:GetSamplingStatisticSummaries"
      ]
      effect = "Allow"
      resources = [
        "*"
      ]
    }
  }

  environment_vars = {
    QueueUrl               = aws_sqs_queue.ingest_queue.id
    DefaultRegion          = var.aws_region
    WebhookSignatureSecret = var.webhook_secret
  }
}
module "lambda_process_queue" {
source = "./modules/lambda"
code_src = "../functions/process-queue/main.zip"
bucket_id = aws_s3_bucket.lambda_bucket.id
timeout = local.default_lambda_timeout
function_name = "Process-Queue-function"
runtime = "nodejs16.x"
handler = "dist/index.handler"
publish = true
alias_name = "process-queue-dev"
alias_description = "Alias for ingestion function"
iam_statements = {
sqs = {
actions = [
"sqs:GetQueueAttributes",
"sqs:GetQueueUrl",
"sqs:ReceiveMessage",
"sqs:DeleteMessage",
]
effect = "Allow"
resources = [
aws_sqs_queue.ingest_queue.arn
]
}
dlq_sqs = {
actions = [
"sqs:GetQueueAttributes",
"sqs:GetQueueUrl",
"sqs:SendMessage",
"sqs:ReceiveMessage",
]
effect = "Allow"
resources = [
aws_sqs_queue.ingest_dlq.arn
]
}
xray = {
actions = [
"xray:PutTraceSegments",
"xray:PutTelemetryRecords",
"xray:GetSamplingRules",
"xray:GetSamplingTargets",
"xray:GetSamplingStatisticSummaries"
]
effect = "Allow"
resources = [
"*"
]
}
}
environment_vars = {
DefaultRegion = var.aws_region
}
}
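Note that the xray statement grants its actions on resources = ["*"]. The X-Ray write and sampling actions do not support resource-level permissions, so the wildcard is required here; the statement mirrors what AWS's managed AWSXRayDaemonWriteAccess policy grants.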
Next, update the tracing configuration in our common Lambda module.
// infra/module/lambda/main.tf
resource "aws_lambda_function" "this" {
  function_name    = var.function_name
  s3_bucket        = var.bucket_id
  s3_key           = aws_s3_bucket_object.lambda_default.key
  runtime          = var.runtime
  handler          = var.handler
  publish          = var.publish
  source_code_hash = filebase64sha256(var.code_src)
  role             = aws_iam_role.lambda_exec.arn

  dynamic "environment" {
    for_each = length(keys(var.environment_vars)) == 0 ? [] : [true]
    content {
      variables = var.environment_vars
    }
  }

  tracing_config {
    mode = "Active"
  }
}
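With mode = "Active", Lambda samples incoming requests and records traces for them. The alternative value, "PassThrough", only records a trace when an upstream service has already made a sampling decision.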
Under functions/ingestion, run the following:
pnpm add aws-xray-sdk
Note: This will automatically be added into the package as part of the bundling process.
// src/services/sqs-service.ts
import AWS from 'aws-sdk';
import { APIGatewayProxyEvent } from 'aws-lambda';
import { getSqsMessage } from '@app/utils';
import config from '@app/config';
import { AwsSqsServiceError } from '@app/errors';
import AWSXRay from 'aws-xray-sdk';

export const sqs = new AWS.SQS({
  apiVersion: '2012-11-05',
  region: config.queue.sqs.defaultRegion,
});

// Capture the client
AWSXRay.captureAWSClient(sqs);

// -- code omitted --
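Beyond capturing the AWS client, you can also record your own subsegments and annotations so that application-level steps show up as their own nodes in the trace. A minimal sketch (the subsegment name and annotation are made up for illustration):

// Inside a code path where tracing is active (e.g. the handler):
const segment = AWSXRay.getSegment();
const subsegment = segment?.addNewSubsegment('handle-webhook');
try {
  // ... application logic ...
  subsegment?.addAnnotation('eventSource', 'webhook'); // searchable in the X-Ray console
} finally {
  subsegment?.close();
}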
Add a mock for the aws-xray-sdk client in the following files:
src/index.test.ts
src/services/__tests__/sqs-service.test.ts
jest.mock('aws-xray-sdk', () => ({
  __esModule: true,
  default: {
    captureAWSClient: jest.fn(),
  },
}));
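Unit tests run outside of an X-Ray segment context, so we stub out captureAWSClient rather than let the real SDK instrument the client during tests.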
For our infrastructure in the webhook technical series, logs are available for each sub-system a request passes through.
These logs give insights into events of the sub-systems in each request.
Logging is an invaluable resource in debugging issues and error tracking.
However, we must ensure that we set it up in a way that makes it easy to relate the requests to the logs.
This is where a transaction ID (or correlation ID) comes into play (if you’d like to learn more, refer to the references below).
Node.js actually offers a class built on async_hooks just for this use case, called AsyncLocalStorage.
It is very similar to the ThreadLocal provided in other programming languages.
async_hooks defines lifecycle methods that can be used to track your async resources, which makes it well suited for propagating context in a distributed environment.
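As a quick standalone illustration (not part of the webhook code), each async call chain reads back only the store it was entered with, even when the calls run interleaved:

import { AsyncLocalStorage } from 'async_hooks';

const store = new AsyncLocalStorage<Map<string, string>>();

async function work(label: string) {
  // Each call chain sees its own context across the await boundary.
  await new Promise((resolve) => setTimeout(resolve, 10));
  console.log(label, store.getStore()?.get('requestId'));
}

store.run(new Map([['requestId', 'req-1']]), () => { void work('first:'); });
store.run(new Map([['requestId', 'req-2']]), () => { void work('second:'); });
// first: req-1
// second: req-2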
Let’s see how to set this up in our AWS Lambda function.
(Diagrams: Node.js’s async resource lifecycle · Node.js’s available async hooks · using Node.js’s AsyncLocalStorage for context propagation.)
Note: AsyncLocalStorage is only available in v13.10.0, v12.17.0 and above, so make sure the AWS Lambda runtime is using the right version. Also, be mindful that this feature is still experimental.
Let’s take a look at how to set this up!
Using Node.js’s AsyncLocalStorage, we can capture contextual information and propagate it across the lifecycle of our async resource.
Let’s create a utility for doing that.
Create a new file:
touch utils/async-local-storage.ts
Add the utility:
// utils/async-local-storage.ts
import { AsyncLocalStorage } from 'async_hooks';
import { APIGatewayProxyEvent } from 'aws-lambda';

const asyncLocalStorage = new AsyncLocalStorage<Map<string, string>>();

export default asyncLocalStorage;

export function captureRequestContext(event: APIGatewayProxyEvent) {
  const context = new Map<string, string>();
  context.set('awsRequestId', event?.requestContext?.requestId);
  asyncLocalStorage.enterWith(context);
}
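We use enterWith() here because it binds the store to the current execution context for the remainder of the invocation, so the handler body does not need to be restructured. The more common run() API scopes the context to a callback instead; an equivalent shape would be:

// Alternative: scope the context with run() instead of enterWith().
// Everything awaited inside the callback sees the same store.
asyncLocalStorage.run(context, async () => {
  // ... rest of the handler body ...
});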
We will comment out verifySignature for now, just so we can test this out without needing to verify the signature.
// src/index.ts
import {
  APIGatewayProxyEvent,
  APIGatewayProxyResult
} from 'aws-lambda';
import { sendMessage } from '@app/services/sqs-service';
// import {
//   verifySignature,
// } from '@app/utils';
import { captureRequestContext } from '@app/utils/async-local-storage';

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  captureRequestContext(event);

  console.log('Event : ', JSON.stringify({
    event,
  }, null, 4));

  // 1. Verify Signature
  // verifySignature(event);

  // 2. Add to Queue
  const messageId: string = await sendMessage(event);

  // 3. Error handling (final touch)

  // 4. Response
  return {
    statusCode: 200,
    body: JSON.stringify({
      messageId,
      message: 'success'
    }),
  };
};
With this context information available, we can configure our logger so that it adds the context to our logs every time it is used.
// src/services/logger.ts
import pino from 'pino';
import config from '@app/config';
import asyncLocalStorage from '@app/utils/async-local-storage';

const logger = pino({
  enabled: config.log.enabled,
  level: config.log.level,
  redact: ['key', 'password', 'secret'].concat(config.log.redact ?? []),
  // mixin() is merged into every log entry, so each log line
  // automatically carries the request's context.
  mixin() {
    // getStore() is undefined outside a captured context (e.g. in tests),
    // so guard the access.
    const requestId = asyncLocalStorage.getStore()?.get('awsRequestId');
    return {
      requestId,
    };
  }
});

export default logger;
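Because pino merges the object returned by mixin() into every log entry, any module that logs inside the request's async context gets the requestId for free, roughly like this:

import logger from '@app/services/logger';

// Called anywhere downstream of captureRequestContext(event):
logger.info('processing webhook');
// => {"level":30,"time":...,"requestId":"<awsRequestId>","msg":"processing webhook"}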
// services/sqs-service.ts
import AWS from 'aws-sdk';
import { APIGatewayProxyEvent } from 'aws-lambda';
import { getSqsMessage } from '@app/utils';
import config from '@app/config';
import logger from '@app/services/logger';
import { AwsSqsServiceError } from '@app/errors';
import AWSXRay from 'aws-xray-sdk';

export const sqs = new AWS.SQS({
  apiVersion: '2012-11-05',
  region: config.queue.sqs.defaultRegion,
});

// Capture the client
AWSXRay.captureAWSClient(sqs);

/**
 * Adds an event message to the queue.
 */
export async function sendMessage(
  event: APIGatewayProxyEvent
): Promise<string> {
  const message: AWS.SQS.Types.SendMessageRequest = getSqsMessage(event);
  logger.info('From service/sqs-service#sendMessage');
  try {
    const result: AWS.SQS.Types.SendMessageResult = (
      await sqs.sendMessage(message).promise());
    return result?.MessageId ?? '';
  } catch (error: any) {
    throw new AwsSqsServiceError('failed to sendMessage')
      .setOperation('services/sqs-service:sendMessage')
      .setContext({
        error,
        message,
      });
  }
}
To see the context propagation in action, let’s add another log point along the code path.
// src/utils/get-sqs-message.ts
import AWS from 'aws-sdk';
import { APIGatewayProxyEvent } from 'aws-lambda';
import config from '@app/config';
import logger from '@app/services/logger';

export default function getSqsMessage(
  event: APIGatewayProxyEvent,
): AWS.SQS.Types.SendMessageRequest {
  logger.info('From utils/get-sqs-message');
  return {
    MessageAttributes: {
      requestId: {
        DataType: 'String',
        StringValue: event?.requestContext?.requestId ?? '',
      }
    },
    MessageBody: event?.body ?? '',
    QueueUrl: config.queue.sqs.url ?? '',
  };
}
Making these changes will break some tests, because we are missing a mock for the logger.
Add the logger mock to the following files:
src/index.test.ts
src/utils/__tests__/get-sqs-message.test.ts
src/services/__tests__/sqs-service.test.ts
jest.mock('@app/services/logger', () => ({
  __esModule: true,
  default: {
    info: jest.fn(),
  },
}));
Let’s enable our logger and configure the log level through environment variables.
For more configuration options, see src/config.ts.
module "lambda_ingestion" {
source = "./modules/lambda"
code_src = "../functions/ingestion/main.zip"
bucket_id = aws_s3_bucket.lambda_bucket.id
timeout = local.default_lambda_timeout
function_name = "Ingestion-function"
runtime = "nodejs16.x"
handler = "dist/index.handler"
publish = true
alias_name = "ingestion-dev"
alias_description = "Alias for ingestion function"
iam_statements = {
sqs = {
actions = [
"sqs:GetQueueAttributes",
"sqs:GetQueueUrl",
"sqs:SendMessage",
"sqs:ReceiveMessage",
]
effect = "Allow"
resources = [
aws_sqs_queue.ingest_queue.arn
]
}
xray = {
actions = [
"xray:PutTraceSegments",
"xray:PutTelemetryRecords",
"xray:GetSamplingRules",
"xray:GetSamplingTargets",
"xray:GetSamplingStatisticSummaries"
]
effect = "Allow"
resources = [
"*"
]
}
}
environment_vars = {
QueueUrl = aws_sqs_queue.ingest_queue.id
DefaultRegion = var.aws_region
WebhookSignatureSecret = var.webhook_secret
LOG_LEVEL = "info"
LOG_ENABLED = true
}
}
⚠️ Note: All environment variables are converted to strings, so be careful when dealing with other data types.
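For reference, here is a minimal sketch of how the log section of src/config.ts could read these variables. The exact field names in the repository may differ; note the string comparison for LOG_ENABLED, which follows from the note above.

// src/config.ts (sketch - shape assumed from how config.log is used above)
export default {
  log: {
    // Environment variables arrive as strings, hence the comparison.
    enabled: process.env.LOG_ENABLED === 'true',
    level: process.env.LOG_LEVEL ?? 'info',
    redact: [] as string[],
  },
  // ... queue config, etc. omitted ...
};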
Finally, let’s use Terraform to apply the infrastructure changes.
# This will re-generate the assets
pnpm run generate-assets --filter "@function/*"
export AWS_ACCESS_KEY_ID=<your-key>
export AWS_SECRET_ACCESS_KEY=<your-secret>
export AWS_DEFAULT_REGION=us-east-1
terraform init
terraform plan
terraform apply -auto-approve
When the infrastructure is ready, make an API call to the endpoint.
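For example, with curl (the URL is a placeholder; use your API Gateway endpoint from the Terraform output, and -i prints the response headers):

curl -i -X POST "https://<api-id>.execute-api.us-east-1.amazonaws.com/<stage>/webhook" \
  -H "Content-Type: application/json" \
  -d '{"event": "ping"}'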
In the response headers, you should see a value under apigw-requestid; this is the transaction ID that will be propagated throughout the code path and added as context to our logger.
Let’s search for it in CloudWatch Logs Insights to see the events of a request.
Example - searching log events by a specific apigw-requestid:
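A CloudWatch Logs Insights query along these lines surfaces all log events that our logger tagged with that ID (the value shown is a placeholder):

fields @timestamp, @message
| filter requestId = "<apigw-requestid>"
| sort @timestamp asc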
That’s it!
Now we have tracing and logging available for our infrastructure!
A recap:
Use tracing when you want to optimize performance (see the specific performance traces of each sub-system)
Use logging when you want to debug issues or errors in each request by using the logged events
Node.js’s AsyncLocalStorage allows us to propagate transaction IDs across async resources so we can relate our requests to the logs
I hope this was helpful! If it was, please share this article with a friend or co-worker 🙏!
If you’d like a reference of the finished result, it is available at aws-webhook-series-tracing-and-logging.
Consider signing up to get notified when new content arrives!