Automate AWS Lambda Monitoring with Observability as Code

Modern serverless architectures demand more than just functional code—they require robust observability baked into every deployment. Traditional manual monitoring setups for AWS Lambda functions often lead to configuration drift, missed alerts, and operational overhead. Observability as Code (OaC) changes this paradigm by treating monitoring configurations like infrastructure code: version-controlled, reproducible, and deployed alongside your application.

The Pitfalls of Manual Lambda Monitoring

Manually instrumenting AWS Lambda functions for observability might seem straightforward at first. You configure logging to Datadog, then set up monitors, dashboards, and alerts for a specific environment. This approach works in the short term, but it breaks down as the number of functions, environments, and team members grows:

Environment drift: New environments (staging, production, regional variations) often receive inconsistent configurations, leading to blind spots.
Team dependencies: Onboarding new developers requires repeated knowledge transfer about monitoring setups.
Configuration gaps: Critical monitors or dashboards may be omitted during deployments, creating coverage gaps.
Operational risk: Manual changes introduce human error, with configurations falling out of sync over time.

Each of these issues compounds, demanding ongoing manual intervention that distracts from core development work and increases the likelihood of undetected incidents.

How Observability as Code Transforms Monitoring

Observability as Code eliminates these challenges by embedding monitoring configurations directly in your deployment pipeline. Every monitor, metric tag, alert rule, and trace configuration lives in version control alongside your application code. When you deploy your stack, the observability layer deploys automatically—ensuring consistency, repeatability, and governance across all environments.

The benefits extend beyond immediate consistency:

Environment parity: Identical configurations across development, staging, and production environments.
Auditability: Changes to monitoring rules are tracked in version control, enabling peer review and rollback capabilities.
Disaster recovery: Monitoring rules can be redeployed instantly from code, reducing recovery time.
Cost control: Standardized configurations help prevent over-provisioning of monitoring resources.
Scalability: Teams can expand monitoring coverage without proportional increases in operational overhead.

Building a Node.js Lambda API with OaC

To demonstrate this approach, we built a simple REST API using Node.js and AWS Lambda: a trip cost estimator deployed behind an Amazon API Gateway HTTP API. The function accepts three parameters (trip duration in days, number of people, and accommodation tier) and returns a cost breakdown.

POST /estimate
{
  "days": 7,
  "people": 2,
  "accommodation": "mid-range"
}
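The handler shown later in this article calls a calculateCost helper whose body is not included. A minimal sketch of what it could look like follows. The tier names other than "mid-range", the nightly rates, and the daily food allowance are all hypothetical values chosen for illustration, not figures from the original project.

```javascript
// Hypothetical per-person nightly rates; the article does not give real prices.
const NIGHTLY_RATES = { budget: 50, 'mid-range': 120, luxury: 300 };
// Assumed flat food allowance per person per day.
const DAILY_FOOD = 40;

function calculateCost(days, people, accommodation) {
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error('days must be a positive integer');
  }
  if (!Number.isInteger(people) || people <= 0) {
    throw new Error('people must be a positive integer');
  }
  if (!Object.prototype.hasOwnProperty.call(NIGHTLY_RATES, accommodation)) {
    throw new Error(`unknown accommodation tier: ${accommodation}`);
  }

  const accommodationTotal = NIGHTLY_RATES[accommodation] * days * people;
  const foodTotal = DAILY_FOOD * days * people;
  const grandTotal = accommodationTotal + foodTotal;

  // grandTotal and perPersonTotal are the two fields the handler tags on its span.
  return {
    accommodationTotal,
    foodTotal,
    grandTotal,
    perPersonTotal: grandTotal / people
  };
}
```

With the assumed rates, the sample request above (7 days, 2 people, mid-range) produces a grand total of 2240 and a per-person total of 1120.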

The observability stack centers on the Serverless Framework with the Datadog plugin (serverless-plugin-datadog). When you run sls deploy, the plugin automatically:

Attaches the Datadog Lambda Library layer, instrumenting the Node.js runtime for Application Performance Monitoring (APM).
Adds the Datadog Lambda Extension layer, which streams metrics, traces, and logs directly to Datadog without requiring a separate Forwarder Lambda.
Injects essential environment variables (DD_SITE, DD_SERVICE, DD_ENV, and others).
Creates or updates Datadog monitors defined in your serverless.yml file.

Configuring Observability in serverless.yml

The configuration file becomes the single source of truth for both application and observability requirements. Here’s a complete example:

service: trip-estimator

useDotenv: true

provider:
  name: aws
  runtime: nodejs20.x
  region: us-east-1
  stage: ${opt:stage, 'dev'}
  logRetentionInDays: 1
  environment:
    DD_API_KEY: ${env:DD_API_KEY}
    DD_SITE: ${env:DD_SITE, 'us5.datadoghq.com'}
    DD_ENV: ${sls:stage}
    DD_SERVICE: trip-estimator
    DD_VERSION: "1.0.0"
    DD_LOGS_INJECTION: "true"

functions:
  tripCostEstimator:
    handler: handler.estimate
    events:
      - httpApi:
          path: /estimate
          method: POST

plugins:
  - serverless-plugin-datadog

custom:
  datadog:
    apiKey: ${env:DD_API_KEY}
    appKey: ${env:DD_APP_KEY}
    site: ${env:DD_SITE, 'us5.datadoghq.com'}
    env: ${sls:stage}
    service: trip-estimator
    version: "1.0.0"
    enableDDLogs: true
    enableDDTracing: true
    enableEnhancedMetrics: true
    captureLambdaPayload: true
    addLayers: true
    monitors:
      - lambda-high-error-rate:
          thresholds:
            errorRate: 5
      - lambda-high-p90-latency:
          thresholds:
            p90: 1000

Critical credentials like API keys are managed through environment variables in a .env file (excluded from version control) or, ideally, AWS Secrets Manager for production environments.
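Because a missing key only surfaces once the deploy is already under way, it can help to check the variables up front. The following is a minimal sketch of such a check, assuming the variables are already exported in the shell or CI job. The function names findMissingVars and assertDatadogEnv are hypothetical and not part of the plugin.

```javascript
// Variables referenced through ${env:...} in the serverless.yml above.
const REQUIRED_VARS = ['DD_API_KEY', 'DD_APP_KEY'];

// Returns the names of required variables that are unset or blank.
function findMissingVars(env, required = REQUIRED_VARS) {
  return required.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === ''
  );
}

// Throws with a readable message so a CI step can fail before deploying.
function assertDatadogEnv(env) {
  const missing = findMissingVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing Datadog credentials: ${missing.join(', ')}`);
  }
}

// Example: only DD_API_KEY is present, so DD_APP_KEY is reported as missing.
const missingInExample = findMissingVars({ DD_API_KEY: 'abc123' });
```

In a pipeline, calling assertDatadogEnv(process.env) in a small script before the deploy step would be one way to wire this in.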

Implementing APM with Custom Traces

The Datadog plugin handles layer attachment automatically, so APM tracing works out of the box. Each Lambda invocation generates a root trace in Datadog APM without requiring code changes. For deeper visibility into business logic, you can add custom spans using the dd-trace library.

Because the Datadog layer supplies dd-trace at runtime, the library only needs to be installed as a dev dependency for local development:

{
  "devDependencies": {
    "dd-trace": "^5.x",
    "serverless-plugin-datadog": "^5.x"
  }
}

In your handler, require dd-trace with a fallback to maintain local testing capability:

let tracer;
try {
  tracer = require('dd-trace');
} catch {
  tracer = null;
}

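async function estimate(event) {
  // HTTP API events deliver the JSON payload as a string in event.body
  const { days, people, accommodation } = JSON.parse(event.body || '{}');

  let summary;
  if (tracer) {
    // Wrap the business logic in a custom span tagged with the request inputs
    summary = tracer.trace('trip.estimate', {
      tags: { days, people, accommodation }
    }, (span) => {
      const result = calculateCost(days, people, accommodation);
      span.setTag('grand_total', result.grandTotal);
      span.setTag('per_person_total', result.perPersonTotal);
      return result;
    });
  } else {
    // No tracer available (for example, in local tests): run untraced
    summary = calculateCost(days, people, accommodation);
  }
  return summary;
}

// Exported so that "handler: handler.estimate" in serverless.yml resolves
module.exports = { estimate };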

This approach yields comprehensive tracing:

An auto-instrumented root span for the Lambda invocation.
A child span named trip.estimate tagged with business attributes.
End-to-end latency, error rate, and throughput metrics per service.
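If several functions need the same "trace when available, otherwise run plainly" pattern, the branching can be pulled into one helper. The sketch below is one possible shape, not part of dd-trace: withSpan and NOOP_SPAN are hypothetical names, and the only library call assumed is tracer.trace(name, options, callback) as used in the handler above.

```javascript
// Stand-in span used when no tracer is loaded, so callers can always call setTag.
const NOOP_SPAN = {
  setTag() {
    return this;
  }
};

// Runs fn inside a custom span when a tracer is available,
// otherwise calls fn directly with the no-op span.
function withSpan(tracer, name, tags, fn) {
  if (!tracer) {
    return fn(NOOP_SPAN);
  }
  return tracer.trace(name, { tags }, fn);
}

// Usage without a tracer installed, as in a local unit test:
const total = withSpan(null, 'trip.estimate', { days: 7 }, (span) => {
  span.setTag('grand_total', 2240);
  return 2240;
});
```

The callback receives a span in both cases, which keeps the business logic free of tracer checks.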

Structured Logging with Trace Correlation

The Datadog Lambda Extension automatically forwards all function logs to Datadog when enableDDLogs is enabled. The real breakthrough comes from trace correlation—embedding the active trace ID and span ID in every log line.

Here’s a structured logging helper that enables this correlation:

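function logWithContext(message, context = {}) {
  // scope().active() returns null when no span is active, so guard before reading IDs
  const span = tracer ? tracer.scope().active() : null;
  const spanContext = span ? span.context() : null;

  const logEntry = {
    message,
    ...context,
    dd: {
      trace_id: spanContext ? spanContext.toTraceId() : null,
      span_id: spanContext ? spanContext.toSpanId() : null
    }
  };

  // Emit one JSON object per line so Datadog can parse the attributes
  console.log(JSON.stringify(logEntry));
}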

In Datadog, clicking any log entry automatically links to its corresponding trace, and vice versa. This eliminates the tedious task of manually correlating logs with performance traces during incident investigations.
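To see the shape of a correlated log line without deploying anything, the entry-building step can be exercised against a stub span. This is an illustrative sketch only: buildLogEntry and stubSpan are hypothetical names, the IDs are made up, and the stub mimics just the context(), toTraceId(), and toSpanId() calls used by the helper above.

```javascript
// Builds the log object; the dd block is added only when a span is supplied.
function buildLogEntry(message, context = {}, span = null) {
  const entry = { message, ...context };
  if (span) {
    const spanContext = span.context();
    entry.dd = {
      trace_id: spanContext.toTraceId(),
      span_id: spanContext.toSpanId()
    };
  }
  return entry;
}

// Stub standing in for an active dd-trace span (IDs are invented).
const stubSpan = {
  context: () => ({
    toTraceId: () => '1234567890',
    toSpanId: () => '987654321'
  })
};

const line = JSON.stringify(
  buildLogEntry('estimate computed', { days: 7 }, stubSpan)
);
console.log(line);
// {"message":"estimate computed","days":7,"dd":{"trace_id":"1234567890","span_id":"987654321"}}
```

The nested dd.trace_id and dd.span_id attributes are what tie the log line to its trace.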

Best Practices and Lessons Learned

Implementing Observability as Code requires thoughtful planning. Start by standardizing your naming conventions for services and environments to ensure consistency across dashboards and monitors. Use tags liberally to enable flexible filtering and aggregation in Datadog.
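One way to keep tag naming consistent is to generate tags from a single helper instead of typing them by hand in each service. The sketch below assumes the env, service, and version keys already used in the configuration above. buildTags is a hypothetical helper, and the team tag is an invented example.

```javascript
// Builds key:value tag strings from the core service attributes plus any extras.
// Empty values are dropped and values are lowercased for consistency.
function buildTags({ env, service, version }, extra = {}) {
  const base = { env, service, version };
  return Object.entries({ ...base, ...extra })
    .filter(([, value]) => value !== undefined && value !== null && value !== '')
    .map(([key, value]) => `${key}:${String(value).toLowerCase()}`);
}

const tags = buildTags(
  { env: 'dev', service: 'trip-estimator', version: '1.0.0' },
  { team: 'Travel' }
);
// tags: ['env:dev', 'service:trip-estimator', 'version:1.0.0', 'team:travel']
```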

Cost control remains a critical consideration. Monitor your Datadog usage closely, especially when enabling enhanced metrics and trace collection. Consider sampling high-volume traces in production environments, and disable payload capture (captureLambdaPayload) for endpoints that handle sensitive data or where the captured payloads add little diagnostic value.

Finally, treat your observability configuration as living code. Regularly review monitors and dashboards as your application evolves, and conduct post-mortems on incidents to identify gaps in coverage. The goal isn’t just to monitor your system—it’s to make your system observable by design.

As serverless architectures grow in complexity, the ability to automate and standardize observability will separate sustainable systems from fragile ones. By adopting Observability as Code today, teams can reduce operational burden, improve reliability, and focus on delivering value rather than maintaining monitoring setups.
