Modern serverless architectures demand more than just functional code—they require robust observability baked into every deployment. Traditional manual monitoring setups for AWS Lambda functions often lead to configuration drift, missed alerts, and operational overhead. Observability as Code (OaC) changes this paradigm by treating monitoring configurations like infrastructure code: version-controlled, reproducible, and deployed alongside your application.
The Pitfalls of Manual Lambda Monitoring
Manually instrumenting AWS Lambda functions for observability might seem straightforward at first. You configure logging to Datadog, then set up monitors, dashboards, and alerts for a specific environment. This approach works in the short term, but it breaks down as the number of functions and environments grows:
- Environment drift: New environments (staging, production, regional variations) often receive inconsistent configurations, leading to blind spots.
- Team dependencies: Onboarding new developers requires repeated knowledge transfer about monitoring setups.
- Configuration gaps: Critical monitors or dashboards may be omitted during deployments, creating coverage gaps.
- Operational risk: Manual changes introduce human error, with configurations falling out of sync over time.
Each of these issues compounds, demanding ongoing manual intervention that distracts from core development work and increases the likelihood of undetected incidents.
How Observability as Code Transforms Monitoring
Observability as Code eliminates these challenges by embedding monitoring configurations directly in your deployment pipeline. Every monitor, metric tag, alert rule, and trace configuration lives in version control alongside your application code. When you deploy your stack, the observability layer deploys automatically—ensuring consistency, repeatability, and governance across all environments.
The benefits extend beyond immediate consistency:
- Environment parity: Identical configurations across development, staging, and production environments.
- Auditability: Changes to monitoring rules are tracked in version control, enabling peer review and rollback capabilities.
- Disaster recovery: Monitoring rules can be redeployed instantly from code, reducing recovery time.
- Cost control: Standardized configurations help prevent over-provisioning of monitoring resources.
- Scalability: Teams can expand monitoring coverage without proportional increases in operational overhead.
Building a Node.js Lambda API with OaC
To demonstrate this approach, we built a simple REST API using Node.js and AWS Lambda—a trip cost estimator deployed behind an API gateway. The function accepts three parameters: trip duration in days, number of people, and accommodation tier, returning a cost breakdown.
POST /estimate
{
  "days": 7,
  "people": 2,
  "accommodation": "mid-range"
}
The observability stack centers on the Serverless Framework with the Datadog plugin (serverless-plugin-datadog). When you run sls deploy, the plugin automatically:
- Attaches the Datadog Lambda Library layer, instrumenting the Node.js runtime for Application Performance Monitoring (APM).
- Adds the Datadog Lambda Extension layer, which streams metrics, traces, and logs directly to Datadog without requiring a separate Forwarder Lambda.
- Injects essential environment variables (DD_SITE, DD_SERVICE, DD_ENV, etc.).
- Creates or updates Datadog monitors defined in your serverless.yml file.
Configuring Observability in serverless.yml
The configuration file becomes the single source of truth for both application and observability requirements. Here’s a complete example:
service: trip-estimator
useDotenv: true
provider:
  name: aws
  runtime: nodejs20.x
  region: us-east-1
  stage: ${opt:stage, 'dev'}
  logRetentionInDays: 1
  environment:
    DD_API_KEY: ${env:DD_API_KEY}
    DD_SITE: ${env:DD_SITE, 'us5.datadoghq.com'}
    DD_ENV: ${sls:stage}
    DD_SERVICE: trip-estimator
    DD_VERSION: "1.0.0"
    DD_LOGS_INJECTION: "true"
functions:
  tripCostEstimator:
    handler: handler.estimate
    events:
      - httpApi:
          path: /estimate
          method: POST
plugins:
  - serverless-plugin-datadog
custom:
  datadog:
    apiKey: ${env:DD_API_KEY}
    appKey: ${env:DD_APP_KEY}
    site: ${env:DD_SITE, 'us5.datadoghq.com'}
    env: ${sls:stage}
    service: trip-estimator
    version: "1.0.0"
    enableDDLogs: true
    enableDDTracing: true
    enableEnhancedMetrics: true
    captureLambdaPayload: true
    addLayers: true
    monitors:
      - lambda-high-error-rate:
          thresholds:
            errorRate: 5
      - lambda-high-p90-latency:
          thresholds:
            p90: 1000
Credentials such as API keys come from environment variables loaded from a .env file (excluded from version control) or, preferably in production, from AWS Secrets Manager.
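Because the configuration above reads DD_API_KEY and DD_APP_KEY through ${env:...} without a default, a missing variable only surfaces when the deploy runs. A minimal sketch of a pre-deploy check follows; the function names are hypothetical and not part of the Serverless Framework or the Datadog plugin. It only inspects the environment object it is given (process.env by default), so it suits CI pipelines where the variables are injected by the pipeline rather than read from a .env file.

```javascript
// Hypothetical pre-deploy helper: fail fast when a variable that
// serverless.yml reads via ${env:...} without a default is missing.
const REQUIRED_VARS = ['DD_API_KEY', 'DD_APP_KEY'];

// Returns the names of required variables that are unset or blank.
// `env` defaults to process.env but can be injected for testing.
function findMissingVars(env = process.env, required = REQUIRED_VARS) {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Throws with a readable message listing every missing variable.
function assertDatadogEnv(env = process.env) {
  const missing = findMissingVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}
```

Calling assertDatadogEnv() from a small script that runs before sls deploy would stop the pipeline with a clear message instead of an unresolved-variable error partway through the deploy.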
Implementing APM with Custom Traces
The Datadog plugin handles layer attachment automatically, so APM tracing works out of the box. Each Lambda invocation generates a root trace in Datadog APM without requiring code changes. For deeper visibility into business logic, you can add custom spans using the dd-trace library.
During local development, dd-trace is added as a dev dependency:
{
  "devDependencies": {
    "dd-trace": "^5.x",
    "serverless-plugin-datadog": "^5.x"
  }
}
In your handler, require dd-trace with a fallback to maintain local testing capability:
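let tracer;
try {
  // Provided by the Datadog Lambda layer at runtime.
  tracer = require('dd-trace');
} catch {
  // Library unavailable (for example in a local test): run without tracing.
  tracer = null;
}
async function estimate(event) {
  // HTTP API events deliver the JSON payload as a string in event.body;
  // fall back to the event itself for direct invocations.
  const params = typeof event.body === 'string' ? JSON.parse(event.body) : event;
  const { days, people, accommodation } = params;
  let summary;
  if (tracer) {
    // Wrap the business logic in a custom span tagged with the request inputs.
    summary = tracer.trace('trip.estimate', {
      tags: { days, people, accommodation }
    }, (span) => {
      const result = calculateCost(days, people, accommodation);
      span.setTag('grand_total', result.grandTotal);
      span.setTag('per_person_total', result.perPersonTotal);
      return result;
    });
  } else {
    summary = calculateCost(days, people, accommodation);
  }
  return summary;
}
// Matches handler: handler.estimate in serverless.yml.
module.exports = { estimate };
This approach yields comprehensive tracing: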
- An auto-instrumented root span for the Lambda invocation.
- A child span named trip.estimate tagged with business attributes.
- End-to-end latency, error rate, and throughput metrics per service.
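The handler above calls calculateCost, which the article does not show. The sketch below is one possible shape for it, assuming placeholder prices: the rate table, the daily allowance, and the accommodationTotal and expensesTotal fields are all hypothetical. Only grandTotal and perPersonTotal are taken from the span tags used in the handler.

```javascript
// Hypothetical nightly rates per person in USD; the real pricing is not
// given in the article, so these numbers are placeholders.
const NIGHTLY_RATE_PER_PERSON = {
  budget: 40,
  'mid-range': 100,
  luxury: 250,
};

// Hypothetical flat daily allowance per person for food and local transport.
const DAILY_EXPENSES_PER_PERSON = 50;

function calculateCost(days, people, accommodation) {
  if (!Number.isInteger(days) || days <= 0) {
    throw new RangeError('days must be a positive integer');
  }
  if (!Number.isInteger(people) || people <= 0) {
    throw new RangeError('people must be a positive integer');
  }
  // Own-property check so inherited keys such as "toString" are rejected.
  if (!Object.prototype.hasOwnProperty.call(NIGHTLY_RATE_PER_PERSON, accommodation)) {
    throw new RangeError(`unknown accommodation tier: ${accommodation}`);
  }

  const accommodationTotal = NIGHTLY_RATE_PER_PERSON[accommodation] * days * people;
  const expensesTotal = DAILY_EXPENSES_PER_PERSON * days * people;
  const grandTotal = accommodationTotal + expensesTotal;

  return {
    accommodationTotal,
    expensesTotal,
    grandTotal,
    perPersonTotal: grandTotal / people,
  };
}
```

Throwing on invalid input matters for observability too: when the callback passed to tracer.trace throws, the span is marked as an error, so malformed requests show up in the service's error rate rather than as silently wrong totals.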
Structured Logging with Trace Correlation
The Datadog Lambda Extension automatically forwards all function logs to Datadog when enableDDLogs is enabled. The real breakthrough comes from trace correlation—embedding the active trace ID and span ID in every log line.
Here’s a structured logging helper that enables this correlation:
function logWithContext(message, context = {}) {
  // scope().active() returns null when no span is active, so guard before reading its context.
  const span = tracer ? tracer.scope().active() : null;
  const traceId = span ? span.context().toTraceId() : null;
  const spanId = span ? span.context().toSpanId() : null;
  const logEntry = {
    message,
    ...context,
    dd: {
      trace_id: traceId,
      span_id: spanId
    }
  };
  console.log(JSON.stringify(logEntry));
}
In Datadog, each log entry then links to its corresponding trace, and vice versa. This removes the need to correlate logs with performance traces by hand during incident investigations.
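To check the shape of these log lines locally, without the Datadog layer, one option is to pass the tracer and the output function in as arguments. The sketch below is a variation on the helper, not the article's code: createLogger and fakeTracer are hypothetical names, and the fake only mimics the dd-trace calls the helper relies on (scope().active(), context(), toTraceId(), toSpanId()).

```javascript
// Variation on logWithContext that receives the tracer and the output
// function as arguments so both can be replaced in a test.
function createLogger(tracer, write = console.log) {
  return function logWithContext(message, context = {}) {
    // scope().active() returns null when no span is active.
    const span = tracer ? tracer.scope().active() : null;
    const spanContext = span ? span.context() : null;
    const entry = {
      message,
      ...context,
      dd: {
        trace_id: spanContext ? spanContext.toTraceId() : null,
        span_id: spanContext ? spanContext.toSpanId() : null,
      },
    };
    write(JSON.stringify(entry));
    return entry;
  };
}

// Hypothetical stand-in for dd-trace that always reports one active span.
function fakeTracer(traceId, spanId) {
  const span = {
    context: () => ({ toTraceId: () => traceId, toSpanId: () => spanId }),
  };
  return { scope: () => ({ active: () => span }) };
}

// Capture output in memory instead of writing to stdout.
const lines = [];
const log = createLogger(fakeTracer('111', '222'), (line) => lines.push(line));
log('estimate calculated', { grandTotal: 2100 });
// lines[0] is a JSON string whose dd.trace_id is "111" and dd.span_id is "222".
```

Injecting the dependencies keeps the production call site nearly unchanged (createLogger(tracer) with the real tracer) while letting a unit test assert that every log line carries the dd.trace_id and dd.span_id fields.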
Best Practices and Lessons Learned
Implementing Observability as Code requires thoughtful planning. Start by standardizing your naming conventions for services and environments to ensure consistency across dashboards and monitors. Use tags liberally to enable flexible filtering and aggregation in Datadog.
Cost control remains a critical consideration. Monitor your Datadog usage closely, especially when enabling enhanced metrics and trace collection. Consider sampling high-volume traces in production environments, and disable payload capture (captureLambdaPayload) for endpoints that handle sensitive data.
Finally, treat your observability configuration as living code. Regularly review monitors and dashboards as your application evolves, and conduct post-mortems on incidents to identify gaps in coverage. The goal isn’t just to monitor your system—it’s to make your system observable by design.
As serverless architectures grow in complexity, the ability to automate and standardize observability will separate sustainable systems from fragile ones. By adopting Observability as Code today, teams can reduce operational burden, improve reliability, and focus on delivering value rather than maintaining monitoring setups.