iToverDose/Software · 13 MAY 2026 · 12:03

AWS Lambda Durable Functions: A Simpler Way to Build Long-Running Workflows

AWS Lambda now supports durable execution, letting functions pause mid-process for up to a year without paying for idle time. For Lambda-centric workflows, this can remove the need for a separate orchestrator such as Step Functions, merging orchestration and business logic into a single, familiar codebase.

DEV Community · 5 min read

AWS Lambda has long been a go-to for running short-lived, event-driven functions. But when workflows outlast a single invocation, running for hours or even days, developers traditionally relied on workarounds like Step Functions or custom state tables. At re:Invent 2025, AWS introduced Lambda Durable Functions, a built-in capability that lets functions checkpoint progress, survive failures, and suspend execution for up to a full year without incurring compute charges during idle periods.

This shift simplifies development for teams that prefer writing logic in code rather than orchestrating workflows across multiple AWS services. Below, we explore what Durable Functions are, how they work under the hood, when to use them instead of Step Functions, and what actual implementation looks like in practice.

Why Traditional Lambda Falls Short for Long Workflows

Standard AWS Lambda functions execute from start to finish within a single invocation, capped at 15 minutes. If a 10-step process fails at step seven, the entire workflow must restart from the beginning. Handling long delays—such as waiting for human approval or external callbacks—requires manual state management: storing progress in DynamoDB, setting up API Gateway endpoints for callbacks, and writing resume logic. Teams typically turn to Step Functions to manage this complexity, but that introduces a new layer of abstraction. Business logic lives in Lambda, while orchestration logic moves into a separate state machine defined in Amazon States Language or CDK constructs.
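To make the cost of that manual approach concrete, here is a minimal sketch of hand-rolled resume logic. It is an illustration only: an in-memory Map stands in for an external table such as DynamoDB, and the names ProgressRecord, stateTable, and runWorkflow are hypothetical.

```typescript
// Hypothetical sketch of manual progress tracking for a multi-step workflow.
// An in-memory Map stands in for an external state table such as DynamoDB.
type ProgressRecord = { nextStep: number; results: string[] };

const stateTable = new Map<string, ProgressRecord>();

// Each step receives the results of the steps completed before it.
type Step = (previous: string[]) => string;

// Runs the steps in order and persists progress after each one, so a later
// call with the same workflowId resumes at the failed step instead of step 0.
function runWorkflow(workflowId: string, steps: Step[]): string[] {
  const record = stateTable.get(workflowId) ?? { nextStep: 0, results: [] };
  for (let i = record.nextStep; i < steps.length; i++) {
    const output = steps[i](record.results); // may throw; progress so far is kept
    record.results.push(output);
    record.nextStep = i + 1;
    stateTable.set(workflowId, record); // "write" the checkpoint
  }
  return record.results;
}
```

Even in this toy form, the resume bookkeeping is tangled up with the business logic, and a real version would also need serialization, conditional writes, and cleanup of old records.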

Lambda Durable Functions unify both layers. You define your workflow directly in your Lambda handler using familiar programming constructs, reducing context switching and streamlining development.

How Durable Execution Works Under the Hood

Durable execution is enabled at function creation via a DurableConfig block in SAM templates or CDK constructs. Once enabled, the durable execution SDK provides primitives that let functions checkpoint progress, pause, and resume seamlessly.

Here are the core primitives available in the SDK:

  • `context.step()` – Executes a block of code and caches the result. If the function later fails and replays, the completed step is skipped, and its cached result is returned immediately.
  • `context.wait()` – Pauses execution for a specified duration (minutes, hours, or days), with no compute charges accrued during the wait period.
  • `context.waitForCallback()` – Suspends execution until an external system responds with success or failure. Supports waits of up to one year.
  • `context.waitForCondition()` – Polls a condition on a schedule until it evaluates to true.
  • `context.parallel()` – Runs multiple durable operations concurrently.
  • `context.invoke()` – Invokes another Lambda function and checkpoints the result for replay safety.

Behind the scenes, Lambda uses a checkpoint/replay mechanism. When a function resumes—whether after a delay, a failure, or a code deployment—Lambda invokes the handler from the top. The SDK then replays through completed steps instantly (returning cached results) and resumes execution at the point where it paused.
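The replay idea can be illustrated with a toy model. The sketch below is not the AWS SDK and makes no claim about how Lambda implements checkpoints. ToyContext, SuspendSignal, and invoke are hypothetical names, and everything is synchronous for brevity, whereas the real primitives are awaited.

```typescript
// Toy model of checkpoint/replay. NOT the AWS SDK: all names are hypothetical.

// Thrown by wait() to stop the current invocation without marking it failed.
class SuspendSignal {
  waitName: string;
  constructor(waitName: string) {
    this.waitName = waitName;
  }
}

class ToyContext {
  // Checkpoint log shared by every invocation of the same execution.
  private checkpoints: Map<string, unknown>;

  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  // Runs fn once and records its result; on replay, returns the recorded
  // result without calling fn again.
  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) return this.checkpoints.get(name) as T;
    const result = fn();
    this.checkpoints.set(name, result);
    return result;
  }

  // First encounter: record the wait and suspend. On replay: fall through.
  // The toy treats the next invocation as "the wait has elapsed".
  wait(name: string): void {
    if (this.checkpoints.has(name)) return;
    this.checkpoints.set(name, "elapsed");
    throw new SuspendSignal(name);
  }
}

// One invocation: always runs the handler from the top.
// Returns undefined when the handler suspended instead of finishing.
function invoke<T>(
  checkpoints: Map<string, unknown>,
  handler: (ctx: ToyContext) => T
): T | undefined {
  try {
    return handler(new ToyContext(checkpoints));
  } catch (err) {
    if (err instanceof SuspendSignal) return undefined;
    throw err;
  }
}
```

Calling invoke twice with the same checkpoint map and a handler that runs a step, waits, and runs a second step shows the pattern: the first call suspends at the wait, and the second call skips the first step's function, passes the wait, and finishes.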

A Real-World Example: User Onboarding with Durable Execution

Consider a user onboarding flow that creates a profile, waits up to 24 hours for email verification, and then sends a welcome email. Here’s how this looks using Lambda Durable Functions:

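import { DurableContext, withDurableExecution } from '@aws/durable-execution-sdk-js';
// createUserProfile, sendVerificationEmail, and sendWelcomeEmail are application
// helpers assumed to be defined elsewhere in the project; they are not part of the SDK.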

export const handler = withDurableExecution(
  async (event, context: DurableContext) => {
    // Step 1: Create user profile
    const profile = await context.step("create-profile", async () => 
      createUserProfile(event.email, event.name)
    );

    // Step 2: Wait for email verification (up to 24 hours)
    const verification = await context.waitForCallback(
      "wait-for-email-verification",
      async (callbackId) => {
        await sendVerificationEmail(profile, callbackId);
      },
      { timeout: { hours: 24 } }
    );

    // Step 3: Complete onboarding based on verification status
    const result = await context.step("complete-onboarding", async () => {
      if (!verification || !verification.verified) {
        return { ...profile, status: 'failed' };
      }
      await sendWelcomeEmail(profile.email, profile.name);
      return { ...profile, status: 'active' };
    });

    return result;
  }
);

If profile creation succeeds but sending the verification email fails, the retry picks up at the waitForCallback step: the handler is replayed from the top, create-profile returns its cached result instead of running again, and no time or resources are wasted re-executing completed steps. When the user clicks the verification link 6 hours later, the same replay carries the function through to the complete-onboarding step, without any manual intervention or external state tables. And crucially, you pay no compute charges during those 6 hours of waiting.
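The retry and resume behavior described here can be traced with a small simulation. This is a toy model under stated assumptions, not the SDK: Journal, Suspended, step, waitForCallback, Deps, and runOnboarding are all hypothetical, the code is synchronous, and the "external system" is simply whoever writes a result into the journal.

```typescript
// Toy simulation of the onboarding flow's retry and resume behavior.
// NOT the AWS SDK: every name below is hypothetical.
type Profile = { email: string };
type Verification = { verified: boolean };

type Journal = {
  steps: Map<string, unknown>; // cached results of completed steps
  callbacks: Map<string, unknown>; // callbackId -> result (undefined while pending)
};

// Thrown to signal "nothing more to do until the callback arrives".
class Suspended {
  callbackId: string;
  constructor(callbackId: string) {
    this.callbackId = callbackId;
  }
}

function step<T>(journal: Journal, name: string, fn: () => T): T {
  if (journal.steps.has(name)) return journal.steps.get(name) as T; // replay
  const result = fn();
  journal.steps.set(name, result); // checkpoint
  return result;
}

function waitForCallback<T>(
  journal: Journal,
  name: string,
  submit: (callbackId: string) => void
): T {
  const callbackId = `cb-${name}`;
  if (!journal.callbacks.has(callbackId)) {
    submit(callbackId); // if this throws, nothing is recorded and a retry submits again
    journal.callbacks.set(callbackId, undefined);
  }
  const result = journal.callbacks.get(callbackId);
  if (result === undefined) throw new Suspended(callbackId);
  return result as T;
}

type Deps = {
  createProfile: (email: string) => Profile;
  sendVerificationEmail: (profile: Profile, callbackId: string) => void;
};

// One "invocation": always starts from the top, like a replayed handler.
function runOnboarding(journal: Journal, deps: Deps, email: string): string | Suspended {
  try {
    const profile = step(journal, "create-profile", () => deps.createProfile(email));
    const verification = waitForCallback<Verification>(journal, "verify-email", (id) =>
      deps.sendVerificationEmail(profile, id)
    );
    return step(journal, "complete-onboarding", () =>
      verification.verified ? `${profile.email}: active` : `${profile.email}: failed`
    );
  } catch (err) {
    if (err instanceof Suspended) return err;
    throw err; // genuine failure: the caller retries with the same journal
  }
}
```

Running it three times against one journal mirrors the narrative: the first run throws when the mail send fails, the second skips createProfile and returns a Suspended value, and once a verification result is stored under that callbackId, the third run returns the final status without creating the profile or sending the email again.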

The corresponding SAM template defines the function and its durable configuration:

Resources:
  UserOnboardingFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: UserOnboardingFunction
      CodeUri: ./src
      Handler: index.handler
      Runtime: nodejs24.x
      Timeout: 60
      DurableConfig:
        ExecutionTimeout: 90000  # 25 hours
        RetentionPeriodInDays: 7

Note that the Timeout property (60 seconds here) still enforces the per-invocation limit, which can be at most 15 minutes, while ExecutionTimeout (in seconds) defines the total duration of the durable execution, including waits, up to one year.
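The two values are easy to mix up because both are plain numbers of seconds. The following sketch works through the arithmetic for the template above and flags the two obvious mistakes. The helper names (DurationSpec, toSeconds, checkTimeouts) are hypothetical and are not part of SAM or the SDK.

```typescript
// Hypothetical helpers for sanity-checking the two timeout settings.
type DurationSpec = { days?: number; hours?: number; minutes?: number; seconds?: number };

// Converts a duration to seconds, the unit both template properties use.
function toSeconds(d: DurationSpec): number {
  return (
    (d.days ?? 0) * 86_400 +
    (d.hours ?? 0) * 3_600 +
    (d.minutes ?? 0) * 60 +
    (d.seconds ?? 0)
  );
}

// Per-invocation ceiling for a Lambda function: 15 minutes.
const MAX_INVOCATION_TIMEOUT = toSeconds({ minutes: 15 });

// Returns a list of problems; an empty list means the settings look consistent.
function checkTimeouts(
  timeout: number,
  executionTimeout: number,
  longestWait: DurationSpec
): string[] {
  const problems: string[] = [];
  if (timeout > MAX_INVOCATION_TIMEOUT) {
    problems.push("Timeout exceeds the 15-minute per-invocation limit");
  }
  if (executionTimeout <= toSeconds(longestWait)) {
    problems.push("ExecutionTimeout leaves no headroom beyond the longest wait");
  }
  return problems;
}
```

For the template above, toSeconds({ hours: 25 }) gives 90000, which matches ExecutionTimeout, and checkTimeouts(60, 90000, { hours: 24 }) reports no problems because the 24-hour verification wait fits inside the 25-hour budget with an hour to spare.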

Durable Functions vs. Step Functions: Which Should You Choose?

AWS documentation frames the comparison as "application development in Lambda" versus "workflow orchestration across AWS services." A more direct way to decide is to evaluate your team’s priorities and workflow complexity.

Choose Lambda Durable Functions when:

  • Your team prefers writing logic in standard programming languages and familiar IDEs.
  • Your workflow primarily involves calling AWS services via SDKs from Lambda.
  • You want to keep orchestration and business logic in the same file.
  • You’re comfortable with the checkpoint/replay model and its idiosyncrasies.
  • You need to test locally using sam local invoke.

Choose AWS Step Functions when:

  • You’re orchestrating multiple AWS services with native integrations (e.g., SQS, SNS, DynamoDB, ECS).
  • Non-technical stakeholders need to review or modify workflows using visual designers.
  • You want zero runtime maintenance, avoiding SDK or library version updates.
  • You rely on the platform’s 220+ native service integrations.

Consider a hybrid approach when:

  • Step Functions manages high-level workflows across various AWS services.
  • Durable Functions handle complex application logic within individual Lambda functions.

Ultimately, Durable Functions shine in scenarios where workflows are Lambda-centric: a sequence of Lambda invocations with embedded delays or conditional logic. Step Functions, on the other hand, excels when orchestrating disparate AWS services with minimal custom code. If your workflow involves "Lambda calls Lambda calls Lambda with some waits in between," Durable Functions reduce overhead. If it’s more like "receive SQS message, write to DynamoDB, start ECS task, wait for completion, send SNS notification," Step Functions offers native integrations that avoid writing boilerplate Lambda handlers.

The Future of Serverless Orchestration

Lambda Durable Functions signal a maturing serverless ecosystem, where long-running, stateful workflows no longer require external orchestration tools. By embedding durability directly into Lambda, AWS lowers the barrier to building complex applications without sacrificing cost efficiency or developer productivity. For teams already comfortable with Lambda, this is a compelling evolution—one that consolidates logic, simplifies debugging, and reduces cognitive overhead. As adoption grows, expect more tooling, integrations, and best practices to emerge, further blurring the line between micro-functions and macro-workflows.
