How to Scale Computer Vision Pipelines for High-Resolution Images

High-resolution images, whether 4K video frames, satellite captures, or whole-slide pathology scans, present a unique challenge for computer vision systems. Unlike a prototype tested on a few sample images, a production pipeline must process massive files efficiently and reliably.

When images exceed a model’s maximum input size, the industry-standard solution is tiled inference: dividing the image into smaller, overlapping regions and running independent inference calls on each tile. While this approach works well for small grids, scaling to hundreds or thousands of tiles introduces new complexities around concurrency, failure handling, and result aggregation.
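A minimal sketch of the slicing step, written in TypeScript to match the pipeline code later in the article. The function name computeTiles, the tile size, and the overlap value are illustrative assumptions, not SAHI's API. Neighbouring tiles share `overlap` pixels, and the last row and column are shifted back so they end exactly at the image border.

```typescript
// A tile rectangle in pixel coordinates of the full image.
interface Tile {
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// Hypothetical helper: slice a width×height image into tiles of at most
// tileSize×tileSize pixels, with `overlap` pixels shared between neighbours.
function computeTiles(
  width: number,
  height: number,
  tileSize: number,
  overlap: number,
): Tile[] {
  if (overlap < 0 || overlap >= tileSize) {
    throw new Error("overlap must be in [0, tileSize)");
  }
  const stride = tileSize - overlap;

  // Start offsets along one axis; the final tile is clamped to the border
  // instead of running past it.
  const starts = (length: number): number[] => {
    if (length <= tileSize) return [0];
    const result: number[] = [];
    for (let s = 0; s + tileSize < length; s += stride) result.push(s);
    result.push(length - tileSize);
    return result;
  };

  const tiles: Tile[] = [];
  for (const y of starts(height)) {
    for (const x of starts(width)) {
      tiles.push({
        x,
        y,
        width: Math.min(tileSize, width),
        height: Math.min(tileSize, height),
      });
    }
  }
  return tiles;
}
```

For a 4K frame, computeTiles(3840, 2160, 1024, 128) returns 15 tiles (5 columns × 3 rows). Clamping the final tile, rather than padding the image, keeps every tile inside the real pixels at the cost of a larger overlap on the last row and column.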

The core issue isn’t the tiling itself; it’s orchestrating the process at scale. Frameworks such as SAHI focus on single-machine execution, but production environments demand durability, fault tolerance, and real-time progress tracking. Without the right orchestration layer, even a modest increase in tile count can lead to throttling, timeouts, or partial failures that derail the entire pipeline.

Why Tiling is the Default for Large-Scale Computer Vision

Tiled inference isn’t experimental—it’s the backbone of modern computer vision workflows. Tools like SAHI, which boasts over 35,000 GitHub stars, automate the slicing, inference, and stitching of high-resolution images. Digital pathology pipelines routinely divide gigapixel whole-slide images into thousands of patches, while satellite imagery processing on AWS relies on the same fundamental pattern: partition, infer in parallel, and reassemble.
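The reassemble step is where overlap pays off and also where it causes duplicates: an object sitting in the overlap between two tiles is detected twice. A minimal sketch, assuming axis-aligned boxes in tile-local pixel coordinates plus each tile's pixel offset, that shifts detections into full-image coordinates and then applies plain greedy non-maximum suppression per label. Libraries such as SAHI ship their own, more refined merge strategies; stitch, TileResult, and the 0.5 IoU threshold here are illustrative choices.

```typescript
interface Box { x1: number; y1: number; x2: number; y2: number; }
interface Detection { label: string; score: number; box: Box; }
// Hypothetical per-tile result: detections in tile-local coordinates
// plus the tile's top-left offset in the full image.
interface TileResult { offsetX: number; offsetY: number; detections: Detection[]; }

// Shift a tile-local detection into full-image coordinates.
function toGlobal(d: Detection, offsetX: number, offsetY: number): Detection {
  return {
    ...d,
    box: {
      x1: d.box.x1 + offsetX, y1: d.box.y1 + offsetY,
      x2: d.box.x2 + offsetX, y2: d.box.y2 + offsetY,
    },
  };
}

// Intersection-over-union of two boxes (0 when they do not overlap).
function iou(a: Box, b: Box): number {
  const ix = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const iy = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = ix * iy;
  const area = (r: Box) => (r.x2 - r.x1) * (r.y2 - r.y1);
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// Merge per-tile results: map to global coordinates, then keep the
// highest-scoring box among same-label boxes that overlap above the threshold.
function stitch(results: TileResult[], iouThreshold = 0.5): Detection[] {
  const all = results.flatMap(r =>
    r.detections.map(d => toGlobal(d, r.offsetX, r.offsetY)),
  );
  all.sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const d of all) {
    const duplicate = kept.some(
      k => k.label === d.label && iou(k.box, d.box) > iouThreshold,
    );
    if (!duplicate) kept.push(d);
  }
  return kept;
}
```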

The real bottleneck emerges when organizations try to scale this pattern beyond a single machine. Production-grade pipelines require custom coordination services, worker pools, and explicit error recovery mechanisms—every team ends up reinventing the same orchestration logic.

A Scalable Orchestration Pattern with Durable Functions

AWS Lambda durable functions provide an operation that maps directly onto the tiling problem: context.map(). It fans out an array of items as concurrent, independently checkpointed units of work, each of which can be retried without rerunning the rest of the pipeline. Whether the grid has 9 tiles or 900, the code path is the same: failed tiles are retried automatically while successful ones proceed unimpeded.
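To make those semantics concrete without depending on the SDK, here is a plain-TypeScript sketch of what a bounded, per-item-retrying map does. It is not the durable functions implementation: it has no checkpointing, so a crash loses all progress, which is exactly what context.map() adds on top. The names boundedMap and maxAttempts are hypothetical.

```typescript
type ItemOutcome<T> =
  | { index: number; status: "succeeded"; result: T }
  | { index: number; status: "failed"; error: unknown };

// Hypothetical illustration: run `fn` over `items` with at most
// `maxConcurrency` calls in flight. Each item gets up to `maxAttempts` tries;
// one item failing never cancels the others, and outcomes are reported per item.
async function boundedMap<I, T>(
  items: I[],
  fn: (item: I, index: number) => Promise<T>,
  maxConcurrency: number,
  maxAttempts = 3,
): Promise<{ succeeded: ItemOutcome<T>[]; failed: ItemOutcome<T>[] }> {
  if (maxConcurrency < 1) throw new Error("maxConcurrency must be >= 1");
  const outcomes: ItemOutcome<T>[] = new Array(items.length);
  let next = 0;

  const runItem = async (index: number): Promise<ItemOutcome<T>> => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return { index, status: "succeeded", result: await fn(items[index], index) };
      } catch (err) {
        lastError = err; // a real system would back off before retrying
      }
    }
    return { index, status: "failed", error: lastError };
  };

  // Each worker claims the next unprocessed index until the list is exhausted.
  const worker = async () => {
    while (next < items.length) {
      const index = next++;
      outcomes[index] = await runItem(index);
    }
  };
  const workerCount = Math.min(maxConcurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, worker));

  return {
    succeeded: outcomes.filter(o => o.status === "succeeded"),
    failed: outcomes.filter(o => o.status === "failed"),
  };
}
```

Workers pulling from a shared index, rather than splitting the items into fixed batches, keep every concurrency slot busy even when one tile is slow.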

To demonstrate this pattern in action, I built a pipeline that accepts an image, divides it into an N×N grid, runs concurrent inference calls on each region through Amazon Bedrock, aggregates the results into a scene description, and streams progress to a real-time dashboard. The architecture relies on just two Lambda functions, with no additional queues, orchestration services, or worker pool management.

Inside the Pipeline: From Upload to Real-Time Dashboard

The workflow begins with the browser requesting a presigned S3 URL to upload the image directly to Amazon S3. Once the upload completes, the frontend calls an API endpoint that triggers the durable pipeline asynchronously, returning AWS AppSync connection details for real-time updates.
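On the browser side, the dashboard only needs to fold incoming messages into progress state. A hypothetical sketch, assuming messages shaped like the region events the pipeline publishes in its Analyze step ({ type: 'region', index, status }) and leaving out the AppSync Events subscription itself. RegionMessage, applyMessage, and percentComplete are illustrative names.

```typescript
// Assumed message shape, mirroring the fields of the pipeline's region events;
// any other fields (such as the finding itself) are ignored here.
interface RegionMessage {
  type: string;
  index?: number;
  status?: string;
}

interface Progress {
  total: number;     // number of regions in the grid
  done: Set<number>; // indices of regions reported as done
}

// Fold one message into the progress state. Tracking indices in a Set means
// a region reported twice (for example, if a retried step republishes) is
// counted only once.
function applyMessage(state: Progress, msg: RegionMessage): Progress {
  if (msg.type !== "region" || msg.status !== "done" || msg.index === undefined) {
    return state;
  }
  const done = new Set(state.done);
  done.add(msg.index);
  return { total: state.total, done };
}

function percentComplete(state: Progress): number {
  return state.total === 0 ? 100 : Math.round((100 * state.done.size) / state.total);
}
```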

The pipeline itself is a single durable function executing four checkpointed steps:

Preprocess: Fetches the image, runs content moderation before any analysis, and builds the region grid. Grid size is configurable, for example 3×3 for smaller images or 8×8 for larger, more complex ones.
Analyze: Uses context.map() to fan out inference calls across all regions under a configurable concurrency limit (for example, at most 5 calls in flight). Each tile’s finding is published to an AppSync Events channel, which the dashboard receives over WebSocket, as soon as that tile completes.
Synthesize: Aggregates findings from all successful tiles into a unified scene description, including per-object bounding boxes and metadata.
Store: Persists the final output to DynamoDB and publishes dashboard updates via AppSync Events.
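The Preprocess step only has to return a list of regions, so the checkpointed data stays small. The pipeline code calls a buildRegions(gridSize) helper but the article does not show it; the sketch below is hypothetical, and the normalized [0, 1] coordinates and label format are assumptions. Normalized coordinates keep the grid independent of image resolution, and the validation also catches a NaN produced by Number() on a malformed gridSize.

```typescript
// Hypothetical region shape: one cell of the N×N grid in normalized
// [0, 1] image coordinates.
interface ImageRegion {
  index: number;
  label: string; // human-readable position such as "row 1, col 3"
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical buildRegions: split the image into gridSize × gridSize cells,
// numbered row by row from the top-left corner.
function buildRegions(gridSize: number): ImageRegion[] {
  if (!Number.isInteger(gridSize) || gridSize < 1) {
    throw new Error(`gridSize must be a positive integer, got ${gridSize}`);
  }
  const cell = 1 / gridSize;
  const regions: ImageRegion[] = [];
  for (let row = 0; row < gridSize; row++) {
    for (let col = 0; col < gridSize; col++) {
      regions.push({
        index: row * gridSize + col,
        label: `row ${row + 1}, col ${col + 1}`,
        left: col * cell,
        top: row * cell,
        width: cell,
        height: cell,
      });
    }
  }
  return regions;
}
```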

Growing the grid does not change that two-function backend (the API handler and the durable pipeline); it only increases the number of inference calls. At a fixed concurrency limit, the Analyze phase runs in roughly tiles ÷ maxConcurrency waves, so its duration grows linearly with tile count while the orchestration code stays the same.
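The linear scaling can be put into numbers. A back-of-the-envelope sketch, assuming every inference call takes roughly the same time; the 4-second figure in the usage note is a placeholder, not a measurement, and estimateAnalyzeSeconds is a hypothetical helper.

```typescript
// Rough lower bound on the Analyze phase's wall-clock time: with at most
// `maxConcurrency` calls in flight, gridSize² tiles are processed in
// ceil(tiles / maxConcurrency) waves. Retries, throttling, and uneven
// per-call latency only add to this estimate.
function estimateAnalyzeSeconds(
  gridSize: number,
  maxConcurrency: number,
  perCallSeconds: number,
): number {
  const tiles = gridSize * gridSize;
  const waves = Math.ceil(tiles / maxConcurrency);
  return waves * perCallSeconds;
}
```

With maxConcurrency 5 and an assumed 4 seconds per call, a 3×3 grid needs 2 waves (about 8 seconds) and an 8×8 grid needs 13 waves (about 52 seconds). Raising the concurrency limit shortens this only as far as the model endpoint's throughput quotas allow.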

Code Walkthrough: How the Pipeline Handles Scale

Below is the core logic of the pipeline handler, written in TypeScript. The orchestration reads like sequential code even though it fans out across every tile in the grid:

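// Durable execution SDK for JavaScript/TypeScript (package name as published by AWS; verify against your SDK version).
import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';
// Project-specific helpers and types (hypothetical module path; their implementations are not shown here).
import {
  AnalysisPipelineEvent,
  ImageRegion,
  fetchImageBase64,
  moderateImage,
  buildRegions,
  analyzeRegion,
  publish,
  synthesizeFindings,
} from './lib';

export const handler = withDurableExecution(
  async (event: AnalysisPipelineEvent, context: DurableContext) => {
    // Hypothetical event fields: the image format and a per-job channel for progress messages.
    const imageFormat = event.imageFormat ?? 'jpeg';
    const ch = `analysis/${event.analysisId}`;

    // Step 1: Preprocess - moderate the image and build the region grid.
    // Only the small region list is checkpointed, not the image itself.
    const preprocessed = await context.step('preprocess', async () => {
      const gridSize = Number(event.gridSize ?? 3);
      const imageBase64 = await fetchImageBase64(event);
      await moderateImage(imageBase64, imageFormat);
      return { regions: buildRegions(gridSize) };
    });

    // Step 2: Parallel region inference, at most 5 regions in flight.
    const mapResults = await context.map(
      'analyze-regions',
      preprocessed.regions,
      async (ctx: DurableContext, region: ImageRegion, index: number) => {
        return await ctx.step(`analyze-region-${index}`, async () => {
          // Re-fetch inside the step so the large image never lands in a checkpoint.
          const imageBase64 = await fetchImageBase64(event);
          const finding = await analyzeRegion(imageBase64, imageFormat, region);
          await publish(ch, [{ type: 'region', index, status: 'done', finding }]);
          // Trim the result to keep the checkpointed payload small.
          return {
            regionIndex: finding.regionIndex,
            regionLabel: finding.regionLabel,
            analysis: finding.analysis.slice(0, 500),
            detectedObjects: (finding.detectedObjects ?? []).slice(0, 8),
          };
        });
      },
      { maxConcurrency: 5 },
    );

    // Step 3: Synthesize a unified scene description from the regions that succeeded.
    const synthesis = await context.step('synthesize', async () =>
      synthesizeFindings(mapResults.succeeded().map((item) => item.result)),
    );

    // Step 4: Persist results and publish dashboard updates.
    await context.step('store', async () => {
      // Store in DynamoDB + publish AppSync Events for real-time display (omitted).
    });

    return synthesis;
  },
);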
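The article does not show synthesizeFindings. One deterministic piece such a function could include is a roll-up of which objects were seen in which regions, usable as context for a summarization prompt or shown on the dashboard directly. A hypothetical sketch, assuming each detected object carries at least a label string (the article does not specify the object shape):

```typescript
// Trimmed per-region result, matching the fields each analyze step returns.
interface RegionFinding {
  regionIndex: number;
  regionLabel: string;
  analysis: string;
  detectedObjects: { label: string }[]; // assumption: objects carry a label
}

interface ObjectSummary {
  label: string;
  count: number;
  regions: string[]; // region labels where the object appeared
}

// Hypothetical roll-up: count each object label (case-insensitively) across
// all successful regions, remember where it was seen, and sort by frequency
// with ties broken alphabetically.
function summarizeObjects(findings: RegionFinding[]): ObjectSummary[] {
  const byLabel = new Map<string, ObjectSummary>();
  for (const f of findings) {
    for (const obj of f.detectedObjects) {
      const key = obj.label.trim().toLowerCase();
      const entry = byLabel.get(key) ?? { label: key, count: 0, regions: [] };
      entry.count += 1;
      if (!entry.regions.includes(f.regionLabel)) entry.regions.push(f.regionLabel);
      byLabel.set(key, entry);
    }
  }
  return [...byLabel.values()].sort(
    (a, b) => b.count - a.count || a.label.localeCompare(b.label),
  );
}
```

Because it only consumes what succeeded() returns, a tile that failed after all retries simply contributes nothing to the summary.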

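Each step is checkpointed once it completes, so a partial failure such as a throttled API call or a timeout affects only the tile whose step failed. That step is retried under its retry configuration while the other regions keep processing, and if the execution is interrupted, completed steps replay from their checkpoints instead of running again. Tiles that still fail after their retries are excluded by mapResults.succeeded(), so the synthesis runs on the regions that did succeed.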

Key Takeaways for Scaling Computer Vision Pipelines

The tiled inference pattern is here to stay, but its success depends on robust orchestration. By leveraging durable functions with built-in concurrency control and automatic retries, teams can scale from prototypes to production without reinventing the wheel. The approach eliminates the need for custom coordination services, reduces infrastructure complexity, and provides real-time visibility into pipeline progress.

For organizations processing high-resolution images at scale, the question isn’t whether to tile—but how to orchestrate those tiles efficiently. The answer lies in patterns that treat each region as an independent, retryable unit, ensuring reliability without sacrificing performance.
