NodeLLM 1.17 enhances MCP sampling with parallel tools and ORM control

NodeLLM 1.17 extends Model Context Protocol (MCP) integration by letting servers request LLM completions directly from clients. This long-awaited feature, teased in earlier updates, reverses the typical MCP flow, in which the client queries the server for tools.

MCP Sampling: Server-Driven LLM Completions

Traditionally, MCP servers expose tools and resources to clients, but NodeLLM 1.17 allows servers to ask the client to execute LLM completions on their behalf. This capability eliminates the need for servers to manage their own API keys or provider integrations, as they can now leverage the client's configured model.

The new createLLMSamplingHandler function lets the client answer a server's sampling requests with a real NodeLLM instance. Here’s how it works:

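import { createLLM } from "@node-llm/core";
import { MCP, createLLMSamplingHandler } from "@node-llm/mcp";

// The client's own LLM; the server never sees its API key.
const llm = createLLM({ provider: "openai" });

// Launch the MCP server as a child process and register a sampling handler
// that answers the server's completion requests with the client's model.
const mcp = await MCP.connect(
  { command: "node", args: ["./sampling-server.mjs"] },
  { sampling: createLLMSamplingHandler(llm, "gpt-4o-mini") }
);

// Tool discovery still flows in the usual client-to-server direction.
const tools = await mcp.discoverTools();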

For advanced use cases, developers can pass a custom handler function to createLLMSamplingHandler instead of relying on the default LLM and model configuration. This handler receives raw sampling parameters and returns a controlled response, allowing fine-grained control over model routing and guardrails.
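To make the routing and guardrail idea concrete, here is a minimal sketch of what such a custom handler might look like. The request and result types are declared locally and loosely mirror the MCP sampling/createMessage shapes; the exact handler signature NodeLLM expects is an assumption, and createGuardedSamplingHandler, Complete, and MAX_TOKENS_CAP are hypothetical names introduced only for this illustration.

```typescript
// Local types loosely mirroring the MCP sampling request and result shapes.
// They are declared here so the sketch is self-contained; the signature
// NodeLLM actually passes to a custom handler is an assumption.
interface SamplingMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

interface SamplingRequest {
  messages: SamplingMessage[];
  systemPrompt?: string;
  maxTokens: number;
}

interface SamplingResult {
  role: "assistant";
  content: { type: "text"; text: string };
  model: string;
  stopReason: string;
}

// Hypothetical completion function. In a real app this would call the
// client's configured LLM; it is injected so the policy can be tested alone.
type Complete = (model: string, prompt: string, maxTokens: number) => Promise<string>;

// Guardrail: a server can never request more output tokens than this.
const MAX_TOKENS_CAP = 512;

function createGuardedSamplingHandler(complete: Complete) {
  return async (request: SamplingRequest): Promise<SamplingResult> => {
    const prompt = request.messages.map((m) => m.content.text).join("\n");

    // Guardrail: refuse prompts that appear to carry credentials.
    if (/api[_-]?key|password/i.test(prompt)) {
      return {
        role: "assistant",
        content: { type: "text", text: "Request refused by client policy." },
        model: "none",
        stopReason: "refused",
      };
    }

    // Routing: short prompts go to a small model, long ones to a larger one.
    const model = prompt.length > 2000 ? "gpt-4o" : "gpt-4o-mini";
    const maxTokens = Math.min(request.maxTokens, MAX_TOKENS_CAP);
    const text = await complete(model, prompt, maxTokens);

    return {
      role: "assistant",
      content: { type: "text", text },
      model,
      stopReason: "endTurn",
    };
  };
}
```

Keeping the policy in a plain function with an injected completion call means the routing and refusal rules can be unit-tested without a live provider.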

Faster Workflows with Concurrent Tool Execution

NodeLLM 1.17 also introduces concurrent tool execution, which cuts latency when the model returns several independent tool calls in a single turn. Previously, tools were executed sequentially by default, so each call waited for the one before it even when they shared no data. With the new toolConcurrency option, enabled through withToolConcurrency on the chat builder, developers can run non-dependent calls in parallel.

const chat = llm
  .chat("gpt-4o-mini")
  .withTool(WeatherTool)
  .withToolConcurrency(true);

await chat.ask("What is the weather in Tokyo, London, and New York?");

This feature is supported across all chat modes, including streaming and agentic loops, ensuring consistent performance improvements without requiring changes to tool definitions.
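The difference between the two strategies can be sketched without NodeLLM at all. The snippet below is not the library's implementation; it is a generic illustration in which ToolCall, ToolFn, getWeather, runSequential, and runConcurrent are hypothetical names, and the weather tool is a stand-in that simply waits before answering.

```typescript
interface ToolCall {
  name: string;
  args: { city: string };
}

type ToolFn = (args: { city: string }) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a real tool: waits, then returns a canned answer.
const getWeather: ToolFn = async ({ city }) => {
  await sleep(50);
  return `Sunny in ${city}`;
};

// Sequential: each call starts only after the previous one has finished,
// so total latency is roughly the sum of the individual latencies.
async function runSequential(calls: ToolCall[], tool: ToolFn): Promise<string[]> {
  const results: string[] = [];
  for (const call of calls) {
    results.push(await tool(call.args));
  }
  return results;
}

// Concurrent: all calls start immediately. Promise.all resolves with the
// results in input order, so the model still sees them in a stable order.
async function runConcurrent(calls: ToolCall[], tool: ToolFn): Promise<string[]> {
  return Promise.all(calls.map((call) => tool(call.args)));
}
```

With three independent calls such as the Tokyo, London, and New York lookups, the concurrent version should take about as long as the slowest single call rather than the sum of all three. This only holds when the calls do not depend on each other's results.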

Unified Callback Handling for Middleware-Like Workflows

Another enhancement in NodeLLM 1.17 addresses a subtle but critical issue in callback management. Previously, registering a second handler for the same event (e.g., onEndMessage, beforeRequest) would silently replace the first one, posing risks for composable middleware workflows. The update ensures all registered handlers execute in order, enabling clean integration of multiple independent concerns such as logging, UI updates, or data auditing.

chat
  .onEndMessage((msg) => audit.log(msg))
  .onEndMessage(() => ui.refresh());

chat
  .beforeRequest(redactPII)
  .beforeRequest(logOutboundPrompt);

This change maintains backward compatibility for single-handler use cases while unlocking more sophisticated workflows for developers building modular systems.
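The append-instead-of-replace behavior can be illustrated with a small hook chain. This is a sketch of the general pattern rather than NodeLLM's internals: HookChain is a hypothetical class, and the rule that a hook may return a replacement value (as a redaction step would) is an assumption made for the example.

```typescript
// A hook may observe the value, or return a replacement for later hooks.
type Hook<T> = (value: T) => T | void | Promise<T | void>;

class HookChain<T> {
  private readonly hooks: Hook<T>[] = [];

  // Registering a hook appends it; earlier hooks are never overwritten.
  add(hook: Hook<T>): this {
    this.hooks.push(hook);
    return this;
  }

  // Hooks run in registration order, each seeing the previous hook's output.
  async run(initial: T): Promise<T> {
    let current = initial;
    for (const hook of this.hooks) {
      const next = await hook(current);
      if (next !== undefined) current = next as T;
    }
    return current;
  }
}

// Two independent concerns on the same event: redaction first, then logging.
const seenByLogger: string[] = [];
const beforeRequest = new HookChain<string>()
  .add((prompt) => prompt.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED]"))
  .add((prompt) => {
    seenByLogger.push(prompt);
  });
```

Because execution follows registration order, the logging hook here only ever sees the redacted prompt, which is why the order of the two registrations matters.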

ORM 0.8.0: Persistent Tool Control in Database-Backed Chats

The @node-llm/orm package receives a major update with version 0.8.0, bringing the same tool execution controls found in the core library to ORM-backed chats. This ensures that applications using Prisma or other database integrations can leverage precise tool orchestration without dropping to lower-level APIs.

import { createChat } from "@node-llm/orm/prisma";
import { ToolExecutionMode } from "@node-llm/core";

// Resolve the chat before chaining: without the parentheses, `await`
// would apply to the result of the whole call chain.
const chat = (await createChat(prisma, { model: "gpt-4o" }))
  .withToolExecution(ToolExecutionMode.CONFIRM)
  .onConfirmToolCall(async (call) => await askUserToApprove(call))
  .onToolCallError((call, error) => ({ error: error.message }))
  .withToolChoice("get_weather")
  .withToolConcurrency(true);

The toolExecution option now supports three modes:

auto (default): Executes tools automatically.
confirm: Requires explicit user approval before each tool call, ideal for human-in-the-loop workflows.
dry-run: Simulates tool calls without execution, useful for testing and validation.

All tool interactions, including their outcomes, are now properly persisted in the database, ensuring consistency between the application state and the model's actions.
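As a rough mental model of the three modes, the dispatcher below shows how a single tool call might be handled under each one. It is an illustrative sketch, not the ORM package's code: ToolExecution, PendingToolCall, ToolOutcome, and runToolCall are hypothetical names, and the string values simply reuse the mode names listed above.

```typescript
type ToolExecution = "auto" | "confirm" | "dry-run";

interface PendingToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ToolOutcome {
  name: string;
  status: "executed" | "rejected" | "simulated";
  result?: unknown;
}

async function runToolCall(
  mode: ToolExecution,
  call: PendingToolCall,
  execute: (call: PendingToolCall) => Promise<unknown>,
  confirm: (call: PendingToolCall) => Promise<boolean>
): Promise<ToolOutcome> {
  // dry-run: report what would have happened, never touch the tool.
  if (mode === "dry-run") {
    return { name: call.name, status: "simulated" };
  }

  // confirm: ask a human first; a refusal short-circuits execution.
  if (mode === "confirm" && !(await confirm(call))) {
    return { name: call.name, status: "rejected" };
  }

  // auto, or confirm after approval: run the tool and capture its result.
  return { name: call.name, status: "executed", result: await execute(call) };
}
```

Returning an outcome record in every branch, including rejections and simulations, is what would let each interaction be stored alongside the conversation.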

Enhanced Monitoring for Token and Cache Tracking

NodeLLM’s monitoring tools receive updates to provide deeper insights into token usage and cache behavior. The @node-llm/monitor 0.4.2 and @node-llm/monitor-otel 0.1.1 packages now track cache, reasoning, and image token counts, offering a more granular breakdown of resource consumption.

interface ExtractedTokenUsage {
  prompt: number;
  completion: number;
  cached: number;
  cacheCreation: number;
  reasoning: number;
  image: number;
}

The monitoring system normalizes token accounting across different providers, including Vercel AI SDK, OpenAI, and OpenTelemetry conventions. This ensures accurate cost tracking and usage analysis as reasoning models and prompt caching become increasingly prevalent.
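To show what this kind of normalization involves, the sketch below maps two differently shaped usage payloads onto the ExtractedTokenUsage interface shown above. It is not the monitor package's implementation: fromOpenAI and fromOtelAttributes are hypothetical helpers, the OpenAI field names follow the usage object of its chat completions API, the OpenTelemetry keys follow the gen_ai usage attributes, and treating missing fields as zero is an assumption of this example.

```typescript
interface ExtractedTokenUsage {
  prompt: number;
  completion: number;
  cached: number;
  cacheCreation: number;
  reasoning: number;
  image: number;
}

// Subset of the usage object returned by OpenAI chat completions.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Hypothetical helper: fields the provider does not report default to 0.
function fromOpenAI(usage: OpenAIUsage): ExtractedTokenUsage {
  return {
    prompt: usage.prompt_tokens,
    completion: usage.completion_tokens,
    cached: usage.prompt_tokens_details?.cached_tokens ?? 0,
    cacheCreation: 0,
    reasoning: usage.completion_tokens_details?.reasoning_tokens ?? 0,
    image: 0,
  };
}

// Hypothetical helper: reads span attributes named after the OpenTelemetry
// gen_ai conventions. Only input and output counts are mapped here.
function fromOtelAttributes(attrs: Record<string, unknown>): ExtractedTokenUsage {
  const num = (key: string): number => {
    const value = attrs[key];
    return typeof value === "number" ? value : 0;
  };
  return {
    prompt: num("gen_ai.usage.input_tokens"),
    completion: num("gen_ai.usage.output_tokens"),
    cached: 0,
    cacheCreation: 0,
    reasoning: 0,
    image: 0,
  };
}
```

Once every provider's payload lands in the same shape, cost calculations and dashboards can treat cached and reasoning tokens uniformly instead of special-casing each source.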

Getting Started with NodeLLM 1.17

To begin using the latest features, install the updated packages:

npm install @node-llm/core@1.17.0 @node-llm/mcp@0.2.0 @node-llm/orm@0.8.0 @node-llm/testing@0.5.1
npm install @node-llm/monitor@0.4.2 @node-llm/monitor-otel@0.1.1

For a detailed list of changes and improvements, refer to the project’s commit history and changelog on GitHub. NodeLLM 1.17 sets the stage for more efficient, flexible, and controlled LLM workflows in Node.js applications, bridging the gap between client and server capabilities while maintaining robust tool management and monitoring.

The future of MCP integration looks promising, with further optimizations expected as the ecosystem evolves. Developers can now build more sophisticated, data-driven applications without compromising on performance or control.

AI summary

NodeLLM 1.17 adds sampling support that reverses the client-server flow in MCP-based LLM integrations. Discover how it improves the developer experience with parallel tool execution, ORM improvements, and callback management.
