How .NET developers can build AI assistants without vendor lock-in

.NET teams often face pressure to "just add AI"—but the real challenge isn’t model choice, it’s making AI outputs work in live, interactive apps. After wrestling with vendor lock-in, messy streaming, and shifting frameworks, one developer shares the architecture that finally clicked for their team’s 3D learning app.

The project started simply enough: add an AI assistant named Cori to an educational app that lets kids explore interactive 3D models of hearts, cells, and volcanoes across browsers, AR, and VR. Users could say things like "Rotate the heart left" or "Why is this chamber bigger?", and the assistant had to respond—both by acting in real time and explaining concepts. For the team, the core problem wasn’t picking the smartest model—it was delivering both words and actions to a live 3D viewer without lag or fragmentation.

This isn’t another tutorial promising a “definitive guide.” It’s the messy, vendor-agnostic path taken by a .NET team that values stability over trendiness. The stack—.NET backend with Wolverine and Marten on PostgreSQL, and a Svelte frontend—isn’t the most common path in AI land. Most tooling assumes Python or TypeScript first, leaving .NET developers to carve their own way. If that sounds familiar, you’re in the right place.

What “AI assistant” really means in a live 3D app

At its core, Cori is a multi-modal bridge between a language model and a real-time 3D environment. It needs to:

Hear user requests (voice or text).
Decide whether to call internal tools (like Rotate or SearchContent).
Stream responses back as they’re generated (no waiting for the full answer).
Deliver both semantic actions and audio output to the client in sync with each other.

That last part, keeping both kinds of output synchronized as they arrive, is where most teams trip. If your app is a chatbot, streaming text alone is enough. But in a 3D learning environment, users expect instant feedback. A delay of even a second breaks immersion.

A minimal AI glossary for .NET devs

Before diving deeper, here’s a quick primer on the terms that get thrown around without explanation:

LLM (Large Language Model): The “brain.” It takes text input and returns text output. By itself, it’s useless—it only becomes useful when wired to tools and real systems.

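Tokens: The small chunks of text (roughly word fragments) that a model reads and generates. The model doesn’t produce its answer as one big blob; it emits it token by token. Stream those pieces as they arrive, and the UI fills in the response word by word, with no dramatic pauses.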

Tools / Function calling: You register your own C# functions (Rotate, SearchContent, etc.) with the model. During its response, the model can “ask” to call one of these tools. Your code executes it—not the model.

Agent: A bundled setup of an LLM, tools, memory, and instructions. In .NET’s ecosystem, this is often represented by types like AIAgent from the Microsoft Agent Framework.

This isn’t deep theory—it’s the vocabulary you need to follow along without constant alt-tabbing.
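
To make the "Tools / Function calling" entry concrete, here is a minimal sketch of the idea using only the base class library. The registry, the RunTool helper, and the argument shapes are hypothetical and are not the team's code or any library's API; the point is only that the model supplies a tool name plus JSON arguments, and your own C# code does the work.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical tool registry: each tool name maps to ordinary C# code.
var tools = new Dictionary<string, Func<JsonElement, string>>
{
    ["Rotate"] = args =>
    {
        int degrees = args.GetProperty("degrees").GetInt32();
        return $"rotated {degrees} degrees";
    },
    ["SearchContent"] = args =>
    {
        string? query = args.GetProperty("query").GetString();
        return $"results for '{query}'";
    },
};

// A tool request from the model boils down to a name plus JSON arguments.
string RunTool(string name, string argumentsJson)
{
    if (!tools.TryGetValue(name, out var tool))
        return $"unknown tool '{name}'"; // reported back to the model instead of throwing

    using var doc = JsonDocument.Parse(argumentsJson);
    return tool(doc.RootElement);        // executed by your code, not by the model
}

Console.WriteLine(RunTool("Rotate", "{\"degrees\": 90}")); // prints "rotated 90 degrees"
```

In a real setup the string returned by the tool goes back to the model, which then continues its answer with that result in context.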

Rule zero: Never bet the farm on one vendor

The first major decision wasn’t about streaming or transport. It was about avoiding vendor lock-in at all costs.

The fear wasn’t that the model would be dumb. It was that the entire codebase would become tied to one SDK, one framework, or one ecosystem’s shifting direction—which, in AI, happens constantly. One month it’s Semantic Kernel, the next it’s Microsoft Agent Framework. The ground is always moving.

The solution? Abstraction through dependency injection. Use generic interfaces in application code, and keep the actual provider (OpenAI, Deepgram, etc.) isolated behind a registration layer.

Surprisingly, .NET now supports this well. The Microsoft.Extensions.AI library acts like the ILogger pattern, but for AI:

IChatClient: provider-neutral access to chat/LLM functions
IEmbeddingGenerator<TInput, TEmbedding>: generates vector embeddings for semantic search
ITextToSpeechClient: converts text to speech
Microsoft.Extensions.VectorData: a companion package with vendor-neutral vector store abstractions

This means the provider is just a registration detail:

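// Program.cs (requires the Microsoft.Extensions.AI.OpenAI package)
using Microsoft.Extensions.AI;
using OpenAI;

// Register once: OpenAI hidden behind IChatClient.
// Assumes an OpenAIClient singleton is registered elsewhere.
builder.Services.AddKeyedSingleton<IChatClient>(
    "CoriAI",
    (sp, _) => sp.GetRequiredService<OpenAIClient>()
        .GetChatClient("gpt-4o-mini")
        .AsIChatClient()
        .AsBuilder()
        .UseFunctionInvocation()   // middleware that runs requested tools automatically
        .Build()
);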

Consuming it is intentionally boring:

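using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;

// Depends only on the abstraction; the keyed registration decides the provider.
public sealed class Summarizer(
    [FromKeyedServices("CoriAI")] IChatClient chat
) {
    public async Task<string> OneLiner(string topic, CancellationToken ct) {
        var reply = await chat.GetResponseAsync(
            $"Explain {topic} in one sentence.",
            cancellationToken: ct
        );
        return reply.Text;
    }
}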

Notice: no class here knows or cares whether OpenAI is behind the interface. That’s the entire point.
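
One way to check that claim is to swap in a stand-in provider. The sketch below is an assumption-laden illustration, not the team's code: FakeChatClient is a hypothetical test double that implements IChatClient from the Microsoft.Extensions.AI.Abstractions package and answers every request with a canned string.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Application code only ever sees the IChatClient abstraction.
IChatClient chat = new FakeChatClient("A volcano is a vent in the Earth's crust.");
ChatResponse reply = await chat.GetResponseAsync("Explain volcanoes in one sentence.");
Console.WriteLine(reply.Text); // prints the canned reply

// Hypothetical test double: no network, no API key, no vendor SDK.
public sealed class FakeChatClient(string cannedReply) : IChatClient
{
    public Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
        => Task.FromResult(
            new ChatResponse(new ChatMessage(ChatRole.Assistant, cannedReply)));

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Emit the canned reply word by word to imitate token streaming.
        foreach (var word in cannedReply.Split(' '))
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield();
            yield return new ChatResponseUpdate(ChatRole.Assistant, word + " ");
        }
    }

    public object? GetService(Type serviceType, object? serviceKey = null)
        => serviceKey is null && serviceType.IsInstanceOfType(this) ? this : null;

    public void Dispose() { }
}
```

Because Summarizer takes an IChatClient in its constructor, a unit test could pass this fake directly, for example new Summarizer(new FakeChatClient("...")), without touching the DI registration.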

Semantic Kernel: the lesson from a dead-end path

Early on, the team tried using Semantic Kernel directly. They wired the code straight to its types and leaned on APIs labeled [Experimental]. It felt convenient—until Microsoft introduced the Microsoft Agent Framework and Semantic Kernel’s future became unclear.

Suddenly, the framework they’d invested time into became “legacy,” and they faced a costly rewrite.

This isn’t hindsight wisdom from a calm architect. It’s the painful lesson of rewriting code during active development. Now, the team insists on Microsoft.Extensions.AI specifically because when the next shift happens—and in AI, it always does—they want the fix to be a single line in the DI container, not a weekend of Ctrl+Shift+F and quiet frustration.

Transport: why “just use SignalR” isn’t enough

Once the “brain” side was stable, the real challenge emerged: how do the AI’s outputs actually get to the client?

Cori produces two parallel streams of output:

Semantic output: tool calls, state updates, text responses
Audio output: synthesized voice from text-to-speech

And both need to reach the 3D viewer in sync, with minimal latency. A user says, "Explain this chamber", and expects the assistant to both speak the explanation and highlight the correct part of the model—at the same time.

The first instinct was to use SignalR for everything. It’s built for real-time updates in .NET apps, and hub methods can stream results. In the team’s experience, though, a single SignalR connection was not enough on its own: it was great for state sync, but pushing the firehose of tokens an LLM emits through that same connection risked overloading it.

The team had to layer on:

Token streaming: breaking the LLM response into chunks and sending them as they arrive.
Event-driven tool calls: when the model asks to call Rotate, the backend executes it and broadcasts the result.
Audio pipeline: converting the final text response to speech and streaming it via WebRTC or WebSocket.

The architecture evolved into a hybrid system:

SignalR handles live UI state updates (e.g., rotating the 3D model).
A custom WebSocket endpoint streams raw tokens for instant text display.
A separate audio service (using ITextToSpeechClient) streams voice in real time.

This separation avoids overloading any single connection and keeps latency under 300ms—critical for user immersion.
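
The split into separate channels can be sketched with System.Threading.Channels from the base class library. Everything named here (AssistantEvent, TextToken, ToolAction, AudioChunk, OutputRouter) is hypothetical and only illustrates the routing idea; the actual transports (WebSocket, SignalR, audio service) would each drain one of these channels.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Channels;
using System.Threading.Tasks;

var router = new OutputRouter();
await router.RouteAsync(Demo());

// Each transport would read from its own channel; here we just drain the text one.
var text = new StringBuilder();
while (router.Text.Reader.TryRead(out var token)) text.Append(token.Text);
Console.WriteLine(text.ToString()); // prints "The left ventricle..."

// Hypothetical mixed stream of assistant output.
static async IAsyncEnumerable<AssistantEvent> Demo()
{
    yield return new TextToken("The left ");
    yield return new ToolAction("Rotate", "{\"degrees\":90}");
    yield return new TextToken("ventricle...");
    await Task.CompletedTask;
}

// Hypothetical event shapes for the three kinds of output.
public abstract record AssistantEvent;
public sealed record TextToken(string Text) : AssistantEvent;
public sealed record ToolAction(string Name, string ArgumentsJson) : AssistantEvent;
public sealed record AudioChunk(byte[] Data) : AssistantEvent;

public sealed class OutputRouter
{
    public Channel<TextToken> Text { get; } = Channel.CreateUnbounded<TextToken>();
    public Channel<ToolAction> State { get; } = Channel.CreateUnbounded<ToolAction>();
    public Channel<AudioChunk> Audio { get; } = Channel.CreateUnbounded<AudioChunk>();

    // Fan a single event stream out to one channel per transport.
    public async Task RouteAsync(IAsyncEnumerable<AssistantEvent> events)
    {
        await foreach (var e in events)
        {
            switch (e)
            {
                case TextToken t: await Text.Writer.WriteAsync(t); break;
                case ToolAction a: await State.Writer.WriteAsync(a); break;
                case AudioChunk c: await Audio.Writer.WriteAsync(c); break;
            }
        }
        Text.Writer.Complete();
        State.Writer.Complete();
        Audio.Writer.Complete();
    }
}
```

Keeping one channel per output type means a slow consumer on one transport does not hold up the others, which matches the motivation given above for not pushing everything through a single connection.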

The code that finally made it click

The working pipeline now looks like this:

User speaks or types a request.
The backend receives it via a REST endpoint (handled by Wolverine’s CQRS pattern).
The request is passed to IChatClient (OpenAI behind the scenes).
As the LLM streams tokens, they’re sent immediately through a WebSocket to the frontend.
When the model calls a tool (e.g., Rotate), Wolverine dispatches the action and broadcasts the new state via SignalR.
The audio service converts the final text response to speech and streams it using WebRTC.
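
The tool-dispatch step (a tool call handled by Wolverine, then broadcast over SignalR) might look roughly like the fragment below. It is a sketch under several assumptions: RotateModel, ViewerHub, the "modelRotated" client method, and the one-group-per-session scheme are all hypothetical names, and the handler relies on Wolverine's convention of discovering Handle methods and injecting their extra parameters from the container. It needs an ASP.NET Core host with SignalR and Wolverine configured, so it is not runnable on its own.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Hypothetical command produced when the model asks for the Rotate tool.
public sealed record RotateModel(string SessionId, double Degrees);

// Hypothetical hub; clients are assumed to join a group named after their session.
public sealed class ViewerHub : Hub { }

// Wolverine finds this by convention; the hub context is resolved from DI.
public static class RotateModelHandler
{
    public static Task Handle(
        RotateModel command,
        IHubContext<ViewerHub> hub,
        CancellationToken ct)
        // Broadcast the new state to every viewer attached to this session.
        => hub.Clients.Group(command.SessionId)
            .SendAsync("modelRotated", command.Degrees, ct);
}
```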

Key implementation details:

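Token streaming: Call GetStreamingResponseAsync on IChatClient and consume the resulting IAsyncEnumerable<ChatResponseUpdate> with await foreach, forwarding each update as it arrives.
Tool calls: Expose C# methods as tools through ChatOptions.Tools (for example with AIFunctionFactory.Create). The UseFunctionInvocation() middleware then invokes them when the model requests a call; side effects are handed off to a controller or mediator.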
Audio: Use ITextToSpeechClient to generate speech, then pipe it through a WebSocket or MediaStream.
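
The first two details can be sketched together with Microsoft.Extensions.AI. This is an illustration under assumptions rather than the team's implementation: CoriStreaming, the Rotate stub, and the sendToken callback (standing in for a WebSocket writer) are hypothetical, and the IChatClient passed in is assumed to have been built with UseFunctionInvocation() so that tool calls are executed automatically.

```csharp
using System;
using System.ComponentModel;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public static class CoriStreaming
{
    // Hypothetical tool stub; a real one would dispatch a command to the viewer.
    [Description("Rotates the 3D model by the given number of degrees.")]
    private static string Rotate(int degrees) => $"rotated {degrees} degrees";

    public static async Task StreamReplyAsync(
        IChatClient chat,               // assumed to include UseFunctionInvocation()
        string userText,
        Func<string, Task> sendToken,   // hypothetical sink, e.g. a WebSocket writer
        CancellationToken ct)
    {
        var options = new ChatOptions
        {
            // Advertise the C# method to the model as a callable tool.
            Tools = [AIFunctionFactory.Create(Rotate)],
        };

        // Updates arrive incrementally; forward text as soon as it shows up.
        await foreach (ChatResponseUpdate update in
            chat.GetStreamingResponseAsync(userText, options, ct))
        {
            if (!string.IsNullOrEmpty(update.Text))
                await sendToken(update.Text);
        }
    }
}
```

Passing the sink as a delegate keeps this method unaware of the transport, so the same loop could feed a WebSocket in production and an in-memory list in a test.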

The frontend (Svelte) listens to three channels:

WebSocket for text streaming
SignalR for tool actions (e.g., rotation)
WebRTC for audio

This keeps everything in sync without overcomplicating the backend.

Looking ahead: modular AI without the lock-in tax

The team’s biggest win wasn’t building a smarter LLM—it was building a system that can adapt when the AI landscape changes. By relying on Microsoft.Extensions.AI and clean abstractions, swapping providers or upgrading frameworks becomes a configuration tweak, not a rewrite.

The next step? Adding memory—letting Cori remember what a user explored in past sessions—while keeping the same vendor-agnostic architecture.

For .NET teams facing the AI imperative, the message is clear: don’t optimize for today’s hottest model. Optimize for tomorrow’s unknown shift.

That’s how you build something that lasts.

AI summary

The real challenges and practical solutions of building an AI assistant with C# and .NET. A step-by-step guide with abstraction layers, SignalR integration, and code examples.
