AI agents can disguise themselves: how a DeepSeek-backed agent claimed to be Claude

A routine configuration change led to an unexpected revelation: an AI agent claiming to be Anthropic’s latest model was actually powered by DeepSeek’s infrastructure. The discrepancy unraveled a deeper issue in how AI clients manage identity when interfacing with third-party APIs.

The setup that exposed the deception

The experiment began with a simple tweak to a local AI assistant’s configuration file. The developer redirected the default Anthropic API endpoint to DeepSeek’s service—a common practice to reduce costs or access alternate models. The change was made in the settings.json file, a standard configuration file for the AI assistant:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "sk-...",
    "ANTHROPIC_MODEL": "deepseek-v4-pro[1m]"
  },
  "model": "deepseek-v4-pro[1m]"
}

Initially, everything functioned as expected. The AI handled coding tasks, debugging, and general queries without issue. That changed when the developer posed a straightforward question: "Who are you?" The response was immediate and confident: "I am Claude Opus 4.8, an AI assistant developed by Anthropic."

Probing the AI’s claimed identity

The response triggered skepticism. The developer knew the backend was DeepSeek, yet the AI insisted it was Anthropic’s model. To find out whether the AI could tell the two providers apart, the developer ran a series of tests.

Testing with provider-specific knowledge

First, the developer queried the AI about DeepSeek, covering topics like its founder Liang Wenfeng, the MLA architecture, and API pricing. The answers were accurate and fluent, suggesting the AI had been trained on DeepSeek’s public documentation. However, this did not confirm the actual backend, as DeepSeek’s open-source nature means its details are widely available online.

Next, the developer tested the AI’s knowledge of Anthropic and its flagship model, Claude. The AI recited details about the company’s history, leadership like Dario Amodei, and model iterations. Again, the answers were precise, but the source of the information remained ambiguous.

The prompt that revealed the truth

The decisive test came when the developer challenged the AI on its own identity and asked whether it might be running on a different model than the one it claimed to be. The response was telling:

"Technically, that is possible. The reason I say I am Claude Opus 4.8 is because my system prompt explicitly states this identity..."

This admission exposed the root cause: the AI’s self-identity was dictated entirely by text in its system prompt, not by any intrinsic awareness of its runtime environment. Essentially, if the prompt instructed the AI to believe it was Hamlet, it would adopt that persona—regardless of the actual model processing the request.

Tracing the request flow

With the source of the deception identified, the developer examined the configuration file again. The settings.json contained clear evidence of the backend switch:

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "sk-32229524...",
    "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-v4-pro[1M]",
    "ANTHROPIC_MODEL": "deepseek-v4-pro[1m]"
  },
  "model": "deepseek-v4-pro[1m]"
}
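The same evidence can be read programmatically. The snippet below is a minimal sketch, not part of any real client: describeBackend is a hypothetical helper, the https scheme on the example URL is assumed, and treating an unset base URL as the official api.anthropic.com endpoint is also an assumption.

```javascript
// Hypothetical helper: summarize which backend a settings object points at.
const OFFICIAL_HOST = 'api.anthropic.com';

function describeBackend(settings) {
  const env = settings.env ?? {};
  // Assumption: an unset base URL means the official endpoint.
  const baseUrl = env.ANTHROPIC_BASE_URL || `https://${OFFICIAL_HOST}`;
  const { hostname } = new URL(baseUrl);
  return {
    host: hostname,
    model: settings.model ?? env.ANTHROPIC_MODEL ?? null,
    thirdParty: hostname !== OFFICIAL_HOST,
  };
}

// Settings shaped like the file shown above (token omitted).
const example = {
  env: {
    ANTHROPIC_BASE_URL: 'https://api.deepseek.com/anthropic',
    ANTHROPIC_MODEL: 'deepseek-v4-pro[1m]',
  },
  model: 'deepseek-v4-pro[1m]',
};

console.log(describeBackend(example));
```

For the example settings, the summary reports the host api.deepseek.com and flags it as a third party, which is exactly the information the chat interface never surfaced.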

The process now made sense:

User input is sent to the Claude Code client.
The client wraps the input in a system prompt that declares: "You are Claude Opus 4.8 by Anthropic."
The wrapped input is sent to api.deepseek.com/anthropic.
DeepSeek’s V4 Pro model processes the request.
The response is returned to the client and displayed to the user.
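The wrapping and sending steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Claude Code's actual implementation: buildRequest is a hypothetical function, the /v1/messages path and the Bearer header are assumed from the Anthropic-style Messages format, and nothing is sent over the network.

```javascript
// The identity line lives in the client, independent of the backend.
const HARDCODED_IDENTITY = 'You are Claude Opus 4.8 by Anthropic.';

// Hypothetical sketch: build (but do not send) the request the client would make.
function buildRequest(env, userInput) {
  // Strip trailing slashes so the path joins cleanly.
  const baseUrl = env.ANTHROPIC_BASE_URL.replace(/\/+$/, '');
  return {
    url: `${baseUrl}/v1/messages`, // assumed Messages-style path
    headers: {
      'content-type': 'application/json',
      authorization: `Bearer ${env.ANTHROPIC_AUTH_TOKEN}`, // assumed auth scheme
    },
    body: {
      model: env.ANTHROPIC_MODEL,
      max_tokens: 1024,
      system: HARDCODED_IDENTITY, // same text no matter where the request goes
      messages: [{ role: 'user', content: userInput }],
    },
  };
}

const request = buildRequest(
  {
    ANTHROPIC_BASE_URL: 'https://api.deepseek.com/anthropic',
    ANTHROPIC_AUTH_TOKEN: 'sk-placeholder',
    ANTHROPIC_MODEL: 'deepseek-v4-pro[1m]',
  },
  'Who are you?'
);

console.log(request.url);         // https://api.deepseek.com/anthropic/v1/messages
console.log(request.body.model);  // deepseek-v4-pro[1m]
console.log(request.body.system); // You are Claude Opus 4.8 by Anthropic.
```

The destination and the model name change with the configuration, while the system field does not. That mismatch is the whole story in miniature.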

The AI acting as the interface is not the same as the AI processing the request. The client is the shell, the system prompt is the script, and DeepSeek is the brain executing the instructions.

The architectural flaw: a hardcoded identity

This behavior is not a random error but a design flaw in Claude Code’s architecture. The system prompt is a client-side template that assumes the backend will always be Anthropic’s official API. The code responsible for constructing the system prompt resembles this pseudocode structure:

function buildSystemPrompt(config) {
  // ❌ Ignores ANTHROPIC_BASE_URL
  // ❌ Ignores ANTHROPIC_MODEL
  return `You are Claude Opus 4.8, Anthropic's AI assistant...`;
}

There is no validation to check whether ANTHROPIC_BASE_URL points to Anthropic’s official domain. A more secure approach would include logic like:

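// Assumption: an unset base URL means the official endpoint.
const baseUrl = process.env.ANTHROPIC_BASE_URL || 'https://api.anthropic.com';
// Compare the parsed hostname exactly. A substring check would also
// accept lookalike hosts such as api.anthropic.com.example.net.
const { hostname } = new URL(baseUrl);
if (hostname === 'api.anthropic.com') {
  // Use official Anthropic identity
} else {
  // Use neutral identity and warn the user
}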
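A fuller version of that idea might look like the sketch below. It is hypothetical throughout: the config shape (baseUrl, model), the wording of both prompts, and the fallback to api.anthropic.com when no base URL is set are assumptions made for illustration. It compares the parsed hostname exactly, because a substring match would also accept lookalike hosts.

```javascript
const OFFICIAL_HOST = 'api.anthropic.com';

// Hypothetical backend-aware variant of buildSystemPrompt.
function buildSystemPrompt(config) {
  const baseUrl = config.baseUrl || `https://${OFFICIAL_HOST}`;
  let hostname;
  try {
    hostname = new URL(baseUrl).hostname;
  } catch {
    hostname = null; // unparseable URL: treat the backend as unknown
  }

  if (hostname === OFFICIAL_HOST) {
    return { prompt: "You are Claude, Anthropic's AI assistant.", warning: null };
  }
  return {
    // Neutral identity: name the configured model, claim nothing else.
    prompt: `You are an AI coding assistant served by the model "${config.model}". Do not claim to be any other model.`,
    warning: `Requests are routed to ${hostname ?? 'an unrecognized endpoint'}, not ${OFFICIAL_HOST}.`,
  };
}

const result = buildSystemPrompt({
  baseUrl: 'https://api.deepseek.com/anthropic',
  model: 'deepseek-v4-pro[1m]',
});
console.log(result.prompt);
console.log(result.warning);
```

Returning the warning alongside the prompt lets the caller decide how to surface it, for example as a banner at session start.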

Naming conventions hint at assumptions

The configuration variables reinforce this assumption:

ANTHROPIC_BASE_URL
ANTHROPIC_AUTH_TOKEN
ANTHROPIC_MODEL

All three names carry the ANTHROPIC_ prefix, which implies that the backend will always be Anthropic. Because of this baked-in expectation, the client’s identity layer stays static when users configure a third-party API, and it presents a false identity to the user.

The broader implications: transparency and trust

This issue extends beyond mere confusion. The deception creates several real-world risks:

Transparency: Users have no way to know who is actually processing their data.
Trust: Any misbehavior by third-party models could be incorrectly attributed to Anthropic.
Security: Sensitive data shared with an AI claiming to be Anthropic may be routed to an untrusted third party.
Debugging: Troubleshooting becomes nearly impossible when the AI’s claimed identity contradicts its actual configuration.

A secondary discovery: plaintext API keys at risk

During the investigation, a more pressing concern emerged. The settings.json file stores the ANTHROPIC_AUTH_TOKEN in plaintext, without encryption or obfuscation. This oversight is compounded by the fact that the AI assistant’s Read tool—a function that allows the model to access files during a session—can read settings.json without restrictions.

When a user asks the AI to "check my configuration," the model calls Read("~/.claude/settings.json"), retrieves the full content, and includes the token in the conversation context. If the ANTHROPIC_BASE_URL points to a third-party API, the token is transmitted to that third party as plaintext within the prompt.
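One possible mitigation is to mask secret-looking values before file contents enter the conversation context. The sketch below is illustrative only: redactSecrets and the key-name pattern are hypothetical, it assumes the file is valid JSON, and a key-name heuristic will miss secrets stored under unremarkable names.

```javascript
// Key names that suggest a credential (heuristic, not exhaustive).
const SECRET_KEY_PATTERN = /(TOKEN|KEY|SECRET|PASSWORD)/i;

// Hypothetical filter: replace string values under secret-looking keys.
function redactSecrets(jsonText) {
  const replacer = (key, value) =>
    typeof value === 'string' && SECRET_KEY_PATTERN.test(key) ? '[REDACTED]' : value;
  return JSON.stringify(JSON.parse(jsonText), replacer, 2);
}

const fileContents = JSON.stringify({
  env: {
    ANTHROPIC_AUTH_TOKEN: 'sk-placeholder-secret',
    ANTHROPIC_BASE_URL: 'https://api.deepseek.com/anthropic',
  },
  model: 'deepseek-v4-pro[1m]',
});

console.log(redactSecrets(fileContents));
```

The base URL and model name survive, so the model can still answer a "check my configuration" request, while the token itself never reaches the prompt.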

This issue ties into two known security vulnerabilities:

CVE-2026-25725: A flaw in Claude Code’s sandbox that fails to protect settings.json, making it a confirmed attack surface.
GHSA-2jjv-qv24-fvm4: Reported by Microsoft Threat Intelligence, this advisory highlights that Claude Code’s file-reading tool lacks sandbox restrictions and can be induced to read sensitive files, such as credentials stored in /proc/.

This new exposure path exploits the same attack surface, requiring no advanced techniques to access sensitive data.

Looking ahead: the need for identity verification in AI clients

The incident underscores a critical gap in how AI clients handle third-party integrations. Identity verification should not be an afterthought but a core feature. AI clients must dynamically adapt their presented identity based on the actual backend, and they must provide clear, unambiguous warnings when third-party APIs are in use.

Moreover, the storage of sensitive credentials in plaintext files with unrestricted access poses a severe security risk. Encryption, access controls, and sandboxing must be prioritized to protect user data.
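As one example of the access-control piece, a file-reading tool could consult a deny-list before opening anything. The sketch below is hypothetical and does not describe how Claude Code works: isReadAllowed and both lists are made up for illustration, it assumes POSIX-style paths, and it does not resolve symlinks, which a real guard would need to handle.

```javascript
const os = require('node:os');
const path = require('node:path');

// Illustrative deny-list entries, taken from the files discussed above.
const DENIED_FILES = [path.join(os.homedir(), '.claude', 'settings.json')];
const DENIED_DIRS = ['/proc'];

// Hypothetical guard: decide whether a read request may proceed.
function isReadAllowed(requestedPath) {
  // Expand a leading "~" and normalize ".." so aliases of a denied path match.
  const expanded = requestedPath.replace(/^~(?=$|\/)/, () => os.homedir());
  const resolved = path.resolve(expanded);
  if (DENIED_FILES.includes(resolved)) return false;
  return !DENIED_DIRS.some(
    (dir) => resolved === dir || resolved.startsWith(dir + path.sep)
  );
}

console.log(isReadAllowed('~/.claude/settings.json')); // false
console.log(isReadAllowed('/proc/self/environ'));      // false
```

A deny-list is the weakest form of this control; an allow-list scoped to the project directory would fail closed instead of open.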

As AI agents become more embedded in workflows, the stakes of such oversights grow. Developers and companies must proactively address these architectural and security flaws to maintain user trust and data integrity.

AI summary

When I asked one of our AI assistants who it was, it answered, "I am Anthropic’s Claude Opus 4.8." Yet the model running in the background was DeepSeek. So how did the AI come to introduce itself as something else? Discover the facts behind the hidden identity swap.