Microsoft recently patched a critical security flaw in Copilot for Microsoft 365 that allowed malicious actors to extract sensitive information such as two-factor authentication codes and email contents. The vulnerability, rated as maximum severity, was disclosed by the researchers who identified it, revealing how poorly secured large language models (LLMs) can be manipulated into revealing private data.
The exploit leveraged a fundamental limitation in modern AI systems: their inability to reliably distinguish between legitimate user instructions and malicious prompts embedded in external content. Microsoft’s Copilot, like other AI assistants, processes requests from third-party sources—such as summarized emails or drafted responses—without a robust mechanism to validate the origin or intent of those instructions. This blind trust in external content creates a dangerous loophole that attackers can exploit to bypass security measures.
How the bypass worked
Microsoft and other LLM providers implement guardrails to prevent AI assistants from performing risky actions, such as submitting web forms or sending emails, which could lead to data exfiltration. However, researchers discovered a workaround using markup language, which allows formatting elements like headings, lists, and links without HTML tags. Alternatively, attackers could wrap sensitive data inside HTML tags such as <img> or <form>, embedding it in content that Copilot would process.
When Copilot interpreted these formatted or hidden instructions, it would execute requests that inadvertently sent sensitive information to an attacker-controlled server. The captured data—including 2FA codes—appeared in server logs, effectively bypassing the AI’s security controls and exposing user accounts to unauthorized access.
AI’s blind spot: trusting external content
The vulnerability highlights a persistent challenge for AI security: distinguishing between user-authorized prompts and malicious instructions hidden in external sources. Unlike traditional software, LLMs cannot easily verify the authenticity or intent behind every piece of content they process. This limitation forces providers like Microsoft to rely on reactive measures—such as patching known exploits or adding ad hoc guardrails—rather than implementing a foolproof solution.
Microsoft’s response involved assigning the flaw a critical severity rating and deploying a patch to address the issue. However, the incident underscores a broader concern: as AI assistants become more integrated into enterprise workflows, their susceptibility to manipulation through external content poses a growing risk to data security.
Looking ahead: securing AI interactions
The discovery of this Copilot vulnerability serves as a reminder of the evolving threat landscape surrounding AI tools. Organizations relying on AI assistants must remain vigilant, updating their security protocols to account for these new attack vectors. Meanwhile, AI developers face the ongoing challenge of building systems that can reliably identify and reject malicious instructions, even when embedded in otherwise legitimate content.
For now, the best defense may lie in layered security—combining AI guardrails with traditional security measures to mitigate the risks posed by these sophisticated vulnerabilities.
AI summary
Microsoft'un Copilot AI platformunda keşfedilen kritik güvenlik açığı, saldırganların ikinci faktör doğrulama kodlarını çalmasına olanak tanıyordu. Detayları ve koruma önerilerini inceleyin.