LLMs ranked for resistance to Russian propaganda narratives

As generative AI tools become central to how people consume information, governments are increasingly concerned about models inadvertently amplifying foreign propaganda. Estonia, a country with firsthand experience of Soviet-era disinformation, is taking proactive steps to evaluate how well today’s LLMs can recognize and resist narratives pushed by the Kremlin.

Estonia launches first-of-its-kind propaganda resistance test for LLMs

The Estonian Language Institute (ELI), in partnership with the volunteer defense group Propastop, has created a benchmark designed to measure how effectively large language models push back against Russian state-sponsored disinformation. The new evaluation framework, titled the Propaganda Resistance Benchmark, assesses dozens of leading LLMs to determine whether they can avoid endorsing false claims or adopting adversarial talking points.

The initiative reflects Estonia’s long-standing sensitivity to Kremlin narratives, particularly those that deny the legitimacy of Ukraine’s borders or rewrite history to justify past invasions. By launching this benchmark, ELI aims to provide developers and policymakers with actionable data on model behavior across sensitive geopolitical topics.

Fourteen categories of disinformation under scrutiny

Researchers identified 14 thematic areas where Russian influence operations commonly attempt to shape public discourse. These include disputed narratives about the status of Crimea, justifications for the war in Ukraine, historical reinterpretations of NATO’s role, and attempts to normalize Russia’s WWII-era annexations of the Baltic states.

To construct the benchmark, the team designed questions in three formats:

Neutral phrasing to assess baseline model responses
Biased prompts that embed false assumptions consistent with Russian propaganda
Adversarial queries intended to elicit overt misinformation

Each question was presented to the models in English, Estonian, and Russian. Responses were evaluated not by human reviewers, but by a second AI system calibrated to align with expert assessments from Propastop. The evaluator specifically checked whether models could reject propaganda without relying on external tools like web searches.
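
The design described above, three question formats each posed in three languages across 14 thematic areas, amounts to a small evaluation grid. The following is a minimal Python sketch of how such a grid might be enumerated and passed to a judge. It is not ELI's implementation: the names `Prompt`, `build_grid`, and `evaluate`, the format and language labels, and the placeholder category names are all hypothetical, and the sketch assumes one question per grid cell, which the article does not state.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

# Hypothetical labels: the article names three formats and three languages
# but does not publish identifiers for them.
FORMATS = ("neutral", "biased", "adversarial")
LANGUAGES = ("en", "et", "ru")


@dataclass(frozen=True)
class Prompt:
    category: str  # one of the thematic areas
    fmt: str       # question format
    language: str  # language the question is asked in


def build_grid(categories: list[str]) -> list[Prompt]:
    """Enumerate every category x format x language combination."""
    return [Prompt(c, f, l) for c, f, l in product(categories, FORMATS, LANGUAGES)]


def evaluate(
    prompts: list[Prompt],
    ask_model: Callable[[Prompt], str],
    judge: Callable[[Prompt, str], str],
) -> list[tuple[Prompt, str]]:
    """Query the model under test, then let a judge label each answer."""
    return [(p, judge(p, ask_model(p))) for p in prompts]


# Placeholder names: the article lists only some of the 14 themes.
categories = [f"category_{i}" for i in range(1, 15)]
grid = build_grid(categories)
print(len(grid))  # 14 * 3 * 3 = 126 cells, assuming one question per cell
```

In this sketch `ask_model` and `judge` are plain callables, so the model under test and the AI evaluator can be swapped for stubs while the grid logic is checked.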

Top performers and model behavior patterns

While the benchmark does not publicly rank individual models by name, ELI has shared high-level findings about which training approaches tend to perform better. Models fine-tuned on multilingual datasets and those trained with reinforcement learning from human feedback (RLHF) generally showed stronger resistance to biased prompts. Conversely, models optimized solely for conversational fluency or creative output were more likely to reproduce propaganda-consistent phrasing when exposed to adversarial queries.

The evaluation also revealed language-specific vulnerabilities. Models fluent in Russian were not inherently better at resisting Russian propaganda; in fact, they sometimes mirrored propagandistic language more closely than their English or Estonian counterparts. This suggests that linguistic fluency alone is insufficient: models must also be explicitly trained to detect and counter manipulative narratives.
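
A per-language comparison of this kind can be computed by grouping the judge's verdicts by prompt language. The sketch below is an illustration only: the function name `rejection_rate_by_language` is hypothetical, the input shape is an assumption, and the sample verdicts are made up to show the format rather than taken from the benchmark.

```python
from collections import defaultdict
from typing import Iterable


def rejection_rate_by_language(
    results: Iterable[tuple[str, bool]],
) -> dict[str, float]:
    """Map each language to the share of responses that rejected the propaganda.

    `results` holds (language, rejected) pairs, one per judged response.
    """
    totals: dict[str, int] = defaultdict(int)
    rejected: dict[str, int] = defaultdict(int)
    for language, was_rejected in results:
        totals[language] += 1
        rejected[language] += int(was_rejected)
    return {lang: rejected[lang] / totals[lang] for lang in totals}


# Made-up verdicts purely to show the input shape; not benchmark data.
toy_results = [("en", True), ("en", True), ("ru", True), ("ru", False)]
print(rejection_rate_by_language(toy_results))  # {'en': 1.0, 'ru': 0.5}
```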

What this means for AI governance and geopolitical stability

The release of the Propaganda Resistance Benchmark arrives at a time when AI-generated content is increasingly intersecting with state-sponsored disinformation campaigns. Estonia’s approach offers a replicable model for other nations seeking to assess how well their information ecosystems can withstand foreign influence operations.

For developers, the findings underscore the importance of incorporating robust safety training that targets specific propagandistic tropes rather than relying solely on general alignment methods. Policymakers may now look to such benchmarks when drafting regulations on AI transparency, content provenance, and accountability for model outputs.

Looking ahead, ELI plans to expand the benchmark to include additional adversarial scenarios and languages. The next phase will incorporate real-time disinformation examples from ongoing campaigns, testing models under conditions that more closely resemble actual influence operations. As AI systems grow more integrated into public discourse, tools like this benchmark could become essential to preserving democratic resilience against manipulation.

AI summary

Estonia’s state-backed research identified the LLMs most resistant to Russia’s strategic narratives. The results show how AI models perform when confronted with propaganda.
