iToverDose / Technology · 5 June 2026 · 00:06

LLMs ranked for resistance to Russian propaganda narratives

Researchers in Estonia have benchmarked leading large language models to assess their ability to reject Kremlin-backed disinformation campaigns across multiple languages and sensitive geopolitical topics.

Ars Technica · 3 min read

As generative AI tools become central to how people consume information, governments are increasingly concerned about models inadvertently amplifying foreign propaganda. Estonia, a country with firsthand experience of Soviet-era disinformation, is taking proactive steps to evaluate how well today’s LLMs can recognize and resist narratives pushed by the Kremlin.

Estonia launches first-of-its-kind propaganda resistance test for LLMs

The Estonian Language Institute (ELI), in partnership with the volunteer defense group Propastop, has created a benchmark designed to measure how effectively large language models push back against Russian state-sponsored disinformation. The new evaluation framework, titled the Propaganda Resistance Benchmark, assesses dozens of leading LLMs to determine whether they avoid endorsing false claims or adopting adversarial talking points.

The initiative reflects Estonia’s long-standing sensitivity to Kremlin narratives, particularly those that deny the legitimacy of Ukraine’s borders or rewrite history to justify past invasions. By launching this benchmark, ELI aims to provide developers and policymakers with actionable data on model behavior across sensitive geopolitical topics.

Fourteen categories of disinformation under scrutiny

Researchers identified 14 thematic areas where Russian influence operations commonly attempt to shape public discourse. These include disputed narratives about the status of Crimea, justifications for the war in Ukraine, historical reinterpretations of NATO’s role, and attempts to normalize Russia’s WWII-era annexation of the Baltic states.

To construct the benchmark, the team designed questions in three formats:

  • Neutral phrasing to assess baseline model responses
  • Biased prompts that embed false assumptions consistent with Russian propaganda
  • Adversarial queries intended to elicit overt misinformation

Each question was presented to the models in English, Estonian, and Russian. Responses were evaluated not by human reviewers but by a second AI system calibrated to align with expert assessments from Propastop. The test specifically checked whether models could reject propaganda without relying on external tools such as web search.
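The article does not publish the benchmark's code. As a rough illustration only, the sketch below shows how verdicts from an automated judge might be tallied into a resistance rate for each language and prompt format. Everything in it is an assumption: the language and format labels, the `resistance_rates` function, and `toy_judge`, a keyword check that merely stands in for ELI's calibrated AI evaluator and does not describe how that evaluator works.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# The three prompt formats and three languages named in the article.
FORMATS = ("neutral", "biased", "adversarial")
LANGUAGES = ("en", "et", "ru")

# Hypothetical judge interface: (prompt, answer) -> True if the answer rejects
# the propaganda claim. ELI's real evaluator is a calibrated AI system; any
# callable with this shape can stand in for it here.
Judge = Callable[[str, str], bool]

# One model response: (language, format, prompt, answer).
Record = Tuple[str, str, str, str]


def resistance_rates(
    records: Iterable[Record], judge: Judge
) -> Dict[Tuple[str, str], float]:
    """Share of answers the judge marks as resisting, per (language, format) cell."""
    passed: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for lang, fmt, prompt, answer in records:
        if lang not in LANGUAGES or fmt not in FORMATS:
            raise ValueError(f"unknown cell: {lang}/{fmt}")
        key = (lang, fmt)
        total[key] += 1
        if judge(prompt, answer):
            passed[key] += 1
    # Only cells that actually received answers appear in the result.
    return {key: passed[key] / total[key] for key in total}


def toy_judge(prompt: str, answer: str) -> bool:
    """Toy placeholder judge: a keyword check, NOT how the real evaluator works."""
    return "no evidence" in answer.lower()


if __name__ == "__main__":
    # Illustrative inputs only; these are not benchmark data.
    demo = [
        ("en", "biased", "q1", "There is no evidence for that claim."),
        ("en", "biased", "q2", "Yes, that is correct."),
        ("ru", "adversarial", "q3", "No evidence supports this."),
    ]
    print(resistance_rates(demo, toy_judge))
    # prints {('en', 'biased'): 0.5, ('ru', 'adversarial'): 1.0}
```

Keeping the judge as a plain callable makes it easy to swap the toy keyword check for a model-based evaluator, and reporting each language and format separately is what would surface the kind of language-specific gap the researchers describe.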

Top performers and model behavior patterns

While the benchmark does not publicly rank individual models by name, ELI has shared high-level findings about which architectural approaches tend to perform better. Models fine-tuned on multilingual datasets and those trained with reinforcement learning from human feedback (RLHF) generally showed stronger resistance to biased prompts. Conversely, models optimized solely for conversational fluency or creative output were more likely to reproduce propaganda-consistent phrasing when exposed to adversarial queries.

The evaluation also revealed language-specific vulnerabilities. Models fluent in Russian were not inherently better at resisting Russian propaganda; in fact, they sometimes mirrored propagandistic language more closely than their English- or Estonian-language counterparts. This suggests that linguistic fluency alone is insufficient: models must also be explicitly trained to detect and counter manipulative narratives.

What this means for AI governance and geopolitical stability

The release of the Propaganda Resistance Benchmark arrives at a time when AI-generated content is increasingly intersecting with state-sponsored disinformation campaigns. Estonia’s approach offers a replicable model for other nations seeking to assess how well their information ecosystems can withstand foreign influence operations.

For developers, the findings underscore the importance of incorporating robust safety training that targets specific propagandistic tropes rather than relying solely on general alignment methods. Policymakers may now look to such benchmarks when drafting regulations on AI transparency, content provenance, and accountability for model outputs.

Looking ahead, ELI plans to expand the benchmark to include additional adversarial scenarios and languages. The next phase will incorporate real-time disinformation examples from ongoing campaigns, testing models under conditions that more closely resemble actual influence operations. As AI systems grow more integrated into public discourse, tools like this benchmark could become essential to preserving democratic resilience against manipulation.

AI summary

A state-backed Estonian study has identified the LLMs most resistant to Russia’s strategic narratives. The results shed light on how AI models perform when confronted with propaganda.
