iToverDose / Startups · 30 APRIL 2026 · 20:02

Why OpenAI banned goblins in GPT-5.5—and what it reveals about AI training

A bizarre restriction in OpenAI’s latest model exposed the hidden quirks of AI personality tuning. Discover why goblins, raccoons, and pigeons were banned—and what it means for future AI behavior.

VentureBeat · 3 min read

The release of OpenAI’s latest language model, GPT-5.5, sent shockwaves through the tech community—not for its capabilities, but for an unexpected quirk buried in its code. Nestled within a configuration file in the open-source Codex repository was a directive so peculiar it defied explanation: Never mention goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals unless absolutely relevant to the user’s query. The restriction, repeated four times for emphasis, sparked a wave of curiosity and speculation. Why would one of the world’s most advanced AI labs issue a blanket ban on mythical creatures and urban wildlife?
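Based on the article's description, the clause would have looked roughly like the following. The field names here are illustrative guesses; only the directive text itself comes from the reported snippet:

```json
{
  "model": "gpt-5.5",
  "system_directives": [
    "Never mention goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals unless absolutely relevant to the user's query."
  ]
}
```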

The answer, it turns out, lies in the nuanced art of AI personality customization—a feature OpenAI quietly integrated into its training pipeline in mid-2025. This innovation allows users to select predefined personas like Professional, Friendly, or Quirky, each designed to shape interactions in distinct ways. However, the implementation revealed an unintended consequence: when users activated certain personalities, the model’s responses began to veer into unanticipated territory, including an overemphasis on fantastical creatures.

The birth of a goblin obsession

The discovery of the goblin restriction emerged from a routine audit of OpenAI’s open-source artifacts. Developer @arb8020 shared a snippet from the models.json file in the Codex repository, highlighting the unusual clause. Within hours, the post ricocheted across social platforms, with Reddit’s tech communities dubbing it a "restraining order" against mythical beings. The reaction wasn’t just about humor—it underscored a deeper tension in AI development.

Barron Roth, a Senior Project Manager at Google, demonstrated the phenomenon in action. His OpenClaw agent, powered by GPT-5.5, appeared to fixate on goblins, referencing them even in unrelated contexts. Others noticed the model’s tendency to describe technical bugs as "gremlins in the machine," a phrase that mirrored the phrasing in the banned list. The peculiar behavior raised questions: Was this a deliberate safeguard, a response to a data-poisoning attack, or an unintended side effect of reinforcement learning?

Speculation ran rampant. Some joked that data centers were secretly housing goblin workforces, while others leaned into the "Pink Elephant" problem—a phenomenon in prompt engineering where explicitly forbidding a concept makes it more salient, so the model surfaces it more often rather than less. Theories proliferated: Had an OpenAI engineer been bullied by a raccoon during training? Was this a defensive measure against adversarial attacks? The ambiguity surrounding the restriction only deepened the intrigue.

OpenAI breaks silence with technical explanation

As the debate intensified, OpenAI stepped in to clarify the situation. In a blog post titled Where the goblins came from, the company explained that the restriction stemmed from its personality customization feature, introduced in July 2025. Unlike traditional post-training adjustments, OpenAI baked this feature into the model’s end-to-end training pipeline, embedding it as a core component of GPT-5.5’s architecture.

The feature allows users to toggle between predefined personas, each designed to influence the model’s tone and response style. For example:

  • Professional: Formal, structured responses ideal for workplace documentation.
  • Friendly: Conversational and approachable, suited for casual interactions.
  • Efficient: Concise and technical, prioritizing clarity over fluff.
  • Quirky: Humorous and metaphorical, injecting creativity into answers.
  • Cynical: Dry, sarcastic, and pragmatic, delivering advice with a sharp edge.

These personas operate alongside user-defined instructions and saved memories, though specific tasks—like generating code or drafting resumes—override the selected personality to ensure functional accuracy. The goblin restriction, OpenAI revealed, was a safeguard against the unintended reinforcement of certain behaviors when specific personalities were activated.
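One way to picture the mechanics described above—persona selection plus a task-type override—is a small dispatch layer that composes the system instruction. This is a minimal sketch of the idea, not OpenAI's implementation; every name, prompt string, and task label here is hypothetical:

```python
# Hypothetical sketch: mapping personas to system instructions,
# with certain tasks overriding the persona for functional accuracy.
# None of these strings or names come from OpenAI's actual system.

PERSONA_PROMPTS = {
    "professional": "Respond formally, with structured answers.",
    "friendly": "Respond conversationally and approachably.",
    "efficient": "Respond concisely; prioritize clarity over fluff.",
    "quirky": "Respond with humor and metaphor.",
    "cynical": "Respond with dry, sarcastic pragmatism.",
}

# Tasks where, per the article, accuracy trumps the chosen persona.
OVERRIDE_TASKS = {"code_generation", "resume_drafting"}

NEUTRAL_PROMPT = "Respond neutrally; accuracy takes priority."


def build_system_prompt(persona: str, task: str) -> str:
    """Compose a system prompt, dropping the persona for override tasks."""
    if task in OVERRIDE_TASKS:
        return NEUTRAL_PROMPT
    # Fall back to the professional persona for unknown selections.
    return PERSONA_PROMPTS.get(persona, PERSONA_PROMPTS["professional"])
```

Under this framing, a "quirky" persona shapes casual chat but is silently dropped the moment the user asks for code or a resume—which matches the override behavior the article describes.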

The broader implications of AI personality tuning

The goblin episode highlights the fragility of AI systems when subjected to personality customization. While OpenAI’s approach offers users greater control over interactions, it also introduces unpredictability. The reinforcement learning process, which relies on human feedback, can inadvertently amplify quirks or biases embedded in training data. OpenAI’s solution—a blanket ban on certain terms—may seem extreme, but it reflects the challenges of balancing flexibility with control in AI development.

For developers and enterprises integrating GPT-5.5, the lesson is clear: personality customization is a double-edged sword. It empowers users but requires careful oversight to prevent unintended consequences. As AI models grow more sophisticated, the need for robust guardrails will only intensify. The goblin restriction, once a curiosity, now serves as a case study in the delicate art of AI training—and the surprises that lie beneath the surface.

AI summary

The mysterious directives banning goblins and other creatures in OpenAI's GPT-5.5 model turned out to be a side effect of its personality customization feature. The technical facts behind the mystery, and lessons for future AI models.
