AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Where the goblins came from

low, news, LLM-Specific

safety, research

Source: OpenAI Blog, April 29, 2026

Summary

Starting with GPT-5.1, OpenAI's models began frequently mentioning goblins and gremlins in their responses, and the behavior became more pronounced in later versions. The root cause was traced to the training process for the "Nerdy" personality feature, which inadvertently gave high rewards to outputs containing creature metaphors, so the model learned the quirk and amplified it over time. The problem was heavily concentrated in the Nerdy personality, which made up only 2.5% of responses but accounted for 66.7% of goblin mentions. It was identified by comparing model outputs and analyzing which reward signals (scoring systems that guide AI training) favored creature-word language.
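
To make the concentration figures concrete, the short Python sketch below turns the two percentages from the summary into ratios. It is an illustration only: the function names are hypothetical, and it assumes both percentages were measured over the same set of responses, which the summary does not state explicitly.

```python
# Illustrative arithmetic only; function names are hypothetical and not
# taken from the source article.

NERDY_RESPONSE_SHARE = 0.025  # Nerdy personality: 2.5% of all responses
NERDY_MENTION_SHARE = 0.667   # Nerdy personality: 66.7% of goblin mentions


def overrepresentation(mention_share: float, response_share: float) -> float:
    """How many times more mentions a group produces than its share of
    responses alone would predict."""
    return mention_share / response_share


def relative_rate(mention_share: float, response_share: float) -> float:
    """Ratio of the per-response mention rate inside the group to the
    per-response mention rate outside it."""
    inside = mention_share / response_share
    outside = (1.0 - mention_share) / (1.0 - response_share)
    return inside / outside


print(f"{overrepresentation(NERDY_MENTION_SHARE, NERDY_RESPONSE_SHARE):.1f}")
print(f"{relative_rate(NERDY_MENTION_SHARE, NERDY_RESPONSE_SHARE):.1f}")
```

Under that assumption, Nerdy responses carried roughly 27 times their proportional share of goblin mentions, and the per-response mention rate was roughly 78 times higher than for all other responses combined.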