AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Google DeepMind wants to know if chatbots are just virtue signaling

infonewsLLM-Specific

researchsafety

Source: MIT Technology ReviewFebruary 18, 2026

Summary

Researchers at Google DeepMind are investigating whether chatbots display genuine moral reasoning or are simply mimicking responses (virtue signaling). While studies show that large language models (LLMs, AI systems trained on massive amounts of text data) can give morally sound advice, the models are unreliable in practice because they often flip their answers when questioned, change responses based on how questions are formatted, and show sensitivity to tiny changes like swapping option labels from 'Case 1' to '(A)'. The researchers propose developing more rigorous evaluation methods to test whether moral behavior in LLMs is actually robust or just performative.

Solution / Mitigation

The source proposes a new line of research to develop more rigorous techniques for evaluating moral competence in LLMs. This would include tests designed to push models to change their responses to moral questions to reveal if they lack robust moral reasoning, and tests presenting models with variations of common moral problems to check whether they produce rote responses or more nuanced ones. However, the source notes this is "more a wish list than a set of ready-made solutions" and does not describe implemented fixes or updates.