Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
Summary
Large language models (LLMs, which are AI systems trained on vast amounts of text) are vulnerable to serious attacks like hallucinations (making up false information), jailbreaks (tricking the AI into ignoring its safety rules), and backdoors (hidden malicious instructions inserted during training). This research proposes a detection method using hidden state forensics (analyzing the internal numerical patterns that flow through the model's layers) to identify abnormal or malicious behavior in real-time, achieving over 95% accuracy with minimal computational cost.
Classification
Affected Vendors
Related Issues
CVE-2024-37052: Deserialization of untrusted data can occur in versions of the MLflow platform running version 1.1.0 or newer, enabling
CVE-2024-27444: langchain_experimental (aka LangChain Experimental) in LangChain before 0.1.8 allows an attacker to bypass the CVE-2023-
Original source: http://ieeexplore.ieee.org/document/11480194
First tracked: April 27, 2026 at 08:03 PM
Classified by LLM (prompt v3) · confidence: 92%