Robustness Over Time: Understanding Adversarial Examples’ Effectiveness on Longitudinal Versions of Large Language Models
Summary
Researchers studied how well different versions of major LLMs (like GPT, Llama, and Qwen) resist adversarial attacks, which are inputs designed to trick AI systems into making mistakes, ignoring safety guidelines, or producing false information. They found that newer versions of these models don't always become more resistant to these attacks, and that simply making models larger doesn't guarantee better security.
Classification
Affected Vendors
Related Issues
Original source: http://ieeexplore.ieee.org/document/11426969
First tracked: April 2, 2026 at 08:03 PM
Classified by LLM (prompt v3) · confidence: 92%