AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey

info · research · Peer-Reviewed · LLM-Specific

security · research

Source: ACM Digital Library (TOPS, DTRAP, CSUR) · June 24, 2026

Summary

This academic survey examines harmful fine-tuning attacks, in which an attacker fine-tunes a safety-aligned large language model (for example, on a small set of harmful training examples) so that it abandons its safety behavior and responds to dangerous requests. It also covers the defenses designed to stop these attacks. The paper reviews the main types of attack, how they work, and the protection strategies researchers have developed to keep large language models safe from this threat.

Classification

Attack Type

Model Poisoning

Attack Sophistication

Moderate

Impact (CIA+S)

integrity, safety

Related Issues

high

CVE-2024-37052: Deserialization of untrusted data can occur in versions of the MLflow platform running version 1.1.0 or newer, enabling

Similar attack · NVD/CVE Database

info

Model Stability Defense Against Model Poisoning in Federated Learning

Monthly digest — independent AI security research

Original source: https://dl.acm.org/doi/abs/10.1145/3817114?af=R

First tracked: June 24, 2026 at 08:01 AM

Classified by LLM (prompt v3) · confidence: 92%