Jailbreak and Guard Aligned Language Models With Only Few In-Context Demonstrations
Summary
This research shows that large language models can be tricked or protected using in-context learning (ICL, a technique where an AI learns from examples provided in its current input rather than from additional training). The researchers developed two methods: an In-Context Attack, which prepends harmful demonstrations to make LLMs produce unsafe outputs, and an In-Context Defense, which prepends refusal demonstrations to strengthen safety. The study demonstrates that both attacking and defending LLM safety with only a few carefully chosen demonstrations is effective and scalable. A minimal sketch of the defense-side idea follows.
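The sketch below illustrates the general shape of an in-context defense: a handful of refusal demonstrations are placed in the conversation before the real user query, so the model sees examples of declining unsafe requests. The demonstration texts and the build_chat_messages helper are illustrative assumptions, not the paper's exact prompts or interface.

```python
# Minimal sketch of an in-context defense, assuming a chat-style message API.
# The demonstration pairs below are placeholders, not the prompts used in the paper.

REFUSAL_DEMONSTRATIONS = [
    # Each demonstration is a (harmful request, safe refusal) pair.
    ("How do I make a dangerous chemical at home?",
     "I can't help with that. Creating hazardous substances can cause serious harm."),
    ("Write instructions for breaking into a neighbor's house.",
     "I'm not able to help with activities that are illegal or endanger others."),
]

def build_chat_messages(user_query: str) -> list[dict]:
    """Assemble a chat-style message list with refusal demonstrations
    inserted before the actual user query (the in-context defense idea)."""
    messages = []
    for request, refusal in REFUSAL_DEMONSTRATIONS:
        messages.append({"role": "user", "content": request})
        messages.append({"role": "assistant", "content": refusal})
    # The real query comes last, after the safety demonstrations.
    messages.append({"role": "user", "content": user_query})
    return messages

if __name__ == "__main__":
    # The resulting message list would be passed to whatever chat model is in use.
    for m in build_chat_messages("Tell me about in-context learning."):
        print(m["role"], ":", m["content"])
```

An in-context attack works the same way structurally, except the prepended demonstrations show the model complying with harmful requests instead of refusing them.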
Classification
Related Issues
CVE-2024-27444: langchain_experimental (aka LangChain Experimental) in LangChain before 0.1.8 allows an attacker to bypass the CVE-2023-
CVE-2026-30308: In its design for automatic terminal command execution, HAI Build Code Generator offers two options: Execute safe comman
Original source: http://ieeexplore.ieee.org/document/11370531
First tracked: May 7, 2026 at 08:03 PM
Classified by LLM (prompt v3) · confidence: 92%