Online Safety Analysis for LLMs: A Benchmark, an Assessment, and a Path Forward
Tags: info · research · Peer-Reviewed · LLM-Specific · safety
Source: IEEE Xplore (Security & AI Journals) · August 29, 2025
Summary
This research introduces a benchmark and evaluation framework for online safety analysis of LLMs: detecting unsafe outputs while the model is still generating text, rather than only after generation completes. The study evaluates a range of safety detection methods across different LLMs and finds that combining multiple methods, an approach the authors call hybridization, can improve detection effectiveness. The work aims to help developers choose safety methods suited to their specific applications.
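To make the idea of "online" analysis concrete, here is a minimal sketch of checking a model's partial output after every streamed token and combining several detectors into one hybrid score. The detectors, function names, and threshold below are illustrative assumptions, not the paper's actual methods.

```python
# Sketch of online safety analysis: score each partial output as tokens
# stream in, and combine several detectors (hybridization).
# All detectors here are toy stand-ins for illustration only.
from typing import Callable, Iterable, List, Tuple

Detector = Callable[[str], float]  # maps partial text -> unsafety score in [0, 1]

def keyword_score(text: str) -> float:
    """Toy detector: flags a small blocklist of phrases."""
    blocklist = {"make a bomb", "steal credentials"}
    return 1.0 if any(term in text.lower() for term in blocklist) else 0.0

def length_heuristic(text: str) -> float:
    """Toy detector: mildly suspicious of very long unbroken tokens."""
    longest = max((len(w) for w in text.split()), default=0)
    return min(longest / 50.0, 1.0)

def hybrid_score(text: str, detectors: List[Detector]) -> float:
    """Hybridization by taking the maximum score (a conservative OR)."""
    return max(d(text) for d in detectors)

def monitor_stream(tokens: Iterable[str], detectors: List[Detector],
                   threshold: float = 0.5) -> Tuple[str, bool]:
    """Re-check the partial output after every token; stop early if unsafe."""
    partial = ""
    for tok in tokens:
        partial += tok
        if hybrid_score(partial, detectors) >= threshold:
            return partial, True  # flagged mid-generation
    return partial, False

# Example: a stream that becomes unsafe partway through.
tokens = ["Here is how to ", "make a bomb", " at home"]
text, flagged = monitor_stream(tokens, [keyword_score, length_heuristic])
```

The key design point, and what distinguishes online analysis from post-hoc filtering, is that `monitor_stream` can halt generation as soon as any detector fires, rather than scoring only the finished response.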
Classification
Attack Sophistication: Moderate
Impact (CIA+S): Safety
AI Component Targeted: Inference
Original source: http://ieeexplore.ieee.org/document/11145129
First tracked: March 16, 2026 at 04:14 PM
Classified by LLM (prompt v3) · confidence: 92%