Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.
This is a systematic literature review, a type of research paper that surveys and analyzes existing studies on differential privacy (a mathematical technique that adds carefully measured noise to data to protect individual privacy) in machine learning. The review examines how researchers are applying differential privacy to train AI models while keeping personal information safe from being extracted or misused.
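As a rough illustration of the core mechanism the surveyed papers build on, here is a minimal sketch of the Laplace mechanism; the function name and example data are ours, not from any surveyed paper. A statistic is released with noise whose scale is the query's sensitivity divided by the privacy budget epsilon.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a noisy statistic satisfying epsilon-differential privacy.

    Noise scale = sensitivity / epsilon: a smaller epsilon (stronger privacy)
    means more noise and less accuracy.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately release a count query over a toy dataset.
ages = np.array([34, 29, 41, 56, 23])
true_count = np.sum(ages > 30)            # sensitivity of a counting query is 1
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(true_count, round(noisy_count, 2))
```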
This academic survey paper categorizes and describes different privacy concerns and system designs in collaborative deep learning (machine learning where multiple parties train models together while keeping their data private). The paper creates a taxonomy, which is a systematic classification scheme, to help organize the various approaches and challenges in this field.
Researchers have developed BioGuard, a defense method that protects biometric classifiers (AI systems that identify people using fingerprints, faces, or iris scans) against model extraction attacks (where attackers try to steal or copy the AI model by repeatedly querying it). The method works without needing malicious sample data to train it, making it practical for real-world deployment.
This research proposes HeteroFed, a framework for federated learning (a distributed machine learning approach where multiple devices train a shared model without sending raw data to a central server) that addresses privacy and performance challenges in edge intelligence scenarios. The framework uses four main techniques: personalized model construction for different devices, dynamic gradient clipping (limiting how much model parameters can change), adaptive noise addition for privacy protection, and improved model aggregation to maintain accuracy despite privacy protections.
Fix: The source proposes HeteroFed as a solution framework containing four specific mechanisms: (1) heterogeneous model construction to enable personalized model training for different smart devices, (2) dynamic gradient clipping to adjust the magnitude of gradients on models uploaded by devices, (3) adaptive noise addition to customize differential privacy (mathematical techniques that add noise to protect individual data) protection based on each device model's convergence status, and (4) deviation-aware model aggregation to mitigate the effects of noise perturbation during aggregation.
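A minimal sketch of the clipping-plus-noise idea behind mechanisms (2) and (3), in the style of DP-SGD; the function, parameter names, and values are illustrative assumptions, not HeteroFed's actual implementation.

```python
import numpy as np

def private_client_update(gradient: np.ndarray,
                          clip_norm: float,
                          noise_multiplier: float) -> np.ndarray:
    """Clip a client's gradient and add Gaussian noise before upload.

    clip_norm bounds each client's contribution; noise_multiplier scales the
    Gaussian noise relative to the clipping threshold (per-step noise standard
    deviation is noise_multiplier * clip_norm).
    """
    norm = np.linalg.norm(gradient)
    clipped = gradient * min(1.0, clip_norm / (norm + 1e-12))
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=gradient.shape)
    return clipped + noise

# A device whose local model has nearly converged could be assigned a smaller
# noise_multiplier (the "adaptive" idea); the numbers here are made up.
grad = np.random.randn(10)
update = private_client_update(grad, clip_norm=1.0, noise_multiplier=0.8)
```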
Researchers discovered that two widely-used encryption schemes for secure database searches (m-ORE and om-ORE, which allow multiple parties to query encrypted data without revealing the queries or data) can be attacked by a malicious client and server working together to insert fake records into the database. The team developed a new scheme called MORES that fixes this vulnerability while also making searches about one-third faster and more efficient than the older schemes.
Fix: The source proposes MORES, described as 'the first multi-client ORE scheme that preserves range-query functionality while provably resisting arbitrarily malicious participants.' The text indicates MORES can serve as 'an immediate drop-in replacement for encrypted-database systems that demand both efficiency and robustness in adversarial environments,' but does not provide implementation details, version numbers, or step-by-step deployment instructions.
This research paper examines macro-level collaborative leakage, which occurs when individually harmless data pieces reveal sensitive information when combined together. The authors conducted mathematical analyses to understand why this happens and found that the problem stems from how risk data (data that don't directly expose private information) correlate with sensitive information. While Gaussian distribution (a common bell-curve statistical pattern) can help prevent this type of leakage, the paper concludes that this protection is limited and more comprehensive security mechanisms are needed.
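A toy numerical illustration of the correlation effect the paper analyzes (all numbers invented): three features that are each only weakly correlated with a sensitive attribute reveal substantially more about it once an adversary combines them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
sensitive = rng.normal(size=n)                       # attribute to protect

# Three "risk" features: each is only weakly informative on its own,
# but each correlates with the sensitive attribute.
features = np.stack([0.4 * sensitive + rng.normal(scale=1.0, size=n)
                     for _ in range(3)], axis=1)

# An adversary combining all three via least squares recovers far more.
coef, *_ = np.linalg.lstsq(features, sensitive, rcond=None)
estimate = features @ coef

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print([round(corr(features[:, i], sensitive), 2) for i in range(3)])  # each ~0.37
print(round(corr(estimate, sensitive), 2))                            # combined ~0.57
```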
Adversarial examples (inputs crafted to fool AI systems) are a serious security risk for deep neural networks (AI systems with many layers), especially in physical-world attacks like fooling object detection in surveillance cameras. This research proposes Adversarial Spectrum Defense (ASD), a defense method that uses spectral decomposition (breaking down data into different frequency components) via Discrete Wavelet Transform (a mathematical technique to analyze patterns at multiple scales) to detect and defend against patch-based and texture-based adversarial attacks, and shows it achieves better protection when combined with Adversarial Training (training the AI on attack examples to make it more robust).
Fix: The source proposes Adversarial Spectrum Defense (ASD), which 'leverages spectral decomposition via Discrete Wavelet Transform (DWT) to analyze adversarial patterns across multiple frequency scales' and 'by integrating this spectral analysis with the off-the-shelf Adversarial Training (AT) model, ASD provides a comprehensive defense strategy against both patch-based and texture-based adversarial attacks.' The paper reports that 'ASD+AT achieved state-of-the-art (SOTA) performance against various attacks, outperforming the APs of previous defense methods by 21.73%' (AP: average precision).
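For intuition, here is a minimal sketch of the kind of DWT-based frequency analysis the quoted passage refers to, using PyWavelets; the energy statistic and the patch stand-in are our simplification, not the paper's actual ASD pipeline.

```python
import numpy as np
import pywt  # PyWavelets

def high_frequency_energy(image: np.ndarray, wavelet: str = "haar") -> float:
    """Single-level 2-D DWT; return the energy in the detail (high-frequency) subbands.

    Adversarial patches and high-frequency textures tend to inflate this energy
    relative to clean images of the same scene.
    """
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), wavelet)
    return float(sum(np.sum(band ** 2) for band in (cH, cV, cD)))

clean = np.random.rand(64, 64) * 0.1
patched = clean.copy()
patched[20:36, 20:36] = np.random.rand(16, 16)   # crude stand-in for an adversarial patch
print(high_frequency_energy(clean), high_frequency_energy(patched))
```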
Researchers developed AdvFor, a black-box attack method (a way to trick an AI system without seeing its internal workings) that can fool image forgery localization models, which are AI systems trained to detect which regions of an image have been forged or manipulated. The attack uses reinforcement learning (a technique where an AI learns by trial and error to maximize rewards) to craft minimal changes to images that make forgery detection fail, using only 7 queries per image, and the researchers tested it on multiple real-world models to show it works effectively.
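The paper's reinforcement-learning policy is not described in enough detail here to reproduce; the sketch below shows only the generic shape of a query-limited black-box attack (a random-search loop under a fixed query budget) with a toy stand-in model, not AdvFor itself.

```python
import numpy as np

def query_limited_attack(model, image, budget=7, step=0.03):
    """Generic query-limited black-box perturbation search (not AdvFor's RL policy).

    `model(image)` is assumed to return the detector's confidence that the image
    contains a forgery; the attack keeps any random perturbation that lowers it,
    stopping after `budget` queries in total.
    """
    best = image.copy()
    best_score = model(best)                     # query 1
    for _ in range(budget - 1):                  # remaining queries
        candidate = np.clip(best + step * np.sign(np.random.randn(*image.shape)), 0, 1)
        score = model(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best

# Toy stand-in detector, for demonstration only.
toy_model = lambda x: float(x.mean())
adv = query_limited_attack(toy_model, np.random.rand(8, 8))
```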
Researchers developed DiffMI, a new attack that can recover people's facial identities from face recognition systems by reversing the embeddings (compressed numerical representations of faces). Unlike previous attacks, DiffMI doesn't require expensive training on specific targets and can work against unseen faces and new recognition models, achieving success rates of 84-93% against systems designed to resist such attacks.
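As a hedged illustration of embedding inversion in general (not DiffMI's method), the sketch below optimizes a candidate image so that its embedding matches a target embedding under cosine similarity; the encoder here is a toy stand-in for a real face-recognition network.

```python
import torch

# Toy stand-in encoder; a real attack would target a face-recognition model.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
encoder.eval()

target_embedding = torch.randn(1, 128)            # leaked or intercepted embedding
x = torch.rand(1, 3, 32, 32, requires_grad=True)  # candidate face being optimized
opt = torch.optim.Adam([x], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    emb = encoder(x)
    # Maximize cosine similarity between the candidate's embedding and the target.
    loss = 1.0 - torch.nn.functional.cosine_similarity(emb, target_embedding).mean()
    loss.backward()
    opt.step()
    x.data.clamp_(0.0, 1.0)                       # keep pixels in a valid range
```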
This paper introduces AuthRF, a security system that protects RF sensing models (AI systems that interpret radio frequency signals from WiFi or radar) by embedding user-specific digital "passports" in the signal processing pipeline. Valid passports allow the model to work correctly, while invalid or fake ones distort the signal and degrade performance, preventing unauthorized use. The approach is designed to be proactive and to work at runtime, addressing limitations of existing approaches.
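For intuition about passport-style gating in general (not AuthRF's exact design), the sketch below scales and shifts intermediate features using parameters derived from a passport tensor; with the wrong passport the features are distorted and accuracy drops. All names and shapes are illustrative assumptions.

```python
import torch

class PassportGate(torch.nn.Module):
    """Passport-style gating layer (illustrative; not AuthRF's exact design).

    A valid passport reproduces the scale/shift the model was trained with;
    any other passport distorts the features and degrades performance.
    """
    def __init__(self, channels: int, passport: torch.Tensor):
        super().__init__()
        self.proj = torch.nn.Linear(passport.numel(), 2 * channels)
        self.register_buffer("passport", passport.flatten())

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        scale, shift = self.proj(self.passport).chunk(2)
        return features * scale + shift

gate = PassportGate(channels=16, passport=torch.randn(32))
out = gate(torch.randn(4, 16))   # (batch, channels) features pass through the gate
```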
Voice biometric systems (technology that identifies people by their voice) are vulnerable to replay attacks (where an attacker plays back a recorded voice to fool the system), but there hasn't been enough realistic training data to build good defenses. This research created RIRplay, a simulated database that realistically mimics how replay attacks actually happen across different acoustic environments, which improved detection performance significantly when tested on real-world voice spoofing challenges.
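A crude sketch of the replay-simulation idea (convolving clean speech with a room impulse response and adding playback noise); the synthetic impulse response and noise levels below are invented for illustration and are far simpler than RIRplay's simulation.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_replay(speech: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Convolve speech with a synthetic room impulse response and add
    playback-device noise to mimic a recorded-and-replayed utterance."""
    t = np.arange(int(0.3 * sample_rate)) / sample_rate
    rir = np.exp(-t / 0.05) * np.random.randn(t.size)    # decaying synthetic reverberation
    rir /= np.abs(rir).max()
    replayed = fftconvolve(speech, rir, mode="full")[: speech.size]
    replayed += 0.005 * np.random.randn(speech.size)      # loudspeaker/microphone noise floor
    return replayed / np.abs(replayed).max()

genuine = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # placeholder "voice"
spoof = simulate_replay(genuine)
```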
This research proposes a new method for private set operations (PSO, techniques that let organizations securely compare or combine datasets without revealing private information) that reduces the computational burden on client devices. The approach uses secret sharing (splitting data into pieces so no single party can see the whole picture) to allow servers to do most of the work while clients can stay offline, making it practical for large-scale collaborative research across institutions like hospitals.
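A minimal sketch of additive secret sharing, the building block mentioned above; the modulus and helper names are our own choices for illustration, not the paper's protocol.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is modulo a large prime

def share(value: int, n_servers: int = 3):
    """Split `value` into additive shares: any n-1 shares look uniformly random,
    but all n shares sum (mod PRIME) back to the original value."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

record_id = 123456789
pieces = share(record_id)
assert reconstruct(pieces) == record_id   # servers can compute on shares while
                                          # the client stays offline
```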
A Q1 2026 security report by OWASP documents major AI and agentic AI (AI systems that can take autonomous actions) exploits, showing a shift from theoretical risks to real-world attacks targeting AI agent identities, permissions, and supply chains. Key incidents include a Mexican government breach where attackers used Claude to automate reconnaissance and exploitation, affecting 150 GB of sensitive data, along with other incidents involving prompt injection (tricking AI by hiding malicious instructions in its input), privilege abuse, and supply-chain vulnerabilities in AI tools.
This academic survey examines how well large language model-based agents (AI systems that use LLMs to make decisions and take actions) can generalize, meaning how effectively they perform on new tasks or situations they weren't specifically trained for. The paper reviews research across different domains to understand what factors help or limit an agent's ability to adapt and work reliably in unfamiliar contexts.
Asynchronous federated learning (AFL, where multiple devices train a shared AI model without waiting for each other to finish) is faster than synchronous methods but more vulnerable to Byzantine attacks (when some devices send false or corrupted data to sabotage the model). Researchers propose Belisa, a framework that uses feature fingerprints (unique patterns in how local models represent data) to identify and filter out malicious devices, improving robustness and efficiency in real-world scenarios where devices have different data and hardware capabilities.
Fix: The source proposes Belisa as a Byzantine-robust AFL framework that addresses this vulnerability. Belisa works by leveraging a reference model trained on publicly available data to quantify feature fingerprints (discrepancies between feature representations of local models) and filtering out malicious models through clustering. According to the paper, Belisa lowered average test error rates to 0.42x that of baseline methods under attack scenarios and accelerated aggregation by an average of 12.3x compared to other methods.
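A simplified sketch of the fingerprint-and-filter idea; median-based thresholding here stands in for Belisa's clustering step, and all names and the cutoff are illustrative assumptions.

```python
import numpy as np

def filter_clients(client_features: np.ndarray, reference_features: np.ndarray):
    """Simplified fingerprint filter (not Belisa's exact clustering algorithm).

    client_features: (num_clients, feature_dim) representations each local model
    produces on a shared public batch; reference_features: the reference model's
    representation of the same batch. Clients whose discrepancy is far above the
    median are treated as suspicious and excluded from aggregation.
    """
    discrepancy = np.linalg.norm(client_features - reference_features, axis=1)
    threshold = np.median(discrepancy) * 2.0     # made-up cutoff for illustration
    return np.where(discrepancy <= threshold)[0]

reference = np.random.randn(64)
honest = reference + 0.1 * np.random.randn(8, 64)
byzantine = 5.0 * np.random.randn(2, 64)         # poisoned updates drift far away
kept = filter_clients(np.vstack([honest, byzantine]), reference)
print(kept)   # typically only indices 0..7 survive the filter
```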
Quantum computing poses a major threat to current security systems because it can break traditional encryption methods that protect critical infrastructure and cloud services. This paper examines how quantum computing affects different layers of infrastructure (from applications to networks) and proposes moving toward quantum-resistant cryptography (encryption methods designed to withstand quantum computer attacks) as a protective strategy. The authors advocate for collaboration across sectors to develop and implement these new security approaches before quantum threats become critical.
This research paper proposes a method to detect FDI attacks (false data injection, where attackers insert fake sensor readings into control systems) by using encoding techniques to transform measurement data into a different mathematical space. The approach aims to catch stealthy FDI attacks that are designed to evade traditional detection methods by disguising themselves as normal system behavior.
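The paper's encoding is not specified in this summary, so the sketch below substitutes a generic residual-based detector: normal measurements are encoded with PCA, and samples whose reconstruction error is anomalously large are flagged. Everything here is a hypothetical illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 3))                    # true low-dimensional system state
mixing = rng.normal(size=(3, 20))                      # mapping to 20 sensor readings
normal = latent @ mixing + 0.01 * rng.normal(size=(1000, 20))

mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
basis = vt[:5]                                         # "encoding": top-5 principal directions

def residual(x: np.ndarray) -> float:
    """Reconstruction error of a measurement snapshot in the learned subspace."""
    centered = x - mean
    return float(np.linalg.norm(centered - (centered @ basis.T) @ basis))

threshold = np.quantile([residual(x) for x in normal], 0.99)
attacked = normal[0] + 0.2                             # stealthy constant-bias injection
print(residual(normal[0]) <= threshold, residual(attacked) <= threshold)  # typically: True, False
```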
Referring video object segmentation (RVOS, the task of identifying and outlining objects in videos based on text descriptions) is used in safety-critical applications like autonomous driving, but the deep neural networks that power these systems are vulnerable to adversarial perturbations (tiny, intentional changes to input data designed to fool AI models). This research demonstrates for the first time that RVOS models can be reliably attacked using a method called xM-ICM, which corrupts both visual and text information to mislead the models, and shows this attack works even when attackers have limited information about the system.
LitCVit is a lightweight AI model designed to detect malicious encrypted network traffic (data sent over secure connections) without needing to decrypt it or manually extract features. The model uses self-supervised learning (training where the AI learns patterns from unlabeled data) and vision transformers (a type of neural network architecture) to analyze patterns across multiple data packets and flows (sequences of related network communications) while running much faster than existing approaches, achieving 98% accuracy on test datasets.
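A rough sketch of how encrypted packet bytes can be arranged into a fixed-size token grid for a transformer-style classifier; the packet and byte counts below are illustrative assumptions, not LitCVit's actual configuration.

```python
import numpy as np

def flow_to_patches(packets: list[bytes], max_packets: int = 8,
                    bytes_per_packet: int = 64) -> np.ndarray:
    """Turn a flow's raw (still-encrypted) packet bytes into a fixed-size grid
    that a vision-transformer-style model can consume as an "image"."""
    grid = np.zeros((max_packets, bytes_per_packet), dtype=np.float32)
    for i, pkt in enumerate(packets[:max_packets]):
        payload = np.frombuffer(pkt[:bytes_per_packet], dtype=np.uint8)
        grid[i, : payload.size] = payload / 255.0     # normalize bytes to [0, 1]
    return grid                                       # shape: (packets, bytes)

flow = [bytes(np.random.randint(0, 256, 80, dtype=np.uint8)) for _ in range(5)]
tokens = flow_to_patches(flow)
print(tokens.shape)   # (8, 64)
```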
This paper presents CIBPU, a new secure branch prediction unit (BPU, a component that helps processors predict which instructions to execute next) that protects against attacks trying to infer sensitive information by observing how the BPU behaves. Unlike previous designs that either isolated the BPU physically or used encryption with frequent key updates, CIBPU uses redundant storage (extra copies of data), smart indexing, and encryption without periodic key changes to hide branch conflicts (situations where different instructions compete for the same storage space) from attackers. The researchers tested CIBPU in simulators and on real hardware, finding it adds only about 2-4% performance slowdown, which is better than other secure branch prediction approaches.