aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDataset
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Research

Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.

to
Export CSV
227 items

The Evolution of AI Compliance Assistance from Reactive Support to Co-Agency

inforesearchPeer-Reviewed
policysafety
Mar 6, 2026

A banking group implemented a retrieval-augmented AI-powered compliance assistant (a system where AI pulls in external compliance documents to answer questions) to help with regulatory requirements while maintaining human oversight. The article identifies key challenges with this approach, including authority illusion (over-trusting the AI's answers), unclear responsibility for decisions, loss of human judgment about context, and gaps in understanding how the system works, then proposes a four-phase framework to help organizations move from passive AI assistants toward systems where AI and humans reason together.

AIS eLibrary (Journal of AIS, CAIS, etc.)

QuEST: Quantization-Conditioned Efficient Stealthy Trojan

inforesearchPeer-Reviewed
security

A Fine-Tuning Data Recovery Attack on Generative Language Models via Backdooring

inforesearchPeer-Reviewed
security

Toward Robust Radio Frequency Fingerprint Identification: A Federated Learning Framework With Feature Alignment

inforesearchPeer-Reviewed
research

AdaParse: Personalized Fingerprinting for Visual Generative Model Reverse Engineering

inforesearchPeer-Reviewed
research

A Differentially Private Quadrature Amplitude Modulation Mechanism for Federated Analytics

inforesearchPeer-Reviewed
research

Extracting Training Dialogue Data From Large Language Model-Based Task Bots

inforesearchPeer-Reviewed
security

Efficient Byzantine-Robust Privacy-Preserving Federated Learning via Dimension Compression

inforesearchPeer-Reviewed
research

Are Large Vision-Language Models Robust to Adversarial Visual Transformations?

inforesearchPeer-Reviewed
security

OwnerHunter: Multilingual Website Owner Identification Powered by Large Language Model

inforesearchPeer-Reviewed
research

Cert-SSBD: Certified Backdoor Defense With Sample-Specific Smoothing Noises

inforesearchPeer-Reviewed
security

A Novel Perspective on Gradient Defense: Layer-Specific Protection Against Privacy Leakage

inforesearchPeer-Reviewed
security

Risk-Aware Privacy Preservation for LLM Inference

inforesearchPeer-Reviewed
security

Decoupled and Privacy-Preserving Key Generation in ABE Under the Minimal Disclosure Principle

inforesearchPeer-Reviewed
security

PPOM-Attack: A Substitute Model-Free Perturbation Prediction and Optimization Method for Black-Box Adversarial Attack Against Face Recognition

inforesearchPeer-Reviewed
security

PromptFuzz: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs

inforesearchPeer-Reviewed
security

Secure and Efficient Model Training Framework for Multiuser Semantic Communications via Over-the-Air Mixup

inforesearchPeer-Reviewed
research

Adversarial Training for Graph Neural Networks via Graph Subspace Energy Optimization

inforesearchPeer-Reviewed
research

Model Hijacking Attack in Federated Learning

inforesearchPeer-Reviewed
security

Model Inversion Attack Against Federated Unlearning

inforesearchPeer-Reviewed
security
Previous3 / 12Next
research
Mar 5, 2026

QuEST is a new framework that makes backdoor attacks (hidden malicious behaviors injected into AI models) more stealthy and efficient when models undergo quantization (compressing models to use less memory and computation). The framework uses special training techniques and parameter sharing to hide the attack from detection systems while reducing the computational resources needed to carry out the attack.

IEEE Xplore (Security & AI Journals)
research
Mar 5, 2026

Researchers discovered a new attack called Lure that targets generative language models (GLMs, which are AI systems that generate text) during the fine-tuning process (when developers customize an open-source model with their own data). By hiding malicious code in the source code of an open-source model, attackers can trick a fine-tuned model into remembering and later revealing the proprietary data used to customize it through specially crafted prompts (input text designed to trigger specific outputs).

IEEE Xplore (Security & AI Journals)
Mar 5, 2026

This research addresses security challenges in Internet of Things (IoT) devices by improving radio frequency fingerprint identification (RFFI, a method that uniquely identifies devices based on their wireless signal characteristics) using federated learning (a distributed AI training approach where data stays on local devices rather than being sent to a central server). The paper proposes a feature alignment strategy to handle non-IID data (data that isn't uniformly distributed across different receivers), which occurs when different receivers have different hardware and environmental conditions, and demonstrates that the approach achieves 90.83% identification accuracy with improved stability compared to existing federated learning methods.

Fix: The paper proposes a feature alignment strategy based on federated learning that guides each client (receiver) to learn aligned intermediate feature representations during local training, effectively mitigating the adverse impact of distribution shifts on model generalization in heterogeneous wireless environments.

IEEE Xplore (Security & AI Journals)
security
Mar 5, 2026

AdaParse is a framework that can identify the specific settings (hyperparameters, which are configuration values that control how a model behaves) used to create AI-generated images by analyzing those images in detail. Unlike older methods that use a single general fingerprint (a characteristic pattern), AdaParse creates customized fingerprints for each image, allowing it to distinguish between images made with different settings across many different generative models (AI systems that create images).

IEEE Xplore (Security & AI Journals)
privacy
Mar 5, 2026

This research proposes a new method called DP-QAM (Differentially Private Quadrature Amplitude Modulation) to solve privacy and communication problems in federated analytics (a system where multiple devices analyze data together without sending raw data to a central server). The method takes advantage of natural errors that occur during data compression and wireless transmission to add extra privacy protection, while balancing privacy, communication efficiency, and accuracy.

IEEE Xplore (Security & AI Journals)
privacy
Mar 5, 2026

Large Language Models (LLMs, AI systems trained on massive amounts of text) used in task-oriented dialogue systems (AI assistants designed to help users complete specific goals like booking travel) can accidentally memorize and leak sensitive training data, including personal information like phone numbers and complete travel schedules. Researchers demonstrated new attack techniques that can extract thousands of pieces of training data from these systems with over 70% accuracy in the best cases. The paper identifies factors that influence how much data LLMs memorize in dialogue systems but does not propose specific fixes.

IEEE Xplore (Security & AI Journals)
security
Mar 5, 2026

This research addresses vulnerabilities in Federated Learning (FL, a system where multiple computers train an AI model together without sharing their raw data), which faces attacks from malicious participants and privacy leaks from gradient updates (the numerical adjustments that improve the model). The authors propose a new method combining homomorphic encryption (a way to perform calculations on encrypted data without decrypting it) and dimension compression (reducing the size of data while keeping important relationships intact) to protect privacy and defend against Byzantine attacks (when malicious actors send corrupted data to sabotage the system) while reducing computational costs by 25 to 35 times.

IEEE Xplore (Security & AI Journals)
research
Mar 5, 2026

Large vision-language models (LVLMs, which are AIs that understand both images and text) can be attacked using simple visual transformations, such as rotations or color changes, that fool them into giving wrong answers. Researchers found that combining multiple harmful transformations can make these attacks more effective, and they can be optimized using gradient approximation (a mathematical technique to find the best attack parameters). This research highlights a previously overlooked safety risk in how well LVLMs resist these kinds of adversarial attacks (attempts to trick AI systems).

IEEE Xplore (Security & AI Journals)
Mar 2, 2026

OwnerHunter is a system that uses large language models (AI trained on vast amounts of text) to identify who owns a website by analyzing webpage content across multiple languages. It improves on older methods that struggled when webpages listed many names or were written in non-English languages, using strategies like checking multiple sources on a page and verifying results to accurately determine the true owner.

IEEE Xplore (Security & AI Journals)
research
Feb 24, 2026

Deep neural networks can be attacked through backdoors, where attackers secretly poison training data to make the model misclassify certain inputs while appearing normal otherwise. This paper proposes Cert-SSBD, a defense method that uses randomized smoothing (adding random noise to samples) with sample-specific noise levels, optimized per sample using stochastic gradient ascent, combined with a new certification approach to make models more resistant to these attacks.

Fix: The proposed Cert-SSBD method addresses the issue by employing stochastic gradient ascent to optimize the noise magnitude for each sample, applying this sample-specific noise to multiple poisoned training sets to retrain smoothed models, aggregating predictions from multiple smoothed models, and introducing a storage-update-based certification method that dynamically adjusts each sample's certification region to improve certification performance.

IEEE Xplore (Security & AI Journals)
research
Feb 24, 2026

Gradient leakage attacks (methods that steal private data by analyzing the mathematical updates sent between computers in federated learning, where AI training happens across multiple devices) pose privacy risks in federated learning systems. Researchers discovered that different layers of neural networks (sections that process information at different stages) leak different amounts of private information, so they created Layer-Specific Gradient Protection (LSGP), which applies stronger privacy protection to layers that leak more sensitive data rather than protecting all layers equally.

IEEE Xplore (Security & AI Journals)
privacy
Feb 24, 2026

When users send prompts to LLM services like ChatGPT, sensitive personal information (such as names, addresses, or ID numbers) can leak out, even when basic privacy protections are used. This paper presents Rap-LI, a framework that identifies which parts of a user's input contain sensitive data and applies stronger privacy protection to those specific parts, rather than treating all data equally.

IEEE Xplore (Security & AI Journals)
Feb 23, 2026

This research proposes a new privacy-preserving method for key generation in ABE (attribute-based encryption, a system that lets users control access to data based on their personal attributes). The method follows a principle called Minimal Disclosure, where users only reveal the specific attributes they need to prove, rather than exposing all their attributes. The protocol separates attribute verification from key generation into two steps, uses batch verification to improve performance, and introduces metrics to measure how well it resists attacks that try to infer hidden user attributes.

IEEE Xplore (Security & AI Journals)
research
Feb 23, 2026

Researchers developed PPOM-Attack, a method to fool face recognition (FR) systems by generating adversarial images (slightly altered photos that trick AI into misidentifying someone). Unlike earlier attacks that used substitute models (simpler AI systems trained to mimic the target system), PPOM-Attack directly queries the real face recognition system to learn how to create effective perturbations (tiny pixel changes), achieving 21.7% higher success rates while keeping the altered images looking natural.

IEEE Xplore (Security & AI Journals)
research
Feb 23, 2026

Prompt injection attacks (tricking an AI by hiding malicious instructions in its input) pose a serious security risk to Large Language Models, as attackers can overwrite a model's original instructions to manipulate its responses. Researchers developed PromptFuzz, a testing framework that uses fuzzing techniques (automatically generating many variations of input data to find weaknesses) to systematically evaluate how well LLMs resist these attacks. Testing showed that PromptFuzz was highly effective at finding vulnerabilities, ranking in the top 0.14% of attackers in a real competition and successfully exploiting 92% of popular LLM-integrated applications tested.

IEEE Xplore (Security & AI Journals)
security
Feb 23, 2026

This paper presents SIMix, a training framework for systems where multiple users learn AI models together over wireless networks while protecting their private data. The system uses Over-the-Air Mixup (OAM, a technique that combines data from multiple users through wireless transmission to hide sensitive information) and groups users strategically to reduce communication needs by up to 25% while defending against model inversion attacks (attempts to reconstruct private training data from a trained model) and label inference attacks (guessing what category a user's data belongs to).

Fix: The paper proposes integrating Over-the-Air Mixup with label-aware user grouping, including a closed-form Tx-Rx scaling optimization that minimizes mean square error under channel noise, and an extended max-clique algorithm that dynamically partitions users into groups with minimal intra-label similarity to reduce model inversion attack success rates.

IEEE Xplore (Security & AI Journals)
Feb 19, 2026

Graph neural networks (GNN, a type of AI that learns from data organized as interconnected nodes and edges) are vulnerable to adversarial topology perturbation, which means attackers can fool them by slightly changing the graph structure. This paper proposes AT-GSE, a new adversarial training method (a technique that strengthens AI models by training them on intentionally corrupted inputs) that uses graph subspace energy, a measure of how stable a graph is, to improve GNN robustness against these attacks.

IEEE Xplore (Security & AI Journals)
research
Feb 19, 2026

Researchers discovered a new attack called HijackFL that can hijack machine learning models in federated learning systems (where multiple computers train a shared model without sharing raw data). The attack works by adding tiny pixel-level changes to input samples so the model misclassifies them as something else, while appearing normal to the server and other participants, achieving much higher success rates than previous methods.

IEEE Xplore (Security & AI Journals)
privacy
Feb 19, 2026

Researchers discovered a new attack called federated unlearning inversion attack (FUIA) that can extract private data from federated unlearning (FU, a process designed to remove a specific person's data influence from shared machine learning models across multiple computers). The attack works by having a malicious server observe the model's parameter changes during the unlearning process and reconstruct the forgotten data, undermining the privacy protection that FU is supposed to provide.

Fix: The source mentions that 'two potential defense strategies that introduce a trade-off between privacy protection and model performance' were explored, but no specific details, names, or implementations of these defense strategies are provided in the text.

IEEE Xplore (Security & AI Journals)