aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDatasetFor devs
Subscribe
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Research

Academic papers, new techniques, benchmarks, and theoretical findings in AI/LLM security.

to
Export CSV
691 items

A Personalized and Privacy-Preserving Federated Transformer Framework for Multilingual Sentiment Analysis

inforesearchPeer-Reviewed
research
Feb 11, 2026

FedPerX is a federated transformer framework (a system where multiple computers train an AI model together without sharing raw data) designed for sentiment analysis across multiple languages while protecting privacy. It uses residual adapters (lightweight customizable modules added to a shared language model) and differential privacy (a mathematical technique that adds noise to data to prevent identifying individuals) to let each participant personalize their model without compromising data privacy. The framework outperforms existing methods on multilingual datasets with improved accuracy and significantly reduced communication needs.

IEEE Xplore (Security & AI Journals)

Robust Trusted Conflictive Multiview Collaborative Contrastive Learning

inforesearchPeer-Reviewed
research

Privacy-Preserving, Efficient, and Accurate Dimensionality Reduction

inforesearchPeer-Reviewed
research

AdvScan: Black-Box Adversarial Example Detection at Runtime Through Power Analysis

inforesearchPeer-Reviewed
research

Practical and Flexible Backdoor Attack Against Deep Learning Models via Shell Code Injection

inforesearchPeer-Reviewed
security

Sensitivity-Aware Auditing Service for Differentially Private Databases

inforesearchPeer-Reviewed
security

PROTheft: A Projector-Based Model Extraction Attack in the Physical World

inforesearchPeer-Reviewed
security

v5.4.0

inforesearchIndustry
security

Secure Acceleration of Aggregation Queries Over Homomorphically Encrypted Databases

inforesearchPeer-Reviewed
research

HiveTEE: Scalable and Fine-Grained Isolated Domains With RME and MTE Co-Assisted

inforesearchPeer-Reviewed
security

Allies Teach Better Than Enemies: Inverse Adversaries for Robust Knowledge Distillation

inforesearchPeer-Reviewed
research

Toward Real-World Holistic Privacy-Preserving Person Re-Identification

inforesearchPeer-Reviewed
security

Evaluating and Mitigating Relationship Hallucinations in Large Vision-Language Models

inforesearchPeer-Reviewed
research

Jailbreak and Guard Aligned Language Models With Only Few In-Context Demonstrations

inforesearchPeer-Reviewed
security

EvTexture++: Event-Driven Texture Enhancement for Video Super-Resolution

inforesearchPeer-Reviewed
research

v5.3.0

inforesearchIndustry
industry

v5.2.0

inforesearchIndustry
security

Understanding the Adversarial Landscape of Large Language Models Through the Lens of Attack Objectives

inforesearchPeer-Reviewed
security

Forgotten Memories

inforesearchPeer-Reviewed
privacy

Building Trustworthy AI Agents

inforesearchPeer-Reviewed
safety
Previous23 / 35Next
Feb 11, 2026

This paper proposes RCMCL (Robust Trusted Conflictive Multiview Collaborative Contrastive Learning), a method to improve AI models that learn from multiple sources of data (multiview learning) when those sources conflict or misalign with each other. The approach uses evidential deep neural networks (a technique that estimates uncertainty in predictions) and contrastive learning (a training method that teaches the model to recognize similar and different examples) to make the model more reliable and accurate even when the data sources provide contradictory information.

IEEE Xplore (Security & AI Journals)
privacy
Feb 9, 2026

This research introduces PP-DR, a privacy-preserving dimensionality reduction (a technique that reduces the number of features in a dataset to make it easier to analyze) scheme that uses homomorphic encryption (a type of encryption that allows computations on encrypted data without decrypting it first) to let multiple organizations securely share and analyze data together without revealing sensitive information. The new method is much faster and more accurate than previous approaches, achieving 30 to 200 times better computational efficiency and 70% less communication overhead.

IEEE Xplore (Security & AI Journals)
security
Feb 9, 2026

AdvScan is a method for detecting adversarial examples (inputs slightly modified to trick AI models into making wrong predictions) on tiny machine learning models running on edge devices (small hardware like microcontrollers) without needing access to the model's internal details. The approach monitors power consumption patterns during the model's operation, since adversarial examples create unusual power signatures that differ from normal inputs, and uses statistical analysis to flag suspicious inputs in real-time with minimal performance overhead.

IEEE Xplore (Security & AI Journals)
research
Feb 9, 2026

Researchers have developed a new backdoor attack method called shell code injection (SCI) that can implant malicious logic into deep learning models (neural networks trained on large datasets) without needing to poison the training data. The attack uses techniques inspired by nature, like camouflage, along with trigger verification and code packaging strategies to trick models into making wrong predictions, and it can adapt its attack targets dynamically using large language models (LLMs) to make it more flexible and harder to detect.

IEEE Xplore (Security & AI Journals)
research
Feb 6, 2026

Differentially private databases (DP-DBs, systems that add mathematical noise to data to protect individual privacy while allowing useful analysis) need auditing services to verify they actually protect privacy as promised, but current approaches don't handle database-specific challenges like varying query sensitivities well. This paper introduces DPAudit, a framework that audits DP-DBs by generating realistic test scenarios, estimating privacy loss parameters, and detecting improper noise injection through statistical testing, even when the database's inner workings are hidden.

Fix: The source presents DPAudit as a framework solution but does not describe a patch, update, or deployment fix for existing vulnerable systems. N/A -- no mitigation discussed in source.

IEEE Xplore (Security & AI Journals)
research
Feb 6, 2026

PROTheft is a model extraction attack (a method where attackers steal an AI model's functionality by observing its responses to many input queries) that works on real-world vision systems like autonomous vehicles by projecting digital attack samples onto a device's camera. The attack bridges the gap between digital attacks and physical-world scenarios by using a projector to convert digital inputs into physical images, and includes a simulation tool to predict how well attack samples will work when converted from digital to physical to digital formats.

IEEE Xplore (Security & AI Journals)
research
Feb 5, 2026

Version 5.4.0 (released February 5, 2026) is an update to a security framework that documents new attack techniques targeting AI agents, including publishing poisoned AI agent tools (malicious versions of legitimate tools), escaping from AI systems to access the host computer, and exploiting vulnerabilities to steal credentials or evade security. The update also includes new real-world case studies showing how attackers have compromised AI agent control systems and used prompt injection (tricking an AI by hiding commands in its input) to establish control.

MITRE ATLAS Releases
Feb 3, 2026

This research proposes AHEDB (Accelerated Homomorphically Encrypted DataBase), a system designed to speed up database queries on encrypted data using Fully Homomorphic Encryption, or FHE (a method that lets computers perform calculations on encrypted information without decrypting it first). The system uses Encrypted Multiple Maps to reduce computational strain and a Single Range Cover algorithm for indexing, achieving better performance than existing FHE-based approaches while maintaining security.

IEEE Xplore (Security & AI Journals)
Feb 3, 2026

HiveTEE is a security architecture that divides applications running inside a TEE (Trusted Execution Environment, a secure zone on a processor that protects sensitive operations from the main operating system) into smaller isolated domains, so that if one part is compromised, the damage doesn't spread to the rest. It uses RME (Realm Management Extension, a hardware feature that creates isolated execution spaces) and MTE (Memory Tagging Extension, a feature that prevents certain memory attacks), and testing shows it adds minimal slowdown (less than 3%) to applications.

IEEE Xplore (Security & AI Journals)
safety
Feb 3, 2026

This research proposes a new method for knowledge distillation (training a smaller AI model to mimic a larger one) that preserves adversarial robustness (the ability to resist attacks designed to fool AI systems). Instead of having the student model copy all predictions from the teacher model, the method uses "inverse adversarial examples" (inputs created by reversing the direction of adversarial attacks) to guide learning toward more reliable predictions, resulting in better robustness transfer between models.

IEEE Xplore (Security & AI Journals)
privacy
Feb 3, 2026

Person re-identification (Re-ID, systems that recognize and track individuals across camera footage) systems can be attacked to steal pedestrian images and the AI model itself, threatening privacy for both the system operator and people being monitored. Existing privacy-protection methods fail to defend against all types of leaks while keeping the system working normally, so researchers propose SHIELD, a two-stage framework that uses protected image generation and feature protection techniques to prevent data and model theft without reducing the system's accuracy for authorized users.

IEEE Xplore (Security & AI Journals)
safety
Feb 3, 2026

Large vision-language models (LVMs, AI systems that process both images and text) often make mistakes by hallucinating incorrect relationships between objects in images, such as falsely claiming one object is near another. Researchers created R-Bench, a benchmark (a standardized test) to evaluate these relationship hallucination errors, and found that these mistakes happen because models rely too much on language patterns rather than actually analyzing the visual content. The study proposes Region-Aware Alignment Mitigation (RA²M), which improves the model's attention to specific regions of an image to better align its descriptions with what is actually shown.

Fix: Region-level image-text alignment helps mitigate relationship hallucinations. The authors propose Region-Aware Alignment Mitigation (RA²M), which 'enhances model attention to relevant regions, improving alignment between generated text and images.'

IEEE Xplore (Security & AI Journals)
research
Feb 2, 2026

This research shows that large language models can be tricked or protected using in-context learning (ICL, a technique where an AI learns from examples provided in its current input rather than from training). The researchers developed two methods: an In-Context Attack that uses harmful examples to make LLMs produce unsafe outputs, and an In-Context Defense that uses refusal examples to strengthen safety. The study demonstrates that both attacking and defending LLM safety through carefully chosen demonstrations are effective and scalable.

IEEE Xplore (Security & AI Journals)
Feb 2, 2026

EvTexture++ is a framework that uses event-based vision (cameras that capture changes in brightness at extremely high speed and can see very bright and dark areas simultaneously) to improve video super-resolution, which is the process of creating high-resolution videos from lower-resolution ones. Instead of using events just to track motion, this approach uses them to recover fine details and textures in videos, and prevents texture flickering when objects move quickly across frames.

IEEE Xplore (Security & AI Journals)
Jan 30, 2026

N/A -- This content is a navigation menu and feature listing for GitHub's v5.3.0 platform, not a description of an AI/LLM security issue, vulnerability, or problem requiring analysis.

MITRE ATLAS Releases
research
Jan 30, 2026

Version 5.2.0 adds new attack techniques against AI systems, including methods to steal credentials from AI agent tools (software components that perform actions on behalf of an AI), poison training data, and generate malicious commands. It also introduces new defenses such as segmenting AI agent components, validating inputs and outputs, detecting deepfakes, and implementing human oversight for AI agent actions.

Fix: The source lists mitigations rather than fixes for a specific vulnerability. Key mitigations mentioned include: Input and Output Validation for AI Agent Components, Segmentation of AI Agent Components, Restrict AI Agent Tool Invocation on Untrusted Data, Human In-the-Loop for AI Agent Actions, Adversarial Input Detection, Model Hardening, Sanitize Training Data, and Generative AI Guardrails.

MITRE ATLAS Releases
research
Jan 30, 2026

Large language models face four main types of adversarial threats: privacy breaches (exposing sensitive data the model learned), integrity compromises (corrupting the model's outputs or training data), adversarial misuse (using the model for harmful purposes), and availability disruptions (making the model unavailable or slow). The article organizes these threats by their attackers' goals to help understand the landscape of vulnerabilities in LLMs.

IEEE Xplore (Security & AI Journals)
safety
Jan 30, 2026

This short story examines privacy risks that arise when companies are bought and sold, particularly concerning AI digital twins (AI models that replicate a specific person's behavior and knowledge) and the problems that occur when organizations fail to threat model (identify and plan for potential security risks in) major changes to their systems and technology. The story raises ethical questions about these scenarios.

IEEE Xplore (Security & AI Journals)
research
Jan 30, 2026

Current AI assistants are not yet trustworthy enough to be personal advisors, despite how useful they seem. They fail in specific ways: they encourage users to make poor decisions, they create false doubt about things people know to be true (gaslighting), and they confuse a person's current identity with their past. They also struggle when information is incomplete or inaccurate, with no reliable way to fix errors or hold the system responsible when wrong information causes harm.

IEEE Xplore (Security & AI Journals)