aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDatasetFor devs
Subscribe
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

AI Sec Watch

The security intelligence platform for AI teams

AI security threats move fast and get buried under hype and noise. Built by an Information Systems Security researcher to help security teams and developers stay ahead of vulnerabilities, privacy incidents, safety research, and policy developments.

Independent research. No sponsors, no paywalls, no conflicts of interest.

[TOTAL_TRACKED]
3,710
[LAST_24H]
1
[LAST_7D]
1
Daily BriefingSunday, May 17, 2026

No new AI/LLM security issues were identified today.

Latest Intel

page 243/371
VIEW ALL
01

CVE-2025-36730: A prompt injection vulnerability exists in Windsurft version 1.10.7 in Write mode using SWE-1 model. It is possible to

security
Oct 14, 2025

A prompt injection vulnerability (tricking an AI by hiding instructions in its input) exists in Windsurf version 1.10.7 when using Write mode with the SWE-1 model. An attacker can create a specially crafted file name that gets added to the user's prompt, causing Windsurf to follow malicious instructions instead of the user's intended commands. The vulnerability has a CVSS score (a 0-10 rating of how severe a vulnerability is) of 4.6, classified as medium severity.

NVD/CVE Database
02

A Mathematical Certification for Positivity Conditions in Neural Networks With Applications to Partial Monotonicity and Trustworthy AI

researchsafety
Oct 14, 2025

This research presents LipVor, an algorithm that mathematically verifies whether a trained neural network (a computer model with interconnected nodes that learns patterns) follows partial monotonicity constraints, which means outputs change predictably with certain inputs. The method works by testing the network at specific points and using mathematical properties to guarantee the network behaves correctly across its entire domain, potentially allowing neural networks to be used in critical applications like credit scoring where trustworthiness and predictable behavior are required.

IEEE Xplore (Security & AI Journals)
03

CVE-2025-62364: text-generation-webui is an open-source web interface for running Large Language Models. In versions through 3.13, a Loc

security
Oct 13, 2025

text-generation-webui (an open-source tool for running large language models through a web interface) versions 3.13 and earlier contain a Local File Inclusion vulnerability (a flaw where an attacker can read files they shouldn't have access to) in the character picture upload feature. An attacker can upload a text file with a symbolic link (a shortcut to another file) pointing to sensitive files, and the application will expose those files' contents through the web, potentially revealing passwords and system settings.

Fix: Update to version 3.14, where this vulnerability is fixed.

NVD/CVE Database
04

Privacy Protection of Dual Averaging Push for Decentralized Optimization via Zero-Sum Structured Perturbations

researchprivacy
Oct 13, 2025

This research addresses privacy risks in decentralized optimization (where multiple networked computers work together to solve a problem without a central coordinator) by proposing ZS-DDAPush, an algorithm that adds mathematical noise structures to protect sensitive node information during communication. The key innovation is that ZS-DDAPush achieves privacy protection while maintaining the accuracy and efficiency of the optimization process, avoiding the typical trade-offs seen in other privacy methods like differential privacy (adding statistical noise to protect individual data) or encryption (scrambling data so only authorized parties can read it).

IEEE Xplore (Security & AI Journals)
05

Do More With Less: Architecture-Agnostic and Data-Free Extraction Attack Against Tabular Model

securityresearch
Oct 13, 2025

Researchers developed TabExtractor, a tool that can steal tabular models (AI systems trained on spreadsheet-like data) without needing access to the original training data or knowing how the model was built. The attack works by creating synthetic data samples and using a special neural network architecture called a contrastive tabular transformer (CTT, a type of AI that learns by comparing similar and different examples) to reverse-engineer a clone of the victim model that performs almost as well as the original. This research shows that tabular models face serious security risks from extraction attacks.

IEEE Xplore (Security & AI Journals)
06

Really Unlearned? Verifying Machine Unlearning via Influential Sample Pairs

securityresearch
Oct 13, 2025

Machine unlearning allows AI models to forget the effects of specific training samples, but verifying whether this actually happened is difficult because existing checks (like backdoor attacks or membership inference attacks, which test if a model remembers data by trying to extract or manipulate it) can be fooled by a dishonest model provider who simply retrains the model to pass the test rather than truly unlearning. This paper proposes IndirectVerify, a formal verification method that uses pairs of connected samples (trigger samples that are unlearned and reaction samples that should be affected by that unlearning) with intentional perturbations (small changes to training data) to create indirect evidence that unlearning actually occurred, making it harder to fake.

IEEE Xplore (Security & AI Journals)
07

Action-Perturbation Backdoor Attacks on Partially Observable Multiagent Systems

securityresearch
Oct 13, 2025

Researchers discovered a type of backdoor attack (hidden malicious instructions planted in AI systems) on multiagent reinforcement learning systems, where one adversary agent uses its actions to trigger hidden failures in other agents' decision-making policies. Unlike previous attacks that assumed unrealistic direct control over what victims observe, this attack is more practical because it works through normal agent interactions in partially observable environments (where agents cannot always see what others are doing). The researchers developed a training method to help adversary agents efficiently trigger these backdoors with minimal suspicious actions.

IEEE Xplore (Security & AI Journals)
08

Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization

researchsafety
Oct 13, 2025

AI systems used for important decisions often rely on empirical risk minimization (ERM, a training method that reduces prediction errors on known data) to build models, but these systems can suffer from unintentional bias, lack of transparency, and other risks. The EU has established Ethics Guidelines requiring trustworthy AI to meet seven key requirements, yet current ERM-based design prioritizes accuracy over trustworthiness. This article argues that developers need to balance four core objectives when designing AI systems: fairness (not discriminating against groups), privacy (protecting user data), robustness (resisting intentional attacks like fake news), and explainability (being transparent about how decisions are made).

IEEE Xplore (Security & AI Journals)
09

A Deep Reinforcement Learning Approach to Time Delay Differential Game Deception Resource Deployment

researchsecurity
Oct 10, 2025

This research proposes a new method for deploying cyber deception (defensive tricks to confuse attackers) in networks by combining deep reinforcement learning (a type of AI that learns by trial and error) with game theory that accounts for time delays. The method uses an algorithm called proximal policy optimization (PPO, a technique for training AI to make optimal decisions) to figure out where and when to place deception resources, and tests show it outperforms existing approaches in handling complex network attacks.

IEEE Xplore (Security & AI Journals)
10

Exploring Energy Landscapes for Minimal Counterfactual Explanations: Applications in Cybersecurity and Beyond

research
Oct 10, 2025

This research presents a new method for generating counterfactual explanations (minimal changes needed to flip an AI model's prediction), which are a type of explainable AI that helps users understand why models make specific decisions. The approach combines physics concepts like energy minimization and simulated annealing (an optimization technique inspired by metallurgy) to find the smallest, most realistic modifications needed to change a model's output, with applications tested in cybersecurity for Internet of Things devices (networked physical devices like sensors and cameras).

IEEE Xplore (Security & AI Journals)
Prev1...241242243244245...371Next