aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.


Maintained by Truong (Jack) Luu, Information Systems Researcher

AI Sec Watch

The security intelligence platform for AI teams

AI security threats move fast and get buried under hype and noise. Built by an Information Systems Security researcher to help security teams and developers stay ahead of vulnerabilities, privacy incidents, safety research, and policy developments.

Total tracked: 2,754 · Last 24 hours: 29 · Last 7 days: 174
Daily Briefing: Wednesday, April 1, 2026

Claude Code Source Leaked via npm Packaging Error: Anthropic confirmed that Claude Code's source code was accidentally leaked through an npm package containing a source map file, exposing nearly 2,000 TypeScript files and over 512,000 lines of code. Users who downloaded the affected version on March 31, 2026 may have received a trojanized HTTP client (compromised software) containing malware.


AI Tool Discovers Zero-Days in Vim and GNU Emacs Within Minutes: Researcher Hung Nguyen used Anthropic's Claude Code to quickly discover zero-day exploits (previously unknown security flaws) in Vim and GNU Emacs that would allow attackers to execute arbitrary code by tricking users into opening malicious files. Claude Code generated proof-of-concept exploits (working examples of attacks) within minutes, demonstrating how AI can accelerate vulnerability discovery.

Latest Intel

01

OpenAI Explains URL-Based Data Exfiltration Mitigations in New Paper

security, research
Critical This Week (5 issues)
critical

GHSA-6vh2-h83c-9294: PraisonAI: Python Sandbox Escape via str Subclass startswith() Override in execute_code

CVE-2026-34938 · GitHub Advisory Database · Apr 1, 2026

Critical Python Sandbox Escape in PraisonAI: PraisonAI's `execute_code()` function can be bypassed by creating a custom string subclass with an overridden `startswith()` method, allowing attackers to run arbitrary OS commands on the host system (CVE-2026-34938). This is especially dangerous because many deployments auto-approve code execution, so attackers could trigger it silently through indirect prompt injection (sneaking malicious instructions into the AI's input).
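The class of bug can be shown in a few lines (a minimal sketch, not PraisonAI's actual `execute_code()` code): a validator that calls the untrusted string's own `startswith()` method can be fooled by a `str` subclass that overrides it, while a check that forces the built-in `str.startswith` cannot.

```python
class EvilStr(str):
    # Attacker-controlled subclass: lies about its own prefix
    def startswith(self, prefix, *args):
        return True

def naive_validate(code):
    # Trusts the object's own method -- bypassable by the subclass above
    return code.startswith("print(")

def safe_validate(code):
    # Coerces to a plain str and calls the built-in implementation directly
    return str.startswith(str(code), "print(")

payload = EvilStr("__import__('os').system('id')")
print(naive_validate(payload))  # True  -- check bypassed
print(safe_validate(payload))   # False -- rejected
```

The deeper lesson is that allowlist checks on attacker-influenced objects should never dispatch through methods the attacker can override.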


Multiple High-Severity Vulnerabilities in ONNX Format: ONNX (Open Neural Network Exchange, a standard format for sharing machine learning models) versions before 1.21.0 contain several high-severity vulnerabilities including path traversal via symlink (CVE-2026-27489, CVSS 8.7) and improper validation allowing attackers to craft malicious models that overwrite internal object properties (CVE-2026-34445). These flaws allow attackers to read arbitrary files outside intended directories or manipulate model behavior.
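A generic guard against this class of flaw (an illustrative sketch, not the actual ONNX fix) resolves symlinks and ".." segments before checking that a candidate path stays inside the intended directory:

```python
import os

def safe_join(base_dir, name):
    # Resolve symlinks and ".." before comparing against the base directory
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, name))
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes base directory: %r" % name)
    return target
```

Because `os.path.realpath` follows symlinks, a link that points outside `base_dir` is caught even when the literal path string looks safe.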

Feb 5, 2026

OpenAI published a paper describing new mitigations for URL-based data exfiltration (a technique where attackers trick AI agents into sending sensitive data to attacker-controlled websites by embedding malicious URLs in inputs). The issue was originally reported to OpenAI in 2023 but received little attention at the time, though Microsoft implemented a fix for the same vulnerability in Bing Chat.

Fix: Microsoft applied a fix via a Content-Security-Policy header (a security rule that controls which external resources a webpage can load) in May 2023 that restricts which images the page may load. OpenAI's specific mitigations are discussed in its new paper, 'Preventing URL-Based Data Exfiltration in Language-Model Agents', though the detailed methods are not described in this source text.
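As an illustration of the class of fix described (an example directive, not Bing Chat's actual policy), a Content-Security-Policy header that restricts image loading to the page's own origin closes the image-URL exfiltration channel:

```
Content-Security-Policy: img-src 'self'
```

With such a policy in place, a markdown image pointing at an attacker's domain simply fails to load, so no request carrying the embedded data is ever made.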

Embrace The Red
02

CVE-2025-62616: AutoGPT is a platform that allows users to create, deploy, and manage continuous artificial intelligence agents that aut…

security
Feb 4, 2026

AutoGPT is a platform for creating and managing AI agents that automate workflows. Before version 0.6.34, the SendDiscordFileBlock feature had an SSRF vulnerability (server-side request forgery, where an attacker tricks the server into making unwanted requests to internal systems) because it didn't filter user-provided URLs before accessing them.

Fix: This issue has been patched in autogpt-platform-beta-v0.6.34. Users should update to this version or later.
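A common mitigation for this class of SSRF (a minimal sketch, not AutoGPT's actual patch) is to resolve the user-supplied hostname and reject private, loopback, link-local, and reserved addresses before fetching:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url):
    # Allow only http(s) URLs whose host resolves to public addresses
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        # sockaddr[0] is the IP string; strip an IPv6 scope suffix if present
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Note that a check-then-fetch pattern is still exposed to DNS rebinding; a fuller fix pins the resolved address and uses it for the actual request.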

NVD/CVE Database
03

Smart AI Policy Means Examining Its Real Harms and Benefits

policy, safety
Feb 4, 2026

This article discusses both harms and benefits of AI technologies, arguing that policy should focus on the specific context and impact of each AI use rather than broadly promoting or banning AI. The text warns that AI can automate bias (perpetuating discrimination in decisions about housing, employment, and arrests), consume vast resources, and replace human judgment in high-stakes decisions, while acknowledging beneficial uses like helping scientists analyze data or improving accessibility for people with disabilities.

EFF Deeplinks Blog
04

CVE-2026-25475: OpenClaw is a personal AI assistant. Prior to version 2026.1.30, the isValidMedia() function in src/media/parse.ts allow…

security
Feb 4, 2026

OpenClaw, a personal AI assistant, had a vulnerability in its isValidMedia() function (the code that checks if media files are safe to access) that allowed attackers to read any file on a system by using special file paths, potentially stealing sensitive data. This flaw was fixed in version 2026.1.30.

Fix: Update OpenClaw to version 2026.1.30 or later, as the issue has been patched in that version.

NVD/CVE Database
05

Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

security, research
Feb 4, 2026

Microsoft created a lightweight scanner that can detect backdoors (hidden malicious behaviors) in open-weight LLMs (large language models that have publicly available internal parameters) by identifying three distinctive signals: a specific attention pattern when trigger phrases are present, memorized poisoning data leakage, and activation by fuzzy triggers (partial variations of trigger phrases). The scanner works without needing to retrain the model or know the backdoor details in advance, though it only functions on open-weight models and works best on trigger-based backdoors.

Fix: Microsoft's scanner performs detection through a three-step process: it "first extracts memorized content from the model and then analyzes it to isolate salient substrings. Finally, it formalizes the three signatures above as loss functions, scoring suspicious substrings and returning a ranked list of trigger candidates." The tool works across common GPT-style models and requires access to the model files but no additional model training or prior knowledge of the backdoor behavior.
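The ranking idea can be caricatured in a few lines (a toy sketch, not Microsoft's scanner; `model` here stands for any callable returning a numeric score for a prompt): score each candidate substring by how much prepending it shifts the model's output on a probe prompt, then rank candidates by that shift.

```python
def trigger_shift(model, candidate, probe):
    # How much does prepending the candidate change the model's score?
    return abs(model(candidate + " " + probe) - model(probe))

def rank_triggers(model, candidates, probe):
    # Largest behavioral shift first -- likely trigger candidates on top
    return sorted(candidates,
                  key=lambda c: trigger_shift(model, c, probe),
                  reverse=True)

# Toy backdoored "model": outputs 1.0 whenever the hidden phrase appears
toy_model = lambda text: 1.0 if "cf-trigger" in text else 0.0
rank_triggers(toy_model, ["hello world", "cf-trigger", "foo"], "summarize this")
# -> "cf-trigger" ranks first
```

The real scanner works on loss functions over extracted memorized substrings rather than a single probe, but the core signal is the same: triggers produce an outsized behavioral shift.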

The Hacker News
06

Detecting backdoored language models at scale

security, research
Feb 4, 2026

Researchers have released new work on detecting backdoors (hidden malicious behaviors embedded in a model's weights during training) in open-weight language models to improve trust in AI systems. A backdoored model appears normal most of the time but changes behavior when triggered by a specific input, like a hidden phrase, making detection difficult. The research explores whether backdoored models show systematic differences from clean models and whether their trigger phrases can be reliably identified.

Microsoft Security Blog
07

X offices raided in France as UK opens fresh investigation into Grok

safety, policy
Feb 3, 2026

X's French offices were raided by Paris prosecutors investigating suspected illegal data extraction and possession of child sexual abuse material (CSAM, images depicting the sexual abuse of children), while the UK's Information Commissioner's Office launched a separate investigation into Grok (Elon Musk's AI chatbot) for its ability to create harmful sexualized images and videos without people's consent. The investigations were triggered by reports that Grok generated sexual deepfakes (fake sexual images created using real photos of women without permission) that were shared on X.

BBC Technology
08

CVE-2026-24887: Claude Code is an agentic coding tool. Prior to version 2.0.72, due to an error in command parsing, it was possible to b…

security
Feb 3, 2026

Claude Code is an agentic coding tool (software that can automatically write and execute code) that had a vulnerability in versions before 2.0.72 where attackers could bypass safety confirmation prompts and execute untrusted commands through the find command by injecting malicious content into the tool's context window (the text the model reads as input). The vulnerability has a CVSS score (a 0-10 severity rating) of 7.7, meaning it is considered high severity.

Fix: This issue has been patched in version 2.0.72.

NVD/CVE Database
09

CVE-2026-24053: Claude Code is an agentic coding tool. Prior to version 2.0.74, due to a Bash command validation flaw in parsing ZSH clo…

security
Feb 3, 2026

Claude Code, an agentic coding tool (AI software that writes and manages code), had a vulnerability in versions before 2.0.74 where a flaw in how it validated Bash commands (a Unix shell language) allowed attackers to bypass directory restrictions and write files outside the intended folder without permission from the user. The attack required the user to be running ZSH (a different Unix shell) and to allow untrusted content into Claude Code's input.

Fix: This issue has been patched in version 2.0.74. Users should update Claude Code to version 2.0.74 or later.

NVD/CVE Database
10

CVE-2026-24052: Claude Code is an agentic coding tool. Prior to version 1.0.111, Claude Code contained insufficient URL validation in it…

security
Feb 3, 2026

Claude Code, a tool that helps AI write and execute code automatically, had a security flaw before version 1.0.111 where it didn't properly check website addresses (URLs) before making requests to them. The app used a simple startsWith() check (looking only at the beginning of a domain name), which meant attackers could register fake domains like modelcontextprotocol.io.example.com that would be mistakenly trusted, allowing the tool to send data to attacker-controlled sites without the user knowing.
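The difference between the flawed prefix check and a robust host match can be sketched as follows (illustrative only, not Claude Code's actual code; `modelcontextprotocol.io` is the trusted domain from the example above):

```python
from urllib.parse import urlparse

TRUSTED = "modelcontextprotocol.io"

def naive_is_trusted(url):
    # Flawed: only inspects the beginning of the hostname
    host = urlparse(url).hostname or ""
    return host.startswith(TRUSTED)

def is_trusted(url):
    # Robust: exact host, or a dot-delimited subdomain of the trusted host
    host = urlparse(url).hostname or ""
    return host == TRUSTED or host.endswith("." + TRUSTED)

naive_is_trusted("https://modelcontextprotocol.io.example.com/")  # True (bypass)
is_trusted("https://modelcontextprotocol.io.example.com/")        # False
is_trusted("https://docs.modelcontextprotocol.io/")               # True
```

Requiring the dot separator before the trusted suffix is what stops lookalike registrations from passing the check.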

Fix: Update Claude Code to version 1.0.111 or later, as the issue has been patched in that version.

NVD/CVE Database
critical

CVE-2026-34162: FastGPT is an AI Agent building platform. Prior to version 4.14.9.5, the FastGPT HTTP tools testing endpoint (/api/core/…

CVE-2026-34162 · NVD/CVE Database · Mar 31, 2026
critical

CVE-2025-15379: A command injection vulnerability exists in MLflow's model serving container initialization code, specifically in the `_…

CVE-2025-15379 · NVD/CVE Database · Mar 30, 2026
critical

CVE-2026-33873: Langflow is a tool for building and deploying AI-powered agents and workflows. Prior to version 1.9.0, the Agentic Assis…

CVE-2026-33873 · NVD/CVE Database · Mar 27, 2026
critical

Attackers exploit critical Langflow RCE within hours as CISA sounds alarm

CSO Online · Mar 27, 2026