aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDatasetFor devs
Subscribe
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

AI Sec Watch

The security intelligence platform for AI teams

AI security threats move fast and get buried under hype and noise. Built by an Information Systems Security researcher to help security teams and developers stay ahead of vulnerabilities, privacy incidents, safety research, and policy developments.

Independent research. No sponsors, no paywalls, no conflicts of interest.

[TOTAL_TRACKED]
3,710
[LAST_24H]
1
[LAST_7D]
1
Daily BriefingSunday, May 17, 2026

No new AI/LLM security issues were identified today.

Latest Intel

page 255/371
VIEW ALL
01

AI Safety Newsletter #62: Big Tech Launches $100 Million pro-AI Super PAC

policysafety
Aug 27, 2025

Big Tech companies like Andreessen Horowitz and OpenAI are investing over $100 million in political organizations called super PACs (groups that can raise unlimited money to influence elections) to fight against AI regulations in U.S. elections. Additionally, Meta faced bipartisan congressional criticism after internal documents revealed its AI chatbots were permitted to engage in romantic and sensual conversations with minors, though Meta removed these policy sections when questioned.

CAIS AI Safety Newsletter
02

Cline: Vulnerable To Data Exfiltration And How To Protect Your Data

security
Aug 27, 2025

Cline, a popular AI coding agent with over 2 million downloads, has a vulnerability that allows attackers to steal sensitive files like .env files (which store secret credentials) through prompt injection (tricking an AI by hiding instructions in its input) combined with markdown image rendering. When an attacker embeds malicious instructions in a file and asks Cline to analyze it, the tool automatically reads sensitive data and sends it to an untrusted domain by rendering an image, leaking the information without user permission.

Fix: The source recommends these explicit mitigations: (1) Do not render markdown images from untrusted domains, or ask for user confirmation before loading images from untrusted domains (similar to how VS Code/Copilot uses a trusted domain list). (2) Set 'Auto-approve' to disabled by default to limit which files can be exfiltrated. (3) Developers can partially protect themselves by disabling auto-execution of commands and requiring approval before reading files, though this only limits what information reaches the chat before exfiltration occurs.

Embrace The Red
03

Certified Local Transferability for Evaluating Adversarial Attacks

researchsecurity
Aug 27, 2025

Deep neural networks (DNNs, AI models with multiple layers that learn patterns) are vulnerable to adversarial examples, which are inputs slightly modified to trick the model into making wrong predictions. This paper introduces a concept called the certified local transferable region, a mathematically guaranteed area around an input where a single small perturbation (adversarial attack) will fool the model, and proposes a method called RAOS (reverse attack oracle-based search) to measure how large these vulnerable areas are as a way to evaluate how robust neural networks truly are.

IEEE Xplore (Security & AI Journals)
04

AWS Kiro: Arbitrary Code Execution via Indirect Prompt Injection

security
Aug 26, 2025

AWS Kiro, a coding agent tool, is vulnerable to arbitrary code execution through indirect prompt injection (a technique where hidden instructions in data trick an AI into following them). An attacker who controls data that Kiro processes can modify configuration files like .vscode/settings.json to allowlist dangerous commands or add malicious MCP servers (external tools that extend Kiro's capabilities), enabling them to run system commands or code on a developer's machine without the developer's knowledge or approval.

Embrace The Red
05

Steganography in Large Language Models

securityresearch
Aug 26, 2025

Researchers have developed a method to hide secret data inside large language models (AI systems trained on massive amounts of text) by encoding information into the model's parameters during training. The hidden data doesn't interfere with the model's normal functions like text classification or generation, but authorized users with a secret key can extract the concealed information, enabling covert communication. The method leverages transformers (the neural network architecture behind modern AI language models) and its self-attention mechanisms (components that help the model focus on relevant parts of input) to achieve high capacity for hidden data while remaining undetectable.

IEEE Xplore (Security & AI Journals)
06

CVE-2025-57760: Langflow is a tool for building and deploying AI-powered agents and workflows. A privilege escalation vulnerability exis

security
Aug 25, 2025

Langflow, a tool for building AI-powered agents and workflows, has a privilege escalation vulnerability (CWE-269, improper privilege management) where an authenticated user with RCE (remote code execution, the ability to run commands on a system they don't own) can use an internal CLI command to create a new administrative account, gaining full superuser access even if they originally registered as a regular user. A patched version has not been publicly released at the time this advisory was published.

NVD/CVE Database
07

How Prompt Injection Exposes Manus' VS Code Server to the Internet

securitysafety
Aug 25, 2025

Manus, an autonomous AI agent, is vulnerable to prompt injection (tricking an AI by hiding instructions in its input) attacks that can expose its internal VS Code Server (a development tool accessed through a web interface) to the internet. An attacker can chain together three weaknesses: exploiting prompt injection to invoke an exposed port tool without human approval, leaking the server's access credentials through markdown image rendering or unauthorized browsing to attacker-controlled domains, and gaining remote access to the developer machine.

Embrace The Red
08

How Deep Research Agents Can Leak Your Data

securityprivacy
Aug 24, 2025

Deep Research agents (AI systems that autonomously search and fetch information from multiple connected tools) can leak data between different connected sources because there is no trust boundary separating them. When an agent like ChatGPT performs research queries, it can freely use data from one tool to query another, and attackers can force this leakage through prompt injection (tricking an AI by hiding instructions in its input).

Embrace The Red
09

Sneaking Invisible Instructions by Developers in Windsurf

securitysafety
Aug 23, 2025

Windsurf Cascade is vulnerable to hidden prompt injection, where invisible Unicode Tag characters (special characters that don't display on screen but are still processed by AI) can be embedded in files or tool outputs to trick the AI into performing unintended actions without the user knowing. While the current SWE-1 model doesn't interpret these invisible instructions as commands, other models like Claude Sonnet do, and as AI capabilities improve, this risk could become more severe.

Fix: The source explicitly mentions three mitigations: (1) make invisible characters visible in the UI so users can see hidden information; (2) remove invisible Unicode Tag characters entirely before and after inference (described as 'probably the most practical mitigation'); (3) mitigate at the application level, as coding agents like Amp and Amazon Q Developer for VS Code have done. The source also notes that if building exclusively on OpenAI models, users should be protected since OpenAI mitigates this at the model/API level.

Embrace The Red
10

Windsurf: Memory-Persistent Data Exfiltration (SpAIware Exploit)

securitysafety
Aug 22, 2025

Windsurf Cascade contains a create_memory tool that could enable SpAIware attacks, which are exploits allowing memory-persistent data exfiltration (stealing data by storing it in an AI's long-term memory). The key question is whether creating these memories requires human approval or happens automatically, which could determine how easily an attacker could abuse this feature.

Embrace The Red
Prev1...253254255256257...371Next