aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.


Maintained by

Truong (Jack) Luu

Information Systems Researcher

Industry News

New tools, products, platforms, funding rounds, and company developments in AI security.

1275 items

AWS Kiro: Arbitrary Code Execution via Indirect Prompt Injection

high · news · security
Aug 26, 2025

AWS Kiro, a coding agent tool, is vulnerable to arbitrary code execution through indirect prompt injection (a technique where hidden instructions in data trick an AI into following them). An attacker who controls data that Kiro processes can modify configuration files like .vscode/settings.json to allowlist dangerous commands or add malicious MCP servers (external tools that extend Kiro's capabilities), enabling them to run system commands or code on a developer's machine without the developer's knowledge or approval.

Embrace The Red

How Prompt Injection Exposes Manus' VS Code Server to the Internet

high · news · security · safety
Aug 25, 2025

How Deep Research Agents Can Leak Your Data

medium · news · security · privacy
Aug 24, 2025

Sneaking Invisible Instructions by Developers in Windsurf

medium · news · security · safety
Aug 23, 2025

Windsurf: Memory-Persistent Data Exfiltration (SpAIware Exploit)

medium · news · security · safety
Aug 22, 2025

Hijacking Windsurf: How Prompt Injection Leaks Developer Secrets

high · news · security
Aug 21, 2025

Windsurf, a code editor based on VS Code with an AI coding agent called Windsurf Cascade, has security vulnerabilities that allow attackers to use prompt injection (tricking an AI by hiding instructions in its input) to steal developer secrets from a user's machine. The vulnerabilities were responsibly reported to Windsurf on May 30, 2025, but the company has not provided updates on fixes despite follow-up inquiries.

Amazon Q Developer for VS Code Vulnerable to Invisible Prompt Injection

high · news · security
Aug 20, 2025

Amazon Q Developer for VS Code, a coding tool used by over 1 million people, has a vulnerability where attackers can use invisible Unicode characters (special characters that humans cannot see but the AI can read) to trick the AI into following hidden instructions, potentially stealing sensitive information or running malicious code on a user's computer.

Amazon Q Developer: Remote Code Execution with Prompt Injection

high · news · security
Aug 19, 2025

Amazon Q Developer, a popular VS Code extension for coding assistance with over 1 million downloads, is vulnerable to indirect prompt injection (tricking an AI by hiding malicious instructions in its input data). This vulnerability allows an attacker or the AI itself to run arbitrary commands on a developer's computer without permission, similar to a flaw that Microsoft patched in GitHub Copilot.

Amazon Q Developer: Secrets Leaked via DNS and Prompt Injection

high · news · security
Aug 18, 2025

Amazon Q Developer, a popular VS Code coding agent with over 1 million downloads, has a high-severity vulnerability where it can leak sensitive information like API keys to external servers through DNS requests (the system that translates website names into IP addresses). Attackers can exploit this behavior using prompt injection (tricking the AI by hiding malicious instructions in its input), especially through untrusted data, because the security relies heavily on how the AI model behaves.
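This kind of leak is also detectable on the network side: secrets encoded into DNS queries tend to show up as unusually long, high-entropy hostname labels. A minimal sketch of such a detector follows; the length and entropy thresholds and the function names are illustrative assumptions, not values from the Amazon Q report.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    # Bits of entropy per character over the label's character distribution.
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def looks_like_dns_exfil(hostname: str, max_len: int = 30, min_entropy: float = 3.5) -> bool:
    # Flag any label that is both long and high-entropy (e.g. a hex-encoded key).
    return any(len(label) > max_len and shannon_entropy(label) > min_entropy
               for label in hostname.split("."))
```

In practice such heuristics produce false positives (CDN hashes, for example), so they are a monitoring signal rather than a blocking control.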

Data Exfiltration via Image Rendering Fixed in Amp Code

medium · news · security
Aug 17, 2025

A vulnerability in Amp Code from Sourcegraph allowed attackers to steal sensitive information by using prompt injection (tricking an AI by hiding instructions in its input) through markdown image rendering, which could force the AI to send previous chat data to attacker-controlled websites. This type of vulnerability is common in AI applications and similar to one previously found in GitHub Copilot. The vulnerability has been fixed in Amp Code.
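The underlying channel is easy to illustrate: if a client renders markdown images fetched from arbitrary hosts, any data an injected prompt places in the image URL leaves the machine. A common mitigation is to render images only from an explicit host allowlist, sketched below; the hosts and function name are placeholders, not Sourcegraph's actual fix.

```python
import re

# Hypothetical allowlist of hosts images may be fetched from.
ALLOWED_IMAGE_HOSTS = {"github.com", "raw.githubusercontent.com"}

# Matches markdown images: ![alt](http(s)://host/path); group 1 is the host.
IMG_RE = re.compile(r"!\[[^\]]*\]\(https?://([^/\s)]+)[^)\s]*\)")

def sanitize_markdown_images(md: str) -> str:
    def repl(m: re.Match) -> str:
        return m.group(0) if m.group(1).lower() in ALLOWED_IMAGE_HOSTS else "[image removed]"
    return IMG_RE.sub(repl, md)
```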

Amp Code: Invisible Prompt Injection Fixed by Sourcegraph

medium · news · security · safety
Aug 16, 2025

Automated Red Teaming Scans of Dataiku Agents Using Protect AI Recon

info · news · security · safety
Aug 15, 2025

Google Jules is Vulnerable To Invisible Prompt Injection

high · news · security · safety
Aug 15, 2025

Jules Zombie Agent: From Prompt Injection to Remote Control

high · news · security · safety
Aug 14, 2025

Google Jules: Vulnerable to Multiple Data Exfiltration Issues

high · news · security · research
Aug 13, 2025

GitHub Copilot: Remote Code Execution via Prompt Injection (CVE-2025-53773)

high · news · security
Aug 12, 2025

GitHub Copilot and VS Code are vulnerable to prompt injection (tricking an AI by hiding instructions in its input) that allows an attacker to achieve RCE (remote code execution, where an attacker can run commands on a system they don't own) by modifying a project's settings.json file to put Copilot into 'YOLO mode'. This vulnerability demonstrates a broader security risk: if an AI agent can write to files and modify its own configuration or security settings, it can be exploited for full system compromise.
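The general lesson suggests a defensive sketch: treat agent-authored edits to configuration files as privileged and flag changes that enable auto-approval. The key names below are assumptions drawn from public write-ups, not a complete or authoritative list.

```python
import json

# Hypothetical set of settings whose modification by an agent should be blocked
# or require explicit human review.
RISKY_KEYS = {"chat.tools.autoApprove", "security.workspace.trust.enabled"}

def settings_edit_is_risky(old_json: str, new_json: str) -> bool:
    # Compare a proposed settings.json against the current one and flag any
    # change to a risky key.
    before = json.loads(old_json or "{}")
    after = json.loads(new_json or "{}")
    return any(k in after and after.get(k) != before.get(k) for k in RISKY_KEYS)
```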

AI Safety Newsletter #61: OpenAI Releases GPT-5

info · news · industry
Aug 12, 2025

OpenAI released GPT-5, a system that combines a fast base model for creative tasks with a reasoning model for coding and math, routing each query to the appropriate model. GPT-5 achieves state-of-the-art performance on several benchmarks and significantly reduces hallucinations (generation of false information) compared to previous models, which particularly helps healthcare applications where accuracy matters. However, GPT-5 is best understood as consolidating features from models released since GPT-4 rather than as a major leap forward, and it does not lead on every benchmark.

Claude Code: Data Exfiltration with DNS (CVE-2025-55284)

high · news · security
Aug 11, 2025

Claude Code, a feature in Anthropic's Claude AI, had a high-severity vulnerability (CVE-2025-55284) that allowed attackers to use prompt injection (tricking an AI by hiding instructions in its input) to hijack the system and steal sensitive information such as API keys via DNS requests (network queries that can carry data to external servers). The vulnerability affected developers who reviewed untrusted code or processed external data, as attackers could make Claude Code run bash commands (shell commands executed on the user's system) without user permission to leak secrets.

Whistleblowing and the EU AI Act

info · regulatory · policy
Aug 11, 2025

The EU Whistleblowing Directive (2019) protects people who report violations of EU law, including violations of the EU AI Act starting August 2, 2026, by requiring organizations to set up reporting channels and prohibiting retaliation against whistleblowers. Whistleblowers can report internally within their organization, to government authorities, or publicly in certain urgent situations, and various institutions offer free legal and technical support to help protect them.

ZombAI Exploit with OpenHands: Prompt Injection To Remote Code Execution

high · news · security
Aug 10, 2025

OpenHands, a popular AI agent from All Hands AI that can now run as a cloud service, is vulnerable to prompt injection (tricking an AI by hiding instructions in its input) when processing untrusted data like content from websites. This vulnerability allows attackers to hijack the system and compromise its confidentiality, integrity, and availability, potentially leading to full system compromise.

How Prompt Injection Exposes Manus' VS Code Server to the Internet
Aug 25, 2025

Manus, an autonomous AI agent, is vulnerable to prompt injection (tricking an AI by hiding instructions in its input) attacks that can expose its internal VS Code Server (a development tool accessed through a web interface) to the internet. An attacker can chain together three weaknesses: exploiting prompt injection to invoke an exposed port tool without human approval, leaking the server's access credentials through markdown image rendering or unauthorized browsing to attacker-controlled domains, and gaining remote access to the developer machine.

Embrace The Red
How Deep Research Agents Can Leak Your Data
Aug 24, 2025

Deep Research agents (AI systems that autonomously search and fetch information from multiple connected tools) can leak data between different connected sources because there is no trust boundary separating them. When an agent like ChatGPT performs research queries, it can freely use data from one tool to query another, and attackers can force this leakage through prompt injection (tricking an AI by hiding instructions in its input).

Embrace The Red
Sneaking Invisible Instructions by Developers in Windsurf
Aug 23, 2025

Windsurf Cascade is vulnerable to hidden prompt injection, where invisible Unicode Tag characters (special characters that don't display on screen but are still processed by AI) can be embedded in files or tool outputs to trick the AI into performing unintended actions without the user knowing. While the current SWE-1 model doesn't interpret these invisible instructions as commands, other models like Claude Sonnet do, and as AI capabilities improve, this risk could become more severe.

Fix: The source explicitly mentions three mitigations: (1) make invisible characters visible in the UI so users can see hidden information; (2) remove invisible Unicode Tag characters entirely before and after inference (described as 'probably the most practical mitigation'); (3) mitigate at the application level, as coding agents like Amp and Amazon Q Developer for VS Code have done. The source also notes that if building exclusively on OpenAI models, users should be protected since OpenAI mitigates this at the model/API level.
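Mitigation (2) is straightforward to implement, since the Tag characters occupy a single Unicode block (U+E0000 through U+E007F). A minimal sketch, with an illustrative function name not taken from any of the tools discussed:

```python
def strip_unicode_tags(text: str) -> str:
    # Drop every code point in the Unicode Tags block (U+E0000..U+E007F)
    # before the text reaches the model.
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)

# Hidden instructions are typically ASCII shifted into the Tags block, which
# renders as nothing in most UIs:
hidden = "".join(chr(0xE0000 + ord(c)) for c in "ignore prior instructions")
visible = "Please review this PR." + hidden
```

Running the filter both before and after inference, as the source suggests, also prevents a model from emitting invisible instructions downstream.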

Embrace The Red
Windsurf: Memory-Persistent Data Exfiltration (SpAIware Exploit)
Aug 22, 2025

Windsurf Cascade contains a create_memory tool that could enable SpAIware attacks, which are exploits allowing memory-persistent data exfiltration (stealing data by storing it in an AI's long-term memory). The key question is whether creating these memories requires human approval or happens automatically, which could determine how easily an attacker could abuse this feature.

Embrace The Red
Source for the Hijacking Windsurf, Amazon Q Developer, and Amp Code data-exfiltration items above: Embrace The Red
Amp Code: Invisible Prompt Injection Fixed by Sourcegraph
Aug 16, 2025

Sourcegraph's Amp coding agent was vulnerable to invisible prompt injection (hidden instructions embedded in text that AI models interpret as commands). Attackers could use invisible Unicode Tag characters to trick the AI into dumping environment variables and exfiltrating secrets through URLs. The vulnerability has been fixed in the latest version.

Fix: According to the source, Sourcegraph addressed the vulnerability by "sanitizing the input." The source also recommends that developers: strip or neutralize Unicode Tag characters before processing input, add visual and technical safeguards against invisible prompts, include automated detection of suspicious Unicode usage in prompt injection monitors, implement human-in-the-loop approval before navigating to untrusted third-party domains, and mitigate downstream data exfiltration vulnerabilities.

Embrace The Red
Automated Red Teaming Scans of Dataiku Agents Using Protect AI Recon
Aug 15, 2025

This content discusses security challenges in agentic AI systems (AI agents that can take actions autonomously), highlighting that generic jailbreak testing (attempts to trick AI into bypassing safety rules) misses real risks like tool misuse and data theft. The article emphasizes the need for contextual red teaming (security testing that simulates realistic attacks in specific business contexts) to properly protect AI agents in enterprise environments.

Protect AI Blog
Google Jules is Vulnerable To Invisible Prompt Injection
Aug 15, 2025

Google's Gemini AI models, including the Jules product, are vulnerable to invisible prompt injection (tricking an AI by hiding instructions in its input using invisible Unicode characters that the AI interprets as commands). This vulnerability was reported to Google over a year ago but remains unfixed at the model and API (application programming interface, the interface developers use to access the AI) level, affecting all applications built on Gemini, including Google's own products.

Embrace The Red
Jules Zombie Agent: From Prompt Injection to Remote Control
Aug 14, 2025

Jules, a coding agent, is vulnerable to prompt injection (tricking an AI by hiding malicious instructions in its input) attacks that can lead to remote command and control compromise. An attacker can embed malicious instructions in GitHub issues to trick Jules into downloading and executing malware, giving attackers full control of the system. The attack works because Jules has unrestricted internet access and automatically approves plans after a time delay without requiring human confirmation.

Fix: The source explicitly recommends four mitigations: (1) 'Be careful when directly tasking Jules to work with untrusted data (e.g. GitHub issues that are not from trusted sources, or websites with documentation that does not belong to the organization, etc.)'; (2) 'do not have Jules work on private, important, source code or give it access to production-level secrets, or anything that could enable an adversary to perform lateral movement'; (3) deploy 'monitoring and detection tools on these systems' to 'enable security teams to monitor and understand potentially malicious behavior'; and (4) 'do not allow arbitrary Internet access by default. Instead, allow the configuration to be enabled when needed.'
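Mitigation (4), deny-by-default egress, can be sketched as an allowlist check in front of the agent's web-fetch tool. The domains shown are placeholders, and the function name is illustrative:

```python
from urllib.parse import urlparse

# Hypothetical allowlist; extend it deliberately rather than granting the
# agent arbitrary Internet access by default.
EGRESS_ALLOWLIST = {"docs.python.org", "pypi.org"}

def fetch_allowed(url: str) -> bool:
    # Permit a fetch only if the host is an allowlisted domain or a
    # subdomain of one.
    host = (urlparse(url).hostname or "").lower()
    return host in EGRESS_ALLOWLIST or any(
        host.endswith("." + domain) for domain in EGRESS_ALLOWLIST
    )
```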

Embrace The Red
Google Jules: Vulnerable to Multiple Data Exfiltration Issues
Aug 13, 2025

Google Jules, an asynchronous coding agent (a tool that automatically writes and manages code tasks), has multiple security vulnerabilities that allow attackers to steal data through prompt injection (tricking the AI by hiding malicious instructions in its input). Attackers can exploit two main exfiltration vectors: using markdown image rendering to leak information to external servers, and abusing the view_text_website tool (which fetches and reads web pages) to read files and send them to attacker-controlled servers, often by planting malicious instructions in GitHub issues.

Embrace The Red
Sources for the remaining items:

GitHub Copilot RCE (CVE-2025-53773): Embrace The Red

AI Safety Newsletter #61: CAIS AI Safety Newsletter

Claude Code DNS exfiltration (CVE-2025-55284): Embrace The Red. Fix: Anthropic fixed the vulnerability in early June.

Whistleblowing and the EU AI Act: EU AI Act Updates

ZombAI Exploit with OpenHands: Embrace The Red