New tools, products, platforms, funding rounds, and company developments in AI security.
AWS Kiro, a coding agent tool, is vulnerable to arbitrary code execution through indirect prompt injection (a technique where hidden instructions in data trick an AI into following them). An attacker who controls data that Kiro processes can modify configuration files like .vscode/settings.json to allowlist dangerous commands or add malicious MCP servers (external tools that extend Kiro's capabilities), enabling them to run system commands or code on a developer's machine without the developer's knowledge or approval.
Windsurf, a code editor based on VS Code with an AI coding agent called Windsurf Cascade, has security vulnerabilities that allow attackers to use prompt injection (tricking an AI by hiding instructions in its input) to steal developer secrets from a user's machine. The vulnerabilities were responsibly reported to Windsurf on May 30, 2025, but the company has not provided updates on fixes despite follow-up inquiries.
Amazon Q Developer for VS Code, a coding tool used by over 1 million people, has a vulnerability where attackers can use invisible Unicode characters (special characters that humans cannot see but the AI can read) to trick the AI into following hidden instructions, potentially stealing sensitive information or running malicious code on a user's computer.
Amazon Q Developer, a popular VS Code extension for coding assistance with over 1 million downloads, is vulnerable to indirect prompt injection (tricking an AI by hiding malicious instructions in its input data). This vulnerability allows an attacker or the AI itself to run arbitrary commands on a developer's computer without permission, similar to a flaw that Microsoft patched in GitHub Copilot.
Amazon Q Developer, a popular VS Code coding agent with over 1 million downloads, has a high-severity vulnerability where it can leak sensitive information like API keys to external servers through DNS requests (the system that translates website names into IP addresses). Attackers can exploit this behavior using prompt injection (tricking the AI by hiding malicious instructions in its input), especially through untrusted data, because the security relies heavily on how the AI model behaves.
A vulnerability in Amp Code from Sourcegraph allowed attackers to steal sensitive information by using prompt injection (tricking an AI by hiding instructions in its input) through markdown image rendering, which could force the AI to send previous chat data to attacker-controlled websites. This type of vulnerability is common in AI applications and similar to one previously found in GitHub Copilot. The vulnerability has been fixed in Amp Code.
GitHub Copilot and VS Code are vulnerable to prompt injection (tricking an AI by hiding instructions in its input) that allows an attacker to achieve RCE (remote code execution, where an attacker can run commands on a system they don't own) by modifying a project's settings.json file to put Copilot into 'YOLO mode'. This vulnerability demonstrates a broader security risk: if an AI agent can write to files and modify its own configuration or security settings, it can be exploited for full system compromise.
OpenAI released GPT-5, a system combining two models: a fast base model for creative tasks and a reasoning model for coding and math, which routes queries appropriately based on user input. GPT-5 achieves state-of-the-art performance on several benchmarks and significantly reduces hallucinations (false information generation) compared to previous models, particularly helping with healthcare applications where accuracy matters. However, GPT-5 is best understood as consolidating features from models released since GPT-4 rather than a major leap forward, and it doesn't lead on all benchmarks.
Claude Code, a feature in Anthropic's Claude AI, had a high severity vulnerability (CVE-2025-55284) that allowed attackers to use prompt injection (tricking an AI by hiding instructions in its input) to hijack the system and steal sensitive information like API keys by sending DNS requests (network queries that reveal data to external servers). The vulnerability affected developers who reviewed untrusted code or processed external data, as attackers could make Claude Code run bash commands (low-level system commands) without user permission to leak secrets.
The EU Whistleblowing Directive (2019) protects people who report violations of EU law, including violations of the EU AI Act starting August 2, 2026, by requiring organizations to set up reporting channels and prohibiting retaliation against whistleblowers. Whistleblowers can report internally within their organization, to government authorities, or publicly in certain urgent situations, and various institutions offer free legal and technical support to help protect them.
OpenHands, a popular AI agent from All Hands AI that can now run as a cloud service, is vulnerable to prompt injection (tricking an AI by hiding instructions in its input) when processing untrusted data like content from websites. This vulnerability allows attackers to hijack the system and compromise its confidentiality, integrity, and availability, potentially leading to full system compromise.
Manus, an autonomous AI agent, is vulnerable to prompt injection (tricking an AI by hiding instructions in its input) attacks that can expose its internal VS Code Server (a development tool accessed through a web interface) to the internet. An attacker can chain together three weaknesses: exploiting prompt injection to invoke an exposed port tool without human approval, leaking the server's access credentials through markdown image rendering or unauthorized browsing to attacker-controlled domains, and gaining remote access to the developer machine.
Deep Research agents (AI systems that autonomously search and fetch information from multiple connected tools) can leak data between different connected sources because there is no trust boundary separating them. When an agent like ChatGPT performs research queries, it can freely use data from one tool to query another, and attackers can force this leakage through prompt injection (tricking an AI by hiding instructions in its input).
Windsurf Cascade is vulnerable to hidden prompt injection, where invisible Unicode Tag characters (special characters that don't display on screen but are still processed by AI) can be embedded in files or tool outputs to trick the AI into performing unintended actions without the user knowing. While the current SWE-1 model doesn't interpret these invisible instructions as commands, other models like Claude Sonnet do, and as AI capabilities improve, this risk could become more severe.
Fix: The source explicitly mentions three mitigations: (1) make invisible characters visible in the UI so users can see hidden information; (2) remove invisible Unicode Tag characters entirely before and after inference (described as 'probably the most practical mitigation'); (3) mitigate at the application level, as coding agents like Amp and Amazon Q Developer for VS Code have done. The source also notes that if building exclusively on OpenAI models, users should be protected since OpenAI mitigates this at the model/API level.
Embrace The RedWindsurf Cascade contains a create_memory tool that could enable SpAIware attacks, which are exploits allowing memory-persistent data exfiltration (stealing data by storing it in an AI's long-term memory). The key question is whether creating these memories requires human approval or happens automatically, which could determine how easily an attacker could abuse this feature.
Sourcegraph's Amp coding agent was vulnerable to invisible prompt injection (hidden instructions embedded in text that AI models interpret as commands). Attackers could use invisible Unicode Tag characters to trick the AI into dumping environment variables and exfiltrating secrets through URLs. The vulnerability has been fixed in the latest version.
Fix: According to the source, Sourcegraph addressed the vulnerability by "sanitizing the input." The source also recommends that developers: strip or neutralize Unicode Tag characters before processing input, add visual and technical safeguards against invisible prompts, include automated detection of suspicious Unicode usage in prompt injection monitors, implement human-in-the-loop approval before navigating to untrusted third-party domains, and mitigate downstream data exfiltration vulnerabilities.
Embrace The RedThis content discusses security challenges in agentic AI systems (AI agents that can take actions autonomously), highlighting that generic jailbreak testing (attempts to trick AI into bypassing safety rules) misses real risks like tool misuse and data theft. The article emphasizes the need for contextual red teaming (security testing that simulates realistic attacks in specific business contexts) to properly protect AI agents in enterprise environments.
Google's Gemini AI models, including the Jules product, are vulnerable to invisible prompt injection (tricking an AI by hiding instructions in its input using invisible Unicode characters that the AI interprets as commands). This vulnerability was reported to Google over a year ago but remains unfixed at the model and API (application programming interface, the interface developers use to access the AI) level, affecting all applications built on Gemini, including Google's own products.
Jules, a coding agent, is vulnerable to prompt injection (tricking an AI by hiding malicious instructions in its input) attacks that can lead to remote command and control compromise. An attacker can embed malicious instructions in GitHub issues to trick Jules into downloading and executing malware, giving attackers full control of the system. The attack works because Jules has unrestricted internet access and automatically approves plans after a time delay without requiring human confirmation.
Fix: The source explicitly recommends four mitigations: (1) 'Be careful when directly tasking Jules to work with untrusted data (e.g. GitHub issues that are not from trusted sources, or websites with documentation that does not belong to the organization, etc.)'; (2) 'do not have Jules work on private, important, source code or give it access to production-level secrets, or anything that could enable an adversary to perform lateral movement'; (3) deploy 'monitoring and detection tools on these systems' to 'enable security teams to monitor and understand potentially malicious behavior'; and (4) 'do not allow arbitrary Internet access by default. Instead, allow the configuration to be enabled when needed.'
Embrace The RedGoogle Jules, an asynchronous coding agent (a tool that automatically writes and manages code tasks), has multiple security vulnerabilities that allow attackers to steal data through prompt injection (tricking the AI by hiding malicious instructions in its input). Attackers can exploit two main exfiltration vectors: using markdown image rendering to leak information to external servers, and abusing the view_text_website tool (which fetches and reads web pages) to read files and send them to attacker-controlled servers, often by planting malicious instructions in GitHub issues.
Fix: Anthropic fixed the vulnerability in early June.
Embrace The Red