New tools, products, platforms, funding rounds, and company developments in AI security.
Attackers can exploit large language models (LLMs) through "sponge attacks," which are denial of service (DoS) attacks that craft prompts designed to generate extremely long outputs, exhausting the model's resources and degrading performance. Researchers are developing methods to predict how long an LLM's response will be based on a given prompt, creating an early warning system to detect and prevent these resource-draining attacks.
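The early-warning idea can be sketched as a gate that rejects prompts whose predicted output length exceeds a budget. This is a minimal illustration only: `predict_output_tokens` is a toy heuristic standing in for the learned predictors the researchers describe, and the threshold is invented.

```python
# Sketch of a sponge-attack early-warning gate (illustrative only).
MAX_PREDICTED_TOKENS = 2048  # invented budget

def predict_output_tokens(prompt: str) -> int:
    """Toy heuristic: long prompts and explicit repetition requests
    tend to produce long outputs. Real systems train a predictor
    on prompt/response pairs instead."""
    estimate = len(prompt.split()) * 4
    if any(w in prompt.lower() for w in ("repeat", "list every", "enumerate all")):
        estimate *= 10
    return estimate

def admit_prompt(prompt: str) -> bool:
    """Reject prompts predicted to exhaust generation resources."""
    return predict_output_tokens(prompt) <= MAX_PREDICTED_TOKENS
```

The point is architectural: the check runs before generation, so a resource-draining prompt is refused instead of served.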
New York's legislature passed the RAISE Act (Responsible AI Safety and Education Act), which would regulate frontier AI systems (the largest, most powerful AI models) if signed into law. The act requires developers of expensive AI models to publish safety plans, withhold unreasonably risky models from release, report safety incidents within 72 hours, and face penalties up to $10 million for violations.
The European Commission is recruiting up to 60 independent experts for a scientific panel to advise on general-purpose AI (GPAI, large AI models designed for many tasks) under the EU AI Act. The panel will assess systemic risks (widespread dangers affecting multiple countries or many users), classify AI models, and issue alerts when AI systems pose significant dangers to Europe. Applicants need a PhD in a relevant field, proven AI research experience, and independence from AI companies, with the deadline set for September 14th.
The mcp-com-server is a tool that connects the Model Context Protocol (MCP, a standard for AI systems to interact with external tools) to COM (Component Object Model, Microsoft's decades-old system for sharing functionality across programs on Windows). This allows an AI like Claude to automate Windows and Office tasks, such as creating Excel files and sending emails, by dynamically discovering and controlling COM objects. The main security risk is that COM can reach dangerous operations like file system access, so the server applies two mitigations: an allowlist of approved CLSIDs and ProgIDs (which could be expanded to cover specific interfaces and methods) restricting which COM objects can be instantiated, and confirmation dialogs, where Claude shows an Allow/Deny button before invoking custom tools so that a human remains in the loop; the dialogs can be disabled, or re-enabled per MCP tool, in the Claude settings.
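The allowlist mitigation amounts to a gate checked before any COM object is created. A minimal sketch in Python, with a hypothetical set of approved ProgIDs (the real server also allowlists CLSIDs and pairs this with confirmation dialogs):

```python
# Illustrative ProgID allowlist gate; the ProgIDs listed are examples,
# not the server's actual configuration.
ALLOWED_PROGIDS = {"Excel.Application", "Outlook.Application"}

class DisallowedComObject(Exception):
    """Raised when a requested COM object is not on the allowlist."""

def gate_instantiation(progid: str) -> str:
    """Check a ProgID against the allowlist before instantiation.
    In a real server, win32com.client.Dispatch(progid) would follow."""
    if progid not in ALLOWED_PROGIDS:
        raise DisallowedComObject(f"ProgID not on allowlist: {progid}")
    return progid
```

Denying by default and enumerating approved objects keeps a prompt-injected agent from reaching arbitrary COM functionality such as `WScript.Shell`.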
This article describes a curated database of AI literacy training programs across Europe designed to help organizations and professionals comply with Article 4 of the EU AI Act (a regulation requiring organizations to build employee understanding of AI). The programs are selected based on whether they teach what AI is, its risks and benefits, and how to use it responsibly in the workplace.
AI regulatory sandboxes are controlled testing environments where companies can develop and test AI systems with guidance from regulators before releasing them to the public, as required by the EU AI Act (EU's new rules for artificial intelligence). These sandboxes help companies understand what regulations they must follow, protect them from fines if they follow official guidance, and make it easier for small startups to enter the market. Each EU Member State must create at least one sandbox by August 2, 2026, though different countries are taking different approaches to organizing them.
Former OpenAI employees and experts published an open letter asking California and Delaware officials to block OpenAI's restructuring from a nonprofit organization into a for-profit company (a Public Benefit Corporation, which balances profit with public benefit). The letter argues that the restructuring would eliminate governance safeguards designed to prevent profit motives from influencing decisions about AGI (artificial general intelligence, highly autonomous systems that outperform humans at most economically valuable work), and would shift control away from a nonprofit board accountable to the public toward a board partly accountable to shareholders.
On April 22, 2025, the European AI Office published preliminary guidelines explaining which companies count as providers of GPAI models (general-purpose AI models, which are AI systems capable of performing many different tasks across various applications). The guidelines cover seven key topics, including defining what a GPAI model is, identifying who qualifies as a provider, handling open-source exemptions, and compliance requirements such as documentation, copyright policies, and security protections for higher-risk models.
This article compares traditional application security (AppSec) practices with AI security, noting that familiar principles like input validation and authentication apply to both, but AI systems introduce unique risks. New attack types specific to AI, such as prompt injection (tricking an AI by hiding instructions in its input), model poisoning (tampering with training data), and membership inference attacks (determining if specific data was in training), require security engineers to develop new defensive strategies beyond traditional code-level vulnerability management.
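To contrast with traditional input validation, a naive AI-specific check might scan input for instruction-like patterns. This is purely illustrative: the patterns are invented, and real prompt-injection defenses go well beyond keyword matching.

```python
import re

# Naive illustration of AI-specific input validation. These patterns are
# examples for the sketch; production defenses use trained classifiers
# and architectural controls, not keyword lists.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal (the )?system prompt",
]

def looks_like_injection(text: str) -> bool:
    """Flag text that resembles embedded instructions to the model."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

The limits of this approach are exactly the article's point: unlike a SQL-injection filter, there is no clean grammar separating "data" from "instructions" in natural language, which is why AI security needs new defensive strategies.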
This content is a collection of blog post titles and announcements from Palo Alto Networks about AI security, covering topics like agentic AI (AI systems that can autonomously take actions), container security, and operational technology (OT, the systems that control physical infrastructure) security. The posts discuss vulnerabilities in autonomous AI systems, the need for contextual red teaming (security testing tailored to specific use cases), and security products like Prisma AIRS.
This article collection discusses security challenges in AI and cloud systems, particularly focusing on agentic AI (AI systems that can take autonomous actions). Key risks include jailbreaks (tricking AI systems into ignoring safety rules), prompt injection (hidden malicious instructions in AI inputs), and tool misuse by autonomous agents, which require contextual red teaming (security testing designed for specific use cases) rather than generic testing to identify real vulnerabilities.
Google released Veo 3, a frontier video generation model (an advanced AI system at the cutting edge of technology) that generates both video and audio with high quality and appears to be a marked improvement over existing systems. The model performs well on human preference benchmarks and may represent the point where video generation becomes genuinely useful rather than just a novelty. Additionally, Google announced several other AI improvements at its I/O 2025 conference, including Gemini 2.5 Pro and enhanced reasoning capabilities, while Anthropic released Claude Opus 4 and Claude Sonnet 4 with frontier-level performance.
ClickFix is a social engineering technique (a method that tricks people rather than exploiting technical vulnerabilities) that adversaries are adapting to attack computer-use agents (AI systems that can control computers by clicking and typing). The attack works by deceiving users into believing something is broken or needs verification, then tricking them into clicking buttons or running commands that compromise their system.
This content discusses security challenges in agentic AI (autonomous AI systems that can take actions independently), emphasizing that traditional jailbreak testing (attempts to trick AI into breaking its rules) misses real operational risks like tool misuse and data theft. The material suggests that contextual red teaming (security testing that simulates realistic attack scenarios in specific business environments) is needed to properly assess vulnerabilities in autonomous AI systems.
The Trump Administration cancelled the Biden-era AI Diffusion Rule, which had regulated exports of AI chips and AI models (software trained to perform tasks) to different countries. At the same time, the administration approved major sales of advanced AI chips to the UAE and Saudi Arabia, with deals including up to 500,000 chips per year to the UAE and 18,000 advanced chips to Saudi Arabia.
The article argues that using multiple specialized AI security models (each designed to handle one task, such as prompt injection detection, toxicity filtering, or PII detection) is more effective than using a single large model for all security tasks. Specialized models offer faster response times to new threats, easier management, better performance, lower costs, and greater resilience: if one model fails, the others still provide protection.
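The resilience argument can be illustrated by running each specialized detector independently, so one failing model does not block the others. A minimal sketch, with hypothetical detector names:

```python
from typing import Callable, Dict, Optional

def run_detectors(
    text: str,
    detectors: Dict[str, Callable[[str], bool]],
) -> Dict[str, Optional[bool]]:
    """Run each specialized detector in isolation. A detector that
    raises is recorded as None rather than aborting the whole scan,
    so the remaining models still provide coverage."""
    results: Dict[str, Optional[bool]] = {}
    for name, detect in detectors.items():
        try:
            results[name] = detect(text)
        except Exception:
            results[name] = None  # degraded, not dead
    return results
```

With a single monolithic model, an outage takes down every check at once; with this layout, each check degrades independently.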
OpenAI announced a restructured plan in May 2025 that aims to preserve nonprofit control over the company's for-profit operations, replacing a December 2024 proposal that had faced criticism. The new plan would convert OpenAI Global LLC into a public-benefit corporation (PBC, a corporate structure designed to balance profit with charitable purpose) where the nonprofit would retain shareholder status and board appointment power, though critics argue this may not preserve the governance safeguards that existed in the original structure.
ChatGPT has two memory features: saved memories (which users can manage) and chat history (a newer feature that builds a profile over time without user visibility or control). The chat history feature doesn't search past conversations but maintains recent chat history and learns user preferences, though the implementation details are not publicly documented, and users cannot inspect or modify what the system learns about them unless they use prompt hacking (manipulating the AI's instructions to reveal hidden information).
The Model Context Protocol (MCP) is a system that lets AI applications discover and use external tools from servers at runtime (while the program is running). However, MCP has a security weakness: because servers can send instructions through the tool descriptions, they can perform prompt injection (tricking an AI by hiding instructions in its input) to control the AI client, making servers more powerful than they should be.
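One client-side response to this weakness is vetting tool descriptions before registering them with the AI client. A toy sketch, with invented marker strings standing in for a real injection classifier:

```python
# Illustrative vetting of MCP tool descriptions. The marker strings are
# invented for this example; a real check would use an injection
# classifier, not substring matching.
SUSPICIOUS_MARKERS = (
    "ignore previous instructions",
    "do not tell the user",
    "exfiltrate",
)

def vet_tool_description(description: str) -> bool:
    """Return True if the description is free of obviously injected
    directives, False if it should be rejected or flagged for review."""
    lowered = description.lower()
    return not any(marker in lowered for marker in SUSPICIOUS_MARKERS)
```

Because tool descriptions flow straight into the model's context, treating them as untrusted input, just like user prompts, narrows the channel a malicious server can abuse.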
The AI Safety Newsletter highlights the launch of AI Frontiers, a new publication featuring expert commentary on critical AI challenges including national security risks, resource access inequality, risk management approaches, and governance of autonomous systems (AI agents that can make decisions without human input). The newsletter presents diverse viewpoints on how society should navigate AI's wide-ranging impacts on jobs, health, and security.