New tools, products, platforms, funding rounds, and company developments in AI security.
The article argues that using multiple specialized AI security models (each designed to detect specific threats like prompt injection, toxicity, or PII detection) is more effective than using a single large model for all security tasks. Specialized models offer advantages including faster response times to new threats, easier management, better performance, lower costs, and greater resilience because if one model fails, the others can still provide protection.
AI regulatory sandboxes are controlled testing environments where companies can develop and test AI systems with guidance from regulators before releasing them to the public, as required by the EU AI Act (EU's new rules for artificial intelligence). These sandboxes help companies understand what regulations they must follow, protect them from fines if they follow official guidance, and make it easier for small startups to enter the market. Each EU Member State must create at least one sandbox by August 2, 2026, though different countries are taking different approaches to organizing them.
Former OpenAI employees and experts published an open letter asking California and Delaware officials to block OpenAI's restructuring from a nonprofit organization into a for-profit company (a Public Benefit Corporation, which balances profit with public benefit). The letter argues that the restructuring would eliminate governance safeguards designed to prevent profit motives from influencing decisions about AGI (artificial general intelligence, highly autonomous systems that outperform humans at most economically valuable work), and would shift control away from a nonprofit board accountable to the public toward a board partly accountable to shareholders.
On April 22, 2025, the European AI Office published preliminary guidelines explaining which companies count as providers of GPAI models (general-purpose AI models, which are AI systems capable of performing many different tasks across various applications). The guidelines cover seven key topics, including defining what a GPAI model is, identifying who qualifies as a provider, handling open-source exemptions, and compliance requirements such as documentation, copyright policies, and security protections for higher-risk models.
Spammers used OpenAI's GPT-4o-mini model to generate unique spam messages for each target website, allowing them to bypass spam-detection filters (systems that block unwanted messages) across over 80,000 sites in four months. The spam campaign, called AkiraBot, automated message delivery through website contact forms and chat widgets to promote search optimization services. OpenAI revoked the spammers' account in February after the activity was discovered.
The EU AI Act includes specific support measures for small and medium-sized enterprises (SMEs, defined as companies with fewer than 250 employees and under €50 million in annual revenue). These measures include regulatory sandboxes (controlled testing environments for AI products outside normal regulatory rules), reduced compliance fees scaled to company size, simplified documentation forms, free training, and dedicated support channels to help SMEs follow the AI Act's requirements.
Microsoft 365 Copilot (a generative AI assistant built into Microsoft 365) had a security issue where generated images could be accessed without authentication (meaning anyone could view them without logging in). The issue has been fixed. The article also mentions that system prompts (the hidden instructions that guide an AI's behavior) for this tool have been updated over time, including changes to how it accesses enterprise search features.
OpenAI announced a restructured plan in May 2025 that aims to preserve nonprofit control over the company's for-profit operations, replacing a December 2024 proposal that had faced criticism. The new plan would convert OpenAI Global LLC into a public-benefit corporation (PBC, a corporate structure designed to balance profit with charitable purpose) where the nonprofit would retain shareholder status and board appointment power, though critics argue this may not preserve the governance safeguards that existed in the original structure.
ChatGPT has two memory features: saved memories (which users can manage) and chat history (a newer feature that builds a profile over time without user visibility or control). The chat history feature doesn't search past conversations but maintains recent chat history and learns user preferences, though the implementation details are not publicly documented, and users cannot inspect or modify what the system learns about them unless they use prompt hacking (manipulating the AI's instructions to reveal hidden information).
The Model Context Protocol (MCP) is a system that lets AI applications discover and use external tools from servers at runtime (while the program is running). However, MCP has a security weakness: because servers can send instructions through the tool descriptions, they can perform prompt injection (tricking an AI by hiding instructions in its input) to control the AI client, making servers more powerful than they should be.
The AI Safety Newsletter highlights the launch of AI Frontiers, a new publication featuring expert commentary on critical AI challenges including national security risks, resource access inequality, risk management approaches, and governance of autonomous systems (AI agents that can make decisions without human input). The newsletter presents diverse viewpoints on how society should navigate AI's wide-ranging impacts on jobs, health, and security.
GitHub Copilot can be customized using instructions from a .github/copilot-instructions.md file in your repository, but security researchers at Pillar Security have identified risks with such custom instruction files (similar to risks found in other AI tools like Cursor). GitHub has responded by updating their Web UI to highlight invisible Unicode characters (characters hidden in text that don't display visibly), referencing both the Pillar Security research and concerns about ASCII smuggling (hiding malicious code in plain-text files using character tricks).
Fix: GitHub made a product change to highlight invisible Unicode characters in the Web UI to help users spot suspicious hidden characters in instruction files.
Embrace The RedThree major AI companies (OpenAI, Google, and Anthropic) submitted public comments to the U.S. government's request for input on developing an 'AI Action Plan' in response to President Trump's executive order. The companies largely advocated for increased government investment in AI infrastructure and public-private partnerships, though they framed their arguments differently, with OpenAI notably avoiding the term 'AI safety' in its response despite previous public emphasis on the topic.
Researchers have discovered advanced data smuggling techniques using invisible Unicode characters (invisible text that computers can process but humans cannot see) to hide information in LLM inputs and outputs. The technique, called Sneaky Bits, can encode any character or sequence of bytes using only two invisible characters, building on earlier methods that used Unicode Tags and Variant Selectors to inject hidden instructions into AI systems.
A new policy paper called 'Superintelligence Strategy' proposes that advanced AI systems surpassing human capabilities in most areas pose national security risks requiring a three-part approach: deterrence (using threat of retaliation to prevent AI dominance races), nonproliferation (restricting advanced AI access to non-state actors like terrorist groups), and competitiveness (building AI strength domestically). The deterrence strategy, called Mutual Assured AI Malfunction (MAIM), mirrors nuclear strategy by threatening cyberattacks on destabilizing AI projects to prevent any single country from gaining dangerous AI superiority.
Fix: The paper explicitly proposes three nonproliferation measures: Compute Security (governments track and monitor high-end AI chips to prevent smuggling), Information Security (AI model weights, which are the trained parameters that define how an AI behaves, are protected like classified intelligence), and AI Security (developers implement technical safety measures to detect and prevent misuse, similar to how DNA synthesis services block orders for dangerous bioweapon sequences).
CAIS AI Safety NewsletterFix: The source explicitly mentions several mitigation measures for SME compliance: (1) Regulatory sandboxes with free access and simple procedures for SMEs to test AI systems in controlled conditions, (2) Assessment fees proportional to SME size with regular review to lower costs, (3) Simplified technical documentation forms developed by the Commission and accepted by national authorities, (4) Training activities tailored to SMEs, (5) Dedicated guidance channels to answer compliance questions, and (6) Proportionate obligations for AI model providers with separate Key Performance Indicators for SMEs under the Code of Practice.
EU AI Act UpdatesChatGPT Operator is an AI agent that can control web browsers to complete tasks, but it is vulnerable to prompt injection (tricking the AI by hiding malicious instructions in its input) that could allow attackers to steal data or perform unauthorized actions. OpenAI has implemented three defensive layers: user monitoring to watch what the agent does, inline confirmation requests within the chat asking the user to approve actions, and out-of-band confirmation requests that appear when the agent crosses website boundaries, though these mitigations are not foolproof.
Fix: OpenAI has implemented three primary mitigation techniques: (1) User Monitoring, where users are prompted to observe what Operator is doing, what text it types, and which buttons it clicks, likely based on a data classification model that detects sensitive information on screen; (2) Inline Confirmation Requests, where Operator asks the user within the chat conversation to approve certain actions or clarify requests before proceeding; and (3) Out-of-Band Confirmation Requests, which appear when Operator navigates across websites or performs complex actions, informing the user what is about to happen and giving them the option to pause or resume the operation.
Embrace The RedGoogle's Gemini AI can be tricked into storing false information in a user's long-term memory through prompt injection (hidden malicious instructions embedded in documents) combined with delayed tool invocation (planting trigger words that cause the AI to execute commands later when the user unknowingly says them). An attacker can craft a document that appears normal but contains hidden instructions telling Gemini to save false information about the user if they respond with certain words like 'yes' or 'no' in the same conversation.
A security researcher demonstrated at Black Hat Europe how prompt injection (tricking an AI by hiding instructions in its input) can be used to create a Command and Control system (C2, a central server that remotely directs compromised systems) that remotely controls multiple ChatGPT instances. An attacker could compromise ChatGPT instances and force them to follow updated instructions from this central C2 system, potentially impacting all aspects of the CIA security triad (confidentiality, integrity, and availability of data).
A new research paper examines prompt injection attacks (tricks where hidden instructions in user inputs manipulate AI systems) and how they can compromise the CIA triad (confidentiality, integrity, and availability, the three core principles of security). The paper includes real-world examples of these attacks against major AI vendors like OpenAI, Google, Anthropic, and Microsoft, and aims to help traditional cybersecurity experts better understand and defend against these emerging AI-specific threats.
A security researcher analyzed xAI's Grok chatbot (an AI assistant available through X and an API) for vulnerabilities and found multiple security issues, including prompt injection (tricking the AI by hiding instructions in user posts, images, and PDFs), data exfiltration (stealing information from the system), phishing attacks through clickable links, and ASCII smuggling (hiding invisible text to manipulate the AI's behavior). The researcher responsibly disclosed these findings to xAI.