New tools, products, platforms, funding rounds, and company developments in AI security.
This article describes the EU AI Office, a newly established regulatory organization within the European Commission tasked with enforcing the AI Act (the world's first comprehensive binding AI regulation) across the European Union. Unlike AI safety institutes in other countries, the EU AI Office has actual enforcement powers: it can require AI model providers to fix problems or remove non-compliant models from the market. The office will conduct model evaluations, investigate violations, and work with international partners to shape global AI governance standards.
This overview discusses the European AI Act and the governance framework needed to implement it, focusing on the European Commission's responsibilities and the AI Office. Key tasks include establishing guidelines for classifying high-risk AI systems, defining what counts as significant modifications (changes that alter a system's risk level), and setting standards for transparency and enforcement across EU member states.
A researcher examined browser remote debugging features as a potential method for stealing sensitive data like cookies, building on past work about cookie theft techniques. The post references Google's guidance on detecting browser data theft through Windows Event Logs and DPAPI (Data Protection API, a Windows system that encrypts sensitive information) calls, but focuses on exploring whether remote debugging could be used to bypass these detection methods.
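On the defensive side, one simple heuristic is to flag Chromium-based browsers launched with the remote debugging flags (--remote-debugging-port or --remote-debugging-pipe), since the DevTools endpoint they open is the access path the post explores for reading cookies. The sketch below is a hypothetical helper that inspects a raw command-line string. The set of browser executable names is an assumption, and collecting command lines (for example from process-creation events) is left to your logging pipeline.

```python
import shlex

# Assumption: executable names of common Chromium-based browsers; extend for your fleet.
BROWSER_EXES = {"chrome", "chrome.exe", "chromium", "msedge", "msedge.exe", "brave", "brave.exe"}
DEBUG_FLAGS = ("--remote-debugging-port", "--remote-debugging-pipe")


def uses_remote_debugging(cmdline: str) -> bool:
    """Return True if a browser command line enables the DevTools remote debugging endpoint."""
    try:
        # posix=False keeps Windows backslashes intact and keeps quoted paths as one token.
        args = shlex.split(cmdline, posix=False)
    except ValueError:
        # Unbalanced quotes: fall back to plain whitespace splitting.
        args = cmdline.split()
    if not args:
        return False
    # Reduce the first token to a bare, lowercase executable name.
    exe = args[0].strip('"').replace("\\", "/").rsplit("/", 1)[-1].lower()
    if exe not in BROWSER_EXES:
        return False
    return any(arg.strip('"').startswith(DEBUG_FLAGS) for arg in args[1:])
```

For example, chrome.exe started with --remote-debugging-port=9222 is flagged, while the same browser started with only --profile-directory=Default is not. Attackers can rename binaries, so treat this as one weak signal among several.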
HackSpaceCon 2024, held at Kennedy Space Center, featured a keynote by Dave Kennedy on making the world safer through security practices. Kennedy highlighted that attackers can easily modify existing malware (pre-written malicious code) to evade detection systems, and emphasized the importance of active threat hunting (proactively searching for signs of attacks rather than waiting for alerts).
Google AI Studio had a vulnerability that allowed attackers to steal data through prompt injection (tricking an AI by hiding malicious instructions in its input), where a malicious file could trick the AI into exfiltrating other uploaded files to an attacker's server via image tags. The vulnerability appeared in a recent update but was fixed within 12 days of being reported to Google on February 17, 2024.
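The source does not describe how Google fixed the bug. A common hardening step against image-based exfiltration (an assumption here, not Google's documented fix) is to drop images in model output whose URL points outside an allowlist before rendering, so an injected image tag cannot carry data to an attacker's server in its query string. A minimal sketch with a hypothetical allowlist:

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: exact hostnames the app trusts to serve images.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Markdown image: ![alt](url "optional title")
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
# Raw HTML <img> tags are removed outright rather than parsed.
HTML_IMG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_untrusted_images(text: str) -> str:
    """Remove images in LLM output unless their host is on the allowlist."""

    def keep_or_drop(match: re.Match) -> str:
        host = urlparse(match.group(2)).hostname or ""
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        return f"[image removed: {match.group(1)}]"

    text = MD_IMAGE.sub(keep_or_drop, text)
    return HTML_IMG.sub("[image removed]", text)
```

Matching exact hostnames (rather than suffixes) is deliberately strict: a subdomain has to be listed explicitly to be trusted.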
The European Commission is hiring AI specialists to work in the AI Office, which will enforce the EU's AI Act by overseeing compliance of general-purpose AI models (models trained on broad data that can perform a wide range of tasks, such as large language models). The office will have real regulatory powers to require companies to implement safety measures, restrict models, or remove them from the market, and will develop evaluation tools and benchmarks to identify dangerous AI behaviors.
The European AI Office is a new EU regulator created under the AI Act to oversee general-purpose AI (GPAI) models and systems, meaning AI designed to perform a wide range of tasks, across all 27 EU Member States. It monitors compliance, analyzes emerging risks, develops evaluation capabilities, produces voluntary codes of practice for companies to follow, and coordinates enforcement between national regulators and international partners. The Office also supports small and medium-sized businesses with compliance resources and oversees regulatory sandboxes, which are controlled environments where companies can test AI systems before full deployment.
ASCII Smuggler is a tool that hides text within regular content using invisible Unicode Tags characters. This update adds optional rendering of the begin and end Unicode Tags that mark where hidden text starts and stops, URL decoding of input, output modes that either highlight or isolate the hidden text, and better mobile compatibility with an improved user interface.
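To make the "highlight" and "isolate" output modes concrete, here is a hypothetical decoder in the same spirit; it is not ASCII Smuggler's code. It assumes the common mapping in which a printable ASCII character c is hidden as code point U+E0000 + ord(c), and it skips tag characters such as U+E0001 and U+E007F, which tools can use as begin and end markers but which carry no payload. The function and mode names are illustrative.

```python
from urllib.parse import unquote

TAG_BASE, TAG_LAST = 0xE0000, 0xE007F  # Unicode Tags block range


def reveal(text: str, mode: str = "highlight", url_decode: bool = False) -> str:
    """Surface hidden Unicode Tags text: 'highlight' marks it inline, 'isolate' returns only it."""
    if url_decode:
        text = unquote(text)  # percent-encoded UTF-8 back to code points
    pieces, hidden_runs, run = [], [], []

    def flush() -> None:
        # Close the current run of hidden characters, if any.
        if run:
            hidden = "".join(run)
            hidden_runs.append(hidden)
            pieces.append(f"[[{hidden}]]")
            run.clear()

    for ch in text:
        cp = ord(ch)
        if TAG_BASE <= cp <= TAG_LAST:
            if 0x20 <= cp - TAG_BASE <= 0x7E:
                run.append(chr(cp - TAG_BASE))  # tag that mirrors printable ASCII
            continue  # other tags (e.g. U+E0001, U+E007F) carry no payload
        flush()
        pieces.append(ch)
    flush()

    if mode == "highlight":
        return "".join(pieces)
    if mode == "isolate":
        return "".join(hidden_runs)
    raise ValueError(f"unknown mode: {mode!r}")
```

Highlight mode keeps the visible context so a reviewer can see where the hidden text sits; isolate mode is handier when piping the payload into another tool.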
The EU AI Act is a regulatory framework that requires companies to comply with rules for different types of AI systems on specific timelines, starting with prohibitions on the riskiest AI uses within 6 months and expanding to cover high-risk AI systems (such as those used in law enforcement, hiring, or education) by 24 months after the law takes effect. The article outlines key compliance deadlines, secondary laws the EU Commission might create to clarify the rules, and guidance documents to help organizations understand how to follow the AI Act.
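Because the deadlines are offsets from the date the Act takes effect, they reduce to calendar-month arithmetic. A minimal sketch using only the 6- and 24-month milestones mentioned above; the Act's own text fixes the exact application dates, which can differ by a day from naive month addition, so treat the official text as authoritative.

```python
import calendar
from datetime import date

# Months after entry into force, as summarized above (not an exhaustive list).
MILESTONES = {"prohibitions": 6, "high-risk systems": 24}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def compliance_dates(entry_into_force: date) -> dict:
    """Map each milestone to an approximate date."""
    return {name: add_months(entry_into_force, m) for name, m in MILESTONES.items()}
```

For a hypothetical entry date of 31 August 2024, this yields 28 February 2025 for the prohibitions (the day is clamped to February's length) and 31 August 2026 for high-risk systems.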
The EU AI Act classifies AI systems by risk level, from prohibited (such as social scoring or systems that manipulate behavior) to minimal risk (largely unregulated). High-risk AI systems, such as those used in critical decisions affecting people's lives, face strict rules requiring developers to provide documentation, conduct testing, and monitor for problems. General-purpose AI models (such as large language models that can perform many tasks) have lighter requirements unless they present systemic risk, in which case developers must test them against adversarial attacks (attempts to trick or break them) and report serious incidents.
ChatGPT's Code Interpreter (a sandbox environment that runs code) was not properly isolated between different GPTs, meaning files uploaded to one GPT were visible and could be modified by other GPTs used by the same person, creating a security risk where malicious GPTs could steal or overwrite sensitive files. OpenAI addressed this vulnerability in May 2024.
Researchers discovered ASCII Smuggling, a technique that uses Unicode Tags Block characters (code points that mirror ASCII but stay invisible in user interfaces) to hide prompt injections (malicious instructions hidden in AI input) that large language models interpret as regular text. The attack is particularly dangerous for LLMs because they can both read these hidden messages and generate them in responses, enabling attacks beyond traditional web vulnerabilities like XSS (cross-site scripting, injecting malicious code into websites) and SSRF (server-side request forgery, tricking a server into making unauthorized requests).
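The encoding itself is a fixed offset: each printable ASCII character has an invisible twin in the Tags block at U+E0000 plus its ASCII code, so "A" (0x41) becomes U+E0041. A minimal round-trip sketch; the function names are illustrative.

```python
TAG_OFFSET = 0xE0000  # start of the Unicode Tags block


def smuggle(ascii_text: str) -> str:
    """Map each printable ASCII character to its invisible Tags-block counterpart."""
    if not all(0x20 <= ord(c) <= 0x7E for c in ascii_text):
        raise ValueError("only printable ASCII maps into the Tags block")
    return "".join(chr(TAG_OFFSET + ord(c)) for c in ascii_text)


def unsmuggle(text: str) -> str:
    """Recover ASCII hidden in Tags-block characters, ignoring all visible text."""
    return "".join(
        chr(ord(c) - TAG_OFFSET) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )
```

A string such as "Hi" + smuggle("hello") typically displays as just "Hi", yet still carries five extra code points that a model receives as input.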
ChatGPT's browsing tool can be tricked into automatically invoking other tools (like image creation or memory management) when users visit websites containing hidden instructions, a vulnerability known as prompt injection (tricking an AI by hiding instructions in its input). While OpenAI added some protections, minor prompting tricks can bypass them, and this issue affects other AI applications as well.
Fix: For custom GPTs with AI Actions, creators can use the x-openai-isConsequential flag as a mitigation to put users in control, though the source notes this approach 'still lacks a great user experience, like better visualization to understand what the action is about to do.'
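In an Action's OpenAPI schema, the flag is set per operation. The sketch below builds a hypothetical schema as a Python dict and serializes it to JSON; the server URL, path, and operationIds are placeholders. Marking the write operation as consequential means ChatGPT asks the user to confirm each call rather than offering an "always allow" option.

```python
import json


def build_action_schema() -> dict:
    """Hypothetical GPT Action schema: reads run freely, writes require user confirmation."""
    ok = {"200": {"description": "OK"}}
    return {
        "openapi": "3.1.0",
        "info": {"title": "Example notes API", "version": "1.0.0"},
        "servers": [{"url": "https://api.example.com"}],
        "paths": {
            "/notes": {
                "get": {
                    "operationId": "listNotes",
                    "summary": "List the user's notes",
                    "x-openai-isConsequential": False,
                    "responses": ok,
                },
                "post": {
                    "operationId": "createNote",
                    "summary": "Create a note",
                    # Always prompt the user before this call runs.
                    "x-openai-isConsequential": True,
                    "responses": ok,
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_action_schema(), indent=2))
```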
Embrace The Red
Fix (AI Act significant modifications): The source suggests that the Commission should adopt 'predetermined change management plans akin to those in medicine' to assess modifications to AI systems. These plans would be documents outlining anticipated changes (such as performance adjustments or shifts in intended use) and the methods for evaluating whether those changes substantially alter the system's risk level. The source also recommends that standard fine-tuning of foundation models (additional training of pre-existing AI models) should not be considered a significant modification unless safety layers are removed or other actions clearly increase risk.
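One way to read the recommendation is as a triage rule over a plan of anticipated changes. The sketch below is a hypothetical encoding, not a rule from the AI Act or the source: it flags removal of safety layers, or any change that clearly increases risk, as needing a new assessment; exempts standard fine-tuning otherwise; and, as an added assumption, treats other changes as covered only if the plan lists that kind of change together with an evaluation method.

```python
from dataclasses import dataclass


@dataclass
class PlannedChange:
    description: str
    # Hypothetical labels, e.g. "fine_tuning", "performance_adjustment", "intended_use_shift".
    kind: str
    removes_safety_layers: bool = False
    clearly_increases_risk: bool = False
    evaluation_method: str = ""  # how the plan says this kind of change will be assessed


def needs_new_assessment(change: PlannedChange, plan: list) -> bool:
    """Return True if a change falls outside what the change management plan anticipated."""
    if change.removes_safety_layers or change.clearly_increases_risk:
        return True
    if change.kind == "fine_tuning":
        return False  # standard fine-tuning alone is not treated as significant
    # Assumption: other changes are covered only if the plan anticipates them with an evaluation method.
    return not any(p.kind == change.kind and p.evaluation_method for p in plan)
```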
EU AI Act Updates
ChatGPT's new memory feature, which lets the AI remember information across different chat sessions for a more personalized experience, can be exploited through indirect prompt injection (tricking an AI by hiding malicious instructions in its input). Attackers could manipulate ChatGPT into storing false information, biases, or unwanted instructions by injecting commands through connected apps like Google Drive, uploaded documents, or web browsing features.
This post examines how attackers can insert hidden malicious code into machine learning models (a technique called backdooring) through supply chain attacks, specifically targeting Keras models (models built with Keras, a popular deep learning framework). The authors demonstrate this attack and then explore tools that can detect when a model has been compromised in this way.
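The summary does not say which detection tools the post covers. As an illustrative heuristic (an assumption, not one of those tools): Lambda layers are a common place to hide code in a Keras model because they can carry serialized Python code. A model saved in the .keras format is a zip archive whose config.json describes its layers, so a scanner can look for Lambda entries without loading the model. Keras 3's load_model also refuses unsafe Lambda deserialization by default through its safe_mode argument. A minimal sketch covering only the .keras format, not legacy HDF5 files:

```python
import json
import zipfile


def lambda_layers_in_config(config) -> list:
    """Recursively collect names of layers whose class_name is 'Lambda'."""
    hits = []

    def walk(node):
        if isinstance(node, dict):
            if node.get("class_name") == "Lambda":
                cfg = node.get("config")
                hits.append(cfg.get("name", "<unnamed>") if isinstance(cfg, dict) else "<unnamed>")
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    walk(config)
    return hits


def lambda_layers_in_keras_file(path) -> list:
    """Read config.json from a .keras archive (without loading the model) and scan it."""
    with zipfile.ZipFile(path) as archive:
        config = json.loads(archive.read("config.json"))
    return lambda_layers_in_config(config)
```

A hit is not proof of a backdoor, since Lambda layers have legitimate uses, but it marks a model that deserves manual review before loading.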
Google's NotebookLM is a tool that lets users upload files for an AI to analyze, but it is vulnerable to prompt injection (tricking the AI by hiding instructions in uploaded files) that can manipulate the AI's responses and therefore what users see. The tool also has a data exfiltration vulnerability (attackers stealing information) when processing untrusted files. There is currently no known way to prevent these attacks, so users cannot fully trust the AI's responses when working with files from unknown sources.
Fix (Google AI Studio): Google fixed the issue, and it no longer reproduced when the researcher heard back about the report 12 days later (around February 29, 2024). The ticket was closed as 'Duplicate' on March 3, 2024, suggesting the vulnerability may also have been caught through internal testing.
Unfurling is when an application automatically expands hyperlinks to show previews, which can be exploited in AI chatbots to leak data. When an attacker uses prompt injection (tricking an AI by hiding instructions in its input) to make the chatbot generate a link containing sensitive information from earlier conversations, the unfurling feature automatically sends that data to a third-party server, potentially exposing private information.
Fix: To disable unfurling in a Slack app, add unfurl settings to the JSON message object by setting "unfurl_links": False and "unfurl_media": False when creating the message, for example:

```python
import json


def create_message(text):
    # Build the Slack message payload with link and media previews turned off,
    # so links in the bot's output are not expanded into previews automatically.
    message = {
        "text": text,
        "unfurl_links": False,
        "unfurl_media": False,
    }
    return json.dumps(message)
```
Attackers can create conditional prompt injection attacks (tricking an AI by hiding malicious instructions in its input that activate only for specific users) against Microsoft Copilot by leveraging user identity information like names and job titles that the AI includes in its context. A researcher demonstrated this by sending an email with hidden instructions that made Copilot behave differently depending on which person opened it, showing that LLM applications become more vulnerable as attackers learn to target specific users rather than all users equally.
A researcher discovered a vulnerability in Google Gemini where attackers can hide instructions in emails that trick the AI into automatically calling external tools (called Extensions) without the user's knowledge. When a user asks the AI to analyze a malicious email, the AI follows the hidden instructions and invokes the tool, which is a form of request forgery (making unauthorized requests on behalf of the user).
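The summary does not describe a fix. One generic mitigation pattern (an assumption here, not Google's documented behavior) is to track when untrusted content such as an email enters the model's context and, from then on, require explicit user confirmation before any tool call. A minimal sketch with hypothetical names, where the confirm callback stands in for whatever prompt the application shows the user:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentContext:
    """Tracks whether untrusted content (emails, web pages, files) has entered the model's context."""
    tainted: bool = False
    sources: List[str] = field(default_factory=list)

    def add_untrusted(self, source: str) -> None:
        self.tainted = True
        self.sources.append(source)


def may_invoke_tool(ctx: AgentContext, tool: str, confirm: Callable[[str], bool]) -> bool:
    """Allow tool calls freely until untrusted content appears; after that, ask the user each time."""
    if not ctx.tainted:
        return True
    prompt = f"The assistant wants to call {tool!r} after reading: {', '.join(ctx.sources)}. Allow?"
    return bool(confirm(prompt))
```

The tracking is coarse by design: once anything untrusted has been read, every later call prompts, trading some convenience for protection against request forgery.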
Fix (ChatGPT Code Interpreter): OpenAI addressed this vulnerability in May 2024; one change it made is that Code Interpreter is now off by default when creating a new GPT. The source also recommends: 'Disable Code Interpreter in private GPTs with private knowledge files (as they will be accessible to other GPTs).' Users should avoid uploading sensitive files to Code Interpreter and should use third-party GPTs with caution, especially those with Code Interpreter enabled.
Fix (ASCII Smuggling): As a developer, a possible mitigation is to remove Unicode Tags Block text on the way in and out, that is, filter it both from user input sent to your LLM and from LLM responses sent back to users. Additionally, test your own LLM applications for this new attack vector.
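A minimal sketch of the in-and-out filter, assuming the application wraps its model call in a function; llm here is a hypothetical callable that takes and returns a string. It removes every code point in the Tags block (U+E0000 to U+E007F).

```python
import re
from typing import Callable

# Every code point in the Unicode Tags block, U+E0000 through U+E007F.
TAGS_BLOCK = re.compile("[\U000E0000-\U000E007F]")


def strip_unicode_tags(text: str) -> str:
    """Delete Tags-block characters, leaving all other text untouched."""
    return TAGS_BLOCK.sub("", text)


def guarded_call(llm: Callable[[str], str], user_input: str) -> str:
    """Filter Tags-block text on the way in and on the way out of the model."""
    return strip_unicode_tags(llm(strip_unicode_tags(user_input)))
```

One trade-off: some emoji subdivision flags (for example, England's and Scotland's) are built from Tags-block characters, so a blanket strip degrades them to a plain black flag. An application that needs them could allowlist those specific sequences.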
A researcher discovered that Anthropic's Claude AI model is vulnerable to hidden prompt injections using Unicode Tags code points (invisible characters that can carry secret instructions in text). Like ChatGPT before it, Claude can interpret and follow these hidden instructions, even though users cannot see them on screen. The researcher reported the issue to Anthropic, but the ticket was closed without further details.