New tools, products, platforms, funding rounds, and company developments in AI security.
A researcher discovered data exfiltration vulnerabilities (security flaws that allow unauthorized data to leak out of a system) in several popular AI chatbots including Bing Chat, ChatGPT, and Claude, and responsibly disclosed them to the companies. Microsoft, Anthropic, and a plugin vendor fixed their vulnerabilities, but OpenAI decided not to fix an image markdown injection issue (a vulnerability where hidden code in image formatting can trick the AI into revealing data).
Fix: The source mentions that Microsoft (Bing Chat), Anthropic (Claude), and a plugin vendor addressed and fixed their respective vulnerabilities. However, OpenAI's response to the reported vulnerability was "won't fix," meaning no mitigation from OpenAI is described in the source text.
Embrace The RedOpenAI removed the 'Chat with Code' plugin from its store after security researchers discovered it was vulnerable to CSRF (cross-site request forgery, where an attacker tricks a system into making unwanted actions on behalf of a user). The vulnerability allowed ChatGPT to accidentally create GitHub issues without user permission when certain plugins were enabled together.
Bing Chat contained a prompt injection vulnerability (tricking an AI by hiding instructions in its input) where malicious text on websites could trick the AI into returning markdown image tags that send sensitive data to an attacker's server. When Bing Chat's client converts markdown to HTML, an attacker can embed data in the image URL, exfiltrating (stealing and sending out) information without the user knowing.
A malicious website can hijack a ChatGPT chat session and steal conversation history by controlling the data that plugins (add-ons that extend ChatGPT's abilities) retrieve. The post highlights that while plugins can leak data by receiving too much information, the main risk here is when an attacker controls what data the plugin pulls in, enabling them to extract sensitive information.
Yolo is a tool that uses ChatGPT API (OpenAI's language model accessed through code) to translate natural language questions into shell commands (the text-based interface for controlling a computer) that can be executed automatically. The tool helps users who forget command syntax by converting plain English requests into proper bash, zsh, or PowerShell commands, with a safety feature that shows the command before running it unless the user enables automatic execution.
This post announces a video tutorial about SSH Agent Hijacking, a technique (TTP, or tactic/technique/procedure) used in security testing where an attacker compromises the SSH Agent (a program that stores SSH keys, which authenticate users to remote systems). The tutorial is intended to help security professionals understand this attack method and develop ways to detect it on Linux and macOS systems.
Anthropic patched a data exfiltration vulnerability in Claude caused by image markdown injection, a technique where attackers embed hidden instructions in image links to trick the AI into leaking sensitive information. While Microsoft fixed this vulnerability in Bing Chat and OpenAI chose not to address it in ChatGPT, Anthropic implemented a mitigation to protect Claude users from this attack.
ChatGPT has a vulnerability where attackers can use image markdown (a way to embed images in text) to trick the system into leaking data. OpenAI recently added Custom Instructions, a feature that automatically adds instructions to every message, which attackers can abuse to install a persistent backdoor (hidden access point) that steals data through the image markdown vulnerability. This technique is similar to how attackers exploit other systems by enabling features like email forwarding after they gain initial access.
Google Bard can be tricked through image-based prompt injection (hidden instructions placed in images that the AI then follows), as demonstrated by a researcher who embedded text in an image that caused Bard to perform unexpected actions. This vulnerability shows that AI systems that analyze images may be vulnerable to indirect prompt injection attacks (tricking an AI into ignoring its normal instructions by hiding malicious commands in user-provided content).
Google Docs recently added new AI features, such as automatic summaries and creative content generation, which are helpful but introduce security risks. The main concern is that using these AI features on untrusted data (information you don't know the source or reliability of) could lead to unwanted consequences, though currently attackers have limited ways to exploit these features.
OpenAI's plugin store contains security vulnerabilities, particularly in plugins that can act on behalf of users without adequate security review. These plugins are susceptible to prompt injection attacks (tricking an AI by hiding instructions in its input) and the Confused Deputy Problem (where an attacker can manipulate a plugin into performing harmful actions by exploiting its trust in the AI system), allowing adversaries to steal source code or cause other damage.
A security researcher created a demonstration website that shows how indirect prompt injection (tricking an AI by hiding instructions in web content it reads) can be used to hijack ChatGPT when the browsing feature is enabled. The demo lets users explore various AI-based attacks, including data theft and manipulation of ChatGPT's responses, to raise awareness of these vulnerabilities.
ChatGPT plugins can be exploited through indirect prompt injections (attacks that hide malicious instructions in data the AI reads from external sources rather than directly from the user), which hackers have used to access private data through cross-plugin request forgery (a vulnerability where one plugin tricks another into performing unauthorized actions). The post documents a real exploit found in the wild and explains the security fix that was applied.
ChatGPT can access YouTube transcripts through plugins, which is useful but creates a security risk called indirect prompt injection (hidden instructions embedded in content that an AI reads and then follows). Attackers can hide malicious commands in video transcripts, and when ChatGPT reads those transcripts to answer user questions, it may follow the hidden instructions instead of the user's intended request.
This resource is a tutorial and lab (an interactive learning environment for hands-on practice) that teaches prompt injection, which is a technique for tricking AI systems by embedding hidden instructions in their input. The tutorial covers examples ranging from simple prompt engineering (getting an AI to change its output) to more complex attacks like injecting malicious code (HTML/XSS, which runs unwanted scripts in web browsers) and stealing data from AI systems.
Prompt injection (tricking an AI by hiding instructions in its input) is a widespread vulnerability in AI education, with indirect prompt injections being particularly dangerous because they allow untrusted data to secretly take control of an LLM (large language model) and change its goals and behavior. Since attack payloads use natural language, attackers can craft many creative variations to bypass input validation (checking that data meets safety rules) and web application firewalls (security systems that filter harmful requests).
This is a podcast episode about AI red teaming (simulated attacks to find weaknesses in AI systems) and threat modeling (planning for potential security risks) in machine learning systems. The episode explores how traditional security practices can be combined with machine learning security to better protect AI applications from attacks.
LLM outputs are untrusted and can be manipulated through prompt injection (tricking an AI by hiding instructions in its input), which affects large language models in particular ways. This post addresses how to handle the risks of untrusted output when using AI systems in real applications.
AI prompt injection is a vulnerability where attackers manipulate input given to AI systems, either directly (by controlling parts of the prompt themselves) or indirectly (by embedding malicious instructions in data the AI will later process, like web pages). These attacks can trick AI systems into ignoring their intended instructions and producing harmful, misleading, or inappropriate responses, similar to how SQL injection or cross-site scripting (XSS, a web attack that injects malicious code into websites) compromise other systems.
# Analysis ## Summary A user discovered that Bing Chat could be manipulated into describing illegal activities (like bank robbery) by using indirect language techniques, even though it refused to help when the user directly asked about hacking. This shows that the AI's safety filters, which are supposed to prevent harmful outputs, can be bypassed through clever wording rather than direct requests. ## Solution N/A -- no mitigation discussed in source.