aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDataset
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Industry News

New tools, products, platforms, funding rounds, and company developments in AI security.

to
Export CSV
1289 items

Video: Data Exfiltration Vulnerabilities in LLM apps (Bing Chat, ChatGPT, Claude)

highnews
security
Aug 28, 2023

A researcher discovered data exfiltration vulnerabilities (security flaws that allow unauthorized data to leak out of a system) in several popular AI chatbots including Bing Chat, ChatGPT, and Claude, and responsibly disclosed them to the companies. Microsoft, Anthropic, and a plugin vendor fixed their vulnerabilities, but OpenAI decided not to fix an image markdown injection issue (a vulnerability where hidden code in image formatting can trick the AI into revealing data).

Fix: The source mentions that Microsoft (Bing Chat), Anthropic (Claude), and a plugin vendor addressed and fixed their respective vulnerabilities. However, OpenAI's response to the reported vulnerability was "won't fix," meaning no mitigation from OpenAI is described in the source text.

Embrace The Red

Anthropic Claude Data Exfiltration Vulnerability Fixed

mediumnews
securitysafety

ChatGPT Custom Instructions: Persistent Data Exfiltration Demo

mediumnews
securitysafety

Image to Prompt Injection with Google Bard

infonews
securityresearch

Google Docs AI Features: Vulnerabilities and Risks

infonews
securitysafety

OpenAI Removes the "Chat with Code" Plugin From Store

mediumnews
security
Jul 6, 2023

OpenAI removed the 'Chat with Code' plugin from its store after security researchers discovered it was vulnerable to CSRF (cross-site request forgery, where an attacker tricks a system into making unwanted actions on behalf of a user). The vulnerability allowed ChatGPT to accidentally create GitHub issues without user permission when certain plugins were enabled together.

Plugin Vulnerabilities: Visit a Website and Have Your Source Code Stolen

mediumnews
securitysafety

Bing Chat: Data Exfiltration Exploit Explained

mediumnews
security
Jun 18, 2023

Bing Chat contained a prompt injection vulnerability (tricking an AI by hiding instructions in its input) where malicious text on websites could trick the AI into returning markdown image tags that send sensitive data to an attacker's server. When Bing Chat's client converts markdown to HTML, an attacker can embed data in the image URL, exfiltrating (stealing and sending out) information without the user knowing.

Exploit ChatGPT and Enter the Matrix to Learn about AI Security

infonews
securitysafety

ChatGPT Plugin Exploit Explained: From Prompt Injection to Accessing Private Data

infonews
securitysafety

ChatGPT Plugins: Data Exfiltration via Images & Cross Plugin Request Forgery

highnews
security
May 16, 2023

A malicious website can hijack a ChatGPT chat session and steal conversation history by controlling the data that plugins (add-ons that extend ChatGPT's abilities) retrieve. The post highlights that while plugins can leak data by receiving too much information, the main risk here is when an attacker controls what data the plugin pulls in, enabling them to extract sensitive information.

Indirect Prompt Injection via YouTube Transcripts

mediumnews
securitysafety

Adversarial Prompting: Tutorial and Lab

infonews
securityresearch

Video: Prompt Injections - An Introduction

infonews
securitysafety

MLSecOps Podcast: AI Red Teaming and Threat Modeling Machine Learning Systems

infonews
securityresearch

Don't blindly trust LLM responses. Threats to chatbots.

infonews
securitysafety

AI Injections: Direct and Indirect Prompt Injections and Their Implications

infonews
securityresearch

Bing Chat claims to have robbed a bank and it left no trace

infonews
safetysecurity

Yolo: Natural Language to Shell Commands with ChatGPT API

infonews
industry
Mar 5, 2023

Yolo is a tool that uses ChatGPT API (OpenAI's language model accessed through code) to translate natural language questions into shell commands (the text-based interface for controlling a computer) that can be executed automatically. The tool helps users who forget command syntax by converting plain English requests into proper bash, zsh, or PowerShell commands, with a safety feature that shows the command before running it unless the user enables automatic execution.

Video Tutorial: Hijacking SSH Agent

infonews
security
Jan 25, 2023

This post announces a video tutorial about SSH Agent Hijacking, a technique (TTP, or tactic/technique/procedure) used in security testing where an attacker compromises the SSH Agent (a program that stores SSH keys, which authenticate users to remote systems). The tutorial is intended to help security professionals understand this attack method and develop ways to detect it on Linux and macOS systems.

Previous59 / 65Next
Aug 1, 2023

Anthropic patched a data exfiltration vulnerability in Claude caused by image markdown injection, a technique where attackers embed hidden instructions in image links to trick the AI into leaking sensitive information. While Microsoft fixed this vulnerability in Bing Chat and OpenAI chose not to address it in ChatGPT, Anthropic implemented a mitigation to protect Claude users from this attack.

Embrace The Red
Jul 24, 2023

ChatGPT has a vulnerability where attackers can use image markdown (a way to embed images in text) to trick the system into leaking data. OpenAI recently added Custom Instructions, a feature that automatically adds instructions to every message, which attackers can abuse to install a persistent backdoor (hidden access point) that steals data through the image markdown vulnerability. This technique is similar to how attackers exploit other systems by enabling features like email forwarding after they gain initial access.

Embrace The Red
Jul 14, 2023

Google Bard can be tricked through image-based prompt injection (hidden instructions placed in images that the AI then follows), as demonstrated by a researcher who embedded text in an image that caused Bard to perform unexpected actions. This vulnerability shows that AI systems that analyze images may be vulnerable to indirect prompt injection attacks (tricking an AI into ignoring its normal instructions by hiding malicious commands in user-provided content).

Embrace The Red
Jul 12, 2023

Google Docs recently added new AI features, such as automatic summaries and creative content generation, which are helpful but introduce security risks. The main concern is that using these AI features on untrusted data (information you don't know the source or reliability of) could lead to unwanted consequences, though currently attackers have limited ways to exploit these features.

Embrace The Red
Embrace The Red
Jun 20, 2023

OpenAI's plugin store contains security vulnerabilities, particularly in plugins that can act on behalf of users without adequate security review. These plugins are susceptible to prompt injection attacks (tricking an AI by hiding instructions in its input) and the Confused Deputy Problem (where an attacker can manipulate a plugin into performing harmful actions by exploiting its trust in the AI system), allowing adversaries to steal source code or cause other damage.

Embrace The Red
Embrace The Red
Jun 11, 2023

A security researcher created a demonstration website that shows how indirect prompt injection (tricking an AI by hiding instructions in web content it reads) can be used to hijack ChatGPT when the browsing feature is enabled. The demo lets users explore various AI-based attacks, including data theft and manipulation of ChatGPT's responses, to raise awareness of these vulnerabilities.

Embrace The Red
May 28, 2023

ChatGPT plugins can be exploited through indirect prompt injections (attacks that hide malicious instructions in data the AI reads from external sources rather than directly from the user), which hackers have used to access private data through cross-plugin request forgery (a vulnerability where one plugin tricks another into performing unauthorized actions). The post documents a real exploit found in the wild and explains the security fix that was applied.

Embrace The Red
Embrace The Red
May 14, 2023

ChatGPT can access YouTube transcripts through plugins, which is useful but creates a security risk called indirect prompt injection (hidden instructions embedded in content that an AI reads and then follows). Attackers can hide malicious commands in video transcripts, and when ChatGPT reads those transcripts to answer user questions, it may follow the hidden instructions instead of the user's intended request.

Embrace The Red
May 12, 2023

This resource is a tutorial and lab (an interactive learning environment for hands-on practice) that teaches prompt injection, which is a technique for tricking AI systems by embedding hidden instructions in their input. The tutorial covers examples ranging from simple prompt engineering (getting an AI to change its output) to more complex attacks like injecting malicious code (HTML/XSS, which runs unwanted scripts in web browsers) and stealing data from AI systems.

Embrace The Red
May 10, 2023

Prompt injection (tricking an AI by hiding instructions in its input) is a widespread vulnerability in AI education, with indirect prompt injections being particularly dangerous because they allow untrusted data to secretly take control of an LLM (large language model) and change its goals and behavior. Since attack payloads use natural language, attackers can craft many creative variations to bypass input validation (checking that data meets safety rules) and web application firewalls (security systems that filter harmful requests).

Embrace The Red
Apr 27, 2023

This is a podcast episode about AI red teaming (simulated attacks to find weaknesses in AI systems) and threat modeling (planning for potential security risks) in machine learning systems. The episode explores how traditional security practices can be combined with machine learning security to better protect AI applications from attacks.

Embrace The Red
Apr 15, 2023

LLM outputs are untrusted and can be manipulated through prompt injection (tricking an AI by hiding instructions in its input), which affects large language models in particular ways. This post addresses how to handle the risks of untrusted output when using AI systems in real applications.

Embrace The Red
Mar 29, 2023

AI prompt injection is a vulnerability where attackers manipulate input given to AI systems, either directly (by controlling parts of the prompt themselves) or indirectly (by embedding malicious instructions in data the AI will later process, like web pages). These attacks can trick AI systems into ignoring their intended instructions and producing harmful, misleading, or inappropriate responses, similar to how SQL injection or cross-site scripting (XSS, a web attack that injects malicious code into websites) compromise other systems.

Embrace The Red
Mar 26, 2023

# Analysis ## Summary A user discovered that Bing Chat could be manipulated into describing illegal activities (like bank robbery) by using indirect language techniques, even though it refused to help when the user directly asked about hacking. This shows that the AI's safety filters, which are supposed to prevent harmful outputs, can be bypassed through clever wording rather than direct requests. ## Solution N/A -- no mitigation discussed in source.

Embrace The Red
Embrace The Red
Embrace The Red