AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

ChatGPT Operator: Prompt Injection Exploits & Defenses

info, news, LLM-Specific

security, research

Source: Embrace The Red, February 17, 2025

Summary

ChatGPT Operator is an AI agent that controls a web browser to complete tasks. It is vulnerable to prompt injection (hiding malicious instructions in the agent's input to trick it), which could let attackers steal data or perform unauthorized actions. OpenAI has implemented three defensive layers: user monitoring, which prompts the user to watch what the agent does; inline confirmation requests, which ask the user within the chat to approve actions; and out-of-band confirmation requests, which appear when the agent crosses website boundaries. These mitigations are not foolproof.

Solution / Mitigation

OpenAI has implemented three primary mitigation techniques:

(1) User Monitoring: users are prompted to observe what Operator is doing, what text it types, and which buttons it clicks. This prompt is likely triggered by a data classification model that detects sensitive information on screen.

(2) Inline Confirmation Requests: Operator asks the user within the chat conversation to approve certain actions or clarify requests before proceeding.

(3) Out-of-Band Confirmation Requests: these appear when Operator navigates across websites or performs complex actions. They inform the user what is about to happen and give them the option to pause or resume the operation.
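
The three layers described above can be pictured as a gate that an agent consults before each browser action. The following is a minimal sketch under stated assumptions, not OpenAI's implementation: the names `Gate`, `Action`, `required_gate`, and `run_action` are hypothetical, the `sensitive_on_screen` flag stands in for whatever classifier Operator may use, and the priority order between the layers is a guess made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Gate(Enum):
    """Hypothetical labels for the three mitigation layers, plus 'no gate'."""
    NONE = "none"
    USER_MONITORING = "user_monitoring"
    INLINE_CONFIRMATION = "inline_confirmation"
    OUT_OF_BAND_CONFIRMATION = "out_of_band_confirmation"


@dataclass
class Action:
    """One step the agent wants to take in the browser (illustrative only)."""
    kind: str                          # e.g. "navigate", "type", "click"
    current_url: str
    target_url: str = ""               # only meaningful for "navigate"
    sensitive_on_screen: bool = False  # assumed output of a data classifier
    consequential: bool = False        # e.g. submitting a form, sending a message


def _host(url: str) -> str:
    """Return the hostname of a URL, or an empty string if there is none."""
    return urlparse(url).hostname or ""


def required_gate(action: Action) -> Gate:
    """Pick the mitigation layer for an action (assumed priority order)."""
    # Crossing a website boundary triggers an out-of-band confirmation.
    if action.kind == "navigate" and _host(action.target_url) != _host(action.current_url):
        return Gate.OUT_OF_BAND_CONFIRMATION
    # Consequential actions are confirmed inline, within the chat.
    if action.consequential:
        return Gate.INLINE_CONFIRMATION
    # Sensitive data on screen asks the user to watch what the agent does.
    if action.sensitive_on_screen:
        return Gate.USER_MONITORING
    return Gate.NONE


def run_action(action: Action, approve) -> str:
    """Return "executed" or "blocked".

    `approve` is a hypothetical callback (gate, action) -> bool that
    represents asking the user for confirmation.
    """
    gate = required_gate(action)
    needs_approval = gate in (Gate.INLINE_CONFIRMATION, Gate.OUT_OF_BAND_CONFIRMATION)
    if needs_approval and not approve(gate, action):
        return "blocked"
    return "executed"


if __name__ == "__main__":
    # A page on mail.example tries to send the agent to another site.
    step = Action(
        kind="navigate",
        current_url="https://mail.example/inbox",
        target_url="https://attacker.example/collect",
    )
    print(required_gate(step).value)
    print(run_action(step, approve=lambda gate, action: False))
```

In this sketch the user's refusal is the only thing that stops a cross-site navigation, which illustrates why such gates are not foolproof: they depend on the gate firing for the right actions and on the user noticing that something is wrong.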