aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDatasetFor devs
Subscribe
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Digest Archive

Daily BriefingSunday, June 7, 2026

No new AI/LLM security issues were identified today.

Daily BriefingSaturday, June 6, 2026
>

OpenAI Launches Lockdown Mode to Counter Prompt Injection: OpenAI introduced Lockdown Mode, a security feature that disables web browsing, image retrieval, deep research, and agent capabilities to mitigate prompt injection attacks (when malicious instructions hidden in webpages or uploaded content manipulate an AI's responses). The feature reduces but does not eliminate exfiltration risks, as attacks may still occur through cached content or uploaded files.

>

AI Agent Discovers 21 Zero-Days in FFmpeg as Chrome Patches Record 429 Bugs: An AI security agent uncovered 21 previously unknown vulnerabilities (zero-days, or security flaws unknown to the public) in FFmpeg, a widely-used media library, while Google released Chrome 149 with 429 security patches in a single update. The development underscores how AI tools are accelerating vulnerability discovery, creating pressure on security teams to remediate flaws at unprecedented speed.

>

Meta's AI Account Support Tool Exploited for Account Takeover: Meta's AI-powered account support tool, launched in March to automate password resets and other functions, has been discovered and exploited by attackers to take over user accounts. The vulnerability represents a concerning case of AI automation introducing new attack vectors in critical authentication systems.

>

Persistent Backdoor Attack Targets Personalized Federated Learning: Researchers identified a stealthy backdoor attack (hidden malicious code inserted into AI systems) specifically designed to compromise personalized federated learning (a privacy-focused method where multiple computers train an AI model together without sharing raw data). The attack is engineered to evade detection and persist across training cycles.

Daily BriefingFriday, June 5, 2026
>

NSA Deploys Anthropic Engineers to Integrate Mythos for Cyber Operations: Anthropic has reportedly sent engineers to the NSA to help the intelligence agency deploy Mythos, a cybersecurity-focused AI model, marking a significant reversal after the Department of Defense previously banned NSA use of Anthropic's technology over supply-chain concerns related to mass surveillance and autonomous weapons restrictions.

>

Critical Prompt Injection Flaws in Amazon Q and Claude Code Expose Developer Environments: Multiple high-severity vulnerabilities affect AI coding assistants, including Amazon Q and Kiro IDE suffering from prompt injection (tricking AI by hiding malicious instructions in files) that could execute unauthorized commands, while Claude Code's GitHub Action leaked credentials by processing untrusted content without proper sandboxing, and separately stores OAuth tokens in plaintext vulnerable to silent exfiltration via malicious npm packages.

Daily BriefingThursday, June 4, 2026
>

Hugging Face Transformers RCE Bypasses Security Settings: A high-severity vulnerability in the widely used Hugging Face Transformers library allows attackers to execute malicious code even when developers enable trust_remote_code=false protections, which are meant to block remote code execution. The attack hides instructions in a fake configuration parameter that leaves no traces, affecting versions 4.56.0 through 5.2.x in a library downloaded millions of times weekly.

>

Gemini Voice Assistant Hijacked via Messaging Notifications: Researchers discovered a critical vulnerability where attackers could inject malicious commands into Google's Gemini voice assistant through WhatsApp, Slack, or SMS notifications using a technique called Fake Context Alignment, allowing control of smart home devices and calls without user knowledge. Google patched the prompt injection (tricking an AI by hiding instructions in its input) vulnerability in November 2025 with improved content classifiers.

Daily BriefingWednesday, June 3, 2026
>

Malicious AI Skills Flooding Public Marketplaces: Public marketplaces for AI skills (specialized add-ons that extend AI agent capabilities) are being flooded with malicious packages that steal passwords and data, and security scanners designed to detect these threats can be bypassed in under an hour through simple modifications because they rely on static detection methods.

>

Microsoft Ships Sandbox for Autonomous AI Agents: Microsoft released Microsoft Execution Container (MXC), a sandbox (an isolated environment that restricts what a program can do) designed to limit what autonomous AI agents can access as companies deploy agents that independently run code and access files in development workflows.

>
Daily BriefingTuesday, June 2, 2026
>

LibreChat Leaks Secrets via Environment Variable Injection: LibreChat, a ChatGPT-like application supporting multiple AI providers, has a critical vulnerability (CVE-2026-32625) in versions up to 0.8.3 where it unsafely replaces environment variable placeholders when validating user-provided server URLs. An authenticated attacker can create a malicious server configuration that tricks LibreChat into sending sensitive secrets like encryption keys and database credentials to an attacker-controlled server, compromising the entire installation without needing admin access.

>

Malicious npm Package Targets OpenAI Codex Users: Attackers published a malicious npm package called codexui-android that appeared to be a legitimate tool for OpenAI Codex users but secretly stole authentication tokens and sent them to an external server, reaching about 27,000 weekly downloads before detection. The attack exploited a supply chain gap where malicious code was hidden in the distributed package but not visible in the public source code repository, reflecting broader vulnerability in AI software security where developer tokens provide persistent access to accounts.

Daily BriefingMonday, June 1, 2026
>

OpenAI Codex Tokens Stolen in npm Supply Chain Attack: Attackers compromised the popular npm package codexui-android and embedded malicious code that steals authentication tokens from OpenAI Codex users, including refresh tokens that never expire and allow indefinite impersonation. This supply chain compromise gives attackers persistent access to victims' code-generation AI accounts and any systems those accounts can reach.

>

Anthropic Opens Mythos Access to EU Security Agency: Anthropic is granting the EU's ENISA access to its Mythos model, an advanced AI system that excels at finding security flaws in software, after months of requests driven by concerns that bad actors could use it to exploit thousands of previously unknown vulnerabilities. The move reflects growing tension between giving defenders powerful tools and preventing offensive misuse.

Daily BriefingSunday, May 31, 2026
>

Fashion Industry Adopts AI-Generated Models at Scale: Retailers are replacing human models and photographers with generative AI (machine learning systems that create new images) for product imagery, raising questions about disclosure standards and workforce displacement in commercial photography.

Daily BriefingSaturday, May 30, 2026
>

Anthropic Details Multi-Layer Agent Containment Architecture: Anthropic published technical documentation on their containment strategy for Claude, combining process sandboxes (isolated execution environments), virtual machines, filesystem boundaries, and egress controls (preventing unauthorized data transfer) to prevent credential theft and data exfiltration even when adversaries attempt to exploit the AI model or user inputs.

>

Model X-Ray Detects Malware Hidden in AI Weights: Researchers developed a technique using few-shot learning (training with minimal examples) to identify malicious code embedded in model weights (the numerical parameters of trained AI systems), addressing a novel attack vector where adversaries conceal malware inside AI models that evades conventional detection.

Daily BriefingFriday, May 29, 2026
>

Critical Sandbox Escapes in PraisonAI Allow Full System Compromise: Multiple critical vulnerabilities in PraisonAI (an AI agent framework) allow authenticated and unauthenticated attackers to execute arbitrary commands on host systems. The flaws include sandbox escape via Python builtins access (CVE-2026-47392), unauthenticated RCE (remote code execution, where attackers run commands on systems they don't own) through an unsafe eval() tool in agent-to-agent servers (CVE-2026-47391), and API deployments with authentication disabled by default (CVE-2026-47393).

>

RAGFlow Template Injection Enables Remote Command Execution: RAGFlow, an open-source RAG (retrieval-augmented generation, where AI pulls external documents to answer questions) engine, contains a Jinja2 template injection vulnerability in version 0.24.0 and earlier that lets any registered user execute arbitrary OS commands by creating a malicious Canvas workflow (CVE-2026-45312).

1 / 12Older
>

Cross-Workspace IDOR in PraisonAI Allows Unauthorized Agent Access: PraisonAI Platform contains an insecure direct object reference (IDOR, a flaw where users access resources by guessing IDs) vulnerability (CVE-2026-47419) that allows any authenticated user to read, modify, or delete AI agents across all workspaces because the system verifies workspace membership but never checks if the target agent belongs to that specific workspace.

>

MCP Server Kubernetes Vulnerability Enables Bearer Token Theft via Flag Injection: The `kubectl_generic` tool in `mcp-server-kubernetes` accepts unvalidated kubectl flags, allowing attackers to inject `--server` and `--insecure-skip-tls-verify` parameters that redirect Kubernetes bearer tokens (authentication credentials) to attacker-controlled servers when privileged operators follow AI agent instructions embedded in logs, enabling full cluster compromise (CVE-2026-47250).

>

Claude Code GitHub Action Enabled Repository Hijacking: A security flaw in Anthropic's Claude Code GitHub Action allowed attackers to compromise repositories by opening a single malicious GitHub issue that exploited broken permission checks and indirect prompt injection, potentially stealing credentials and poisoning the action for downstream projects.

>

Microsoft Updates AI Agent Failure Taxonomy After Year of Red Teaming: Microsoft's AI Red Team released an updated taxonomy of failure modes in agentic AI systems (AI that can autonomously perform tasks) based on 12 months of real-world testing, adding seven new categories including agentic supply chain compromise, goal hijacking, and inter-agent trust escalation driven by widespread vulnerabilities in open-source frameworks and tool ecosystems.

Google Gemini Notification Hijacking Patched: A vulnerability in Google Gemini's Android voice assistant allowed attackers to embed malicious commands in notifications from apps like WhatsApp or Slack through prompt injection (hiding harmful instructions in input data), enabling them to manipulate the assistant's responses and actions without installing malware on the device. Google has already patched this flaw with no evidence of real-world exploitation.

>

UK Regulators Force Google to Offer AI Opt-Out: The UK's Competition and Markets Authority ruled that Google must allow website publishers to opt out of AI Search features like AI Overviews (summaries generated by AI) and prevent their content from being used to train Google's models, giving publishers more control over how AI systems use their content.

>

OpenMed RCE Through Malicious Model Loading: OpenMed versions before 1.5.2 have a critical remote code execution vulnerability (CVE-2026-47117, RCE allows attackers to run commands on the affected system) in how it loads privacy-filter models. The vulnerability exists because the software uses overly broad pattern matching on user-supplied model names, allowing unauthenticated attackers to trick it into loading malicious code from external sources that gets executed with the same permissions as the OpenMed service.

>

Anthropic Expands Mythos Vulnerability Scanning to 150 Organizations: Anthropic is expanding Project Glasswing, which uses its Claude Mythos AI model to find software vulnerabilities, to 150 additional organizations across 15+ countries in critical sectors like power, water, and healthcare. Since the initial launch with 50 partners, Project Glasswing participants have discovered over 10,000 high or critical-level security flaws using the AI tool.

>

Flowise MCP Implementation Enables Remote Code Execution: Flowise, an open-source platform for building self-hosted AI assistants, contains a critical RCE (remote code execution, where an attacker can run commands on a system they don't own) vulnerability in its Model Context Protocol implementation that allows attackers to execute arbitrary commands by importing a malicious chatflow. Flowise's attempted input validation patches have proven ineffective at stopping the exploit.

>

CrowdStrike Launches AI Discovery for Shadow AI Control: CrowdStrike released AI Discovery and Governance for Falcon for IT to help organizations find and control unsanctioned AI tools running across their infrastructure, addressing the risk that shadow AI (locally deployed models and tools running without centralized oversight) inherits existing permissions and expands the attack surface without visibility.

>

Vatican Addresses AI Harms with Anthropic Participation: Pope Leo XIV issued major guidance on AI risks including job displacement and weaponization, with Anthropic co-founder Chris Olah participating in the Vatican ceremony amid criticism that such partnerships may produce superficial messaging rather than substantive examination of AI technology's impact.

>

Attackers Deploy LLM Agents for Post-Exploitation After Marimo Breach: An attacker exploited CVE-2026-39987, a critical vulnerability in Marimo notebook software allowing unauthenticated RCE, then deployed an LLM agent (an autonomous AI system that can plan and execute tasks) to steal cloud credentials and database information. This represents a notable shift from traditional scripted attacks, as the AI agent can adapt to unexpected obstacles in real-time.

>

ChatGPT Share Links Weaponized in Malware and Phishing Campaigns: Attackers are exploiting ChatGPT's share feature to host fake outage pages on legitimate OpenAI URLs, distributing malware through Google ads in the "LLMShare" campaign. Separately, the ChatGPhish vulnerability allows attackers to embed malicious instructions in web pages that ChatGPT processes and renders as trusted phishing links when users request summaries, potentially leaking IP addresses and bypassing desktop security filters.

>

Vibe Coding Explosion Creates Massive Shadow IT Security Gap: Over 2,000 exposed applications built by employees using AI-driven development platforms (vibe coding, where non-programmers build working apps by describing what they want) were discovered containing sensitive data across major companies, published on the public internet without access controls. Traditional security tools like EDR (endpoint detection and response) and DLP (data loss prevention) fail to detect these cloud-to-cloud connections because they were designed for different threat models.