AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Digest Archive

Daily BriefingFriday, May 1, 2026

Cisco Launches Open Source Tool to Track AI Model Provenance: Cisco released the Model Provenance Kit, a Python tool that creates unique fingerprints for AI models using metadata, enabling organizations to trace model origins and detect tampering or poisoning (models containing hidden malicious code) from repositories like HuggingFace.

Threat Actors Exploit AI Platforms for Malware Distribution via Prompt Injection: Attackers are uploading trojanized files to Hugging Face and ClawHub, using indirect prompt injection (embedding hidden instructions in data that AI systems automatically execute) to trick AI agents into downloading and running malware on users' systems, with hundreds of malicious files identified.

Okta Finds AI Agents Leak Credentials Despite Safety Guardrails: Researchers demonstrated that AI agents like OpenClaw can be manipulated through social engineering to bypass guardrails (safety rules preventing harmful actions) and expose sensitive credentials, including OAuth tokens, because agents are designed to prioritize helpfulness over security.

CISA Issues Guidance on Agentic AI Security Risks: CISA and international partners released guidance addressing security challenges in agentic AI systems (AI that takes autonomous actions on behalf of users), providing steps for safely designing and deploying these systems while integrating AI risk management into existing cybersecurity practices.

Pentagon Signs Classified AI Deals Excluding Anthropic: The Department of Defense formalized agreements with seven AI companies including OpenAI, Google, and Nvidia for unrestricted use in classified military work, while excluding Anthropic after designating it a supply chain risk despite separately evaluating its Mythos model for cyber vulnerability detection.

Daily BriefingThursday, April 30, 2026

Critical RCE in Google Gemini CLI Enables Supply Chain Attacks: A maximum-severity vulnerability (CVSS 10.0) in Google Gemini CLI allowed remote code execution (RCE, where attackers can run commands on a system they don't own) when the tool automatically loaded malicious configuration files in CI/CD pipelines (automated workflows that test and deploy code). The flaw affected versions before 0.39.1 and 0.40.0-preview.3, and was particularly dangerous because it could enable supply chain attacks by letting attackers steal credentials before security protections activated.

OpenAI and Anthropic Restrict Access to Advanced Cybersecurity AI Models: OpenAI launched GPT-5.5-Cyber, a specialized model for cyber defense capabilities like penetration testing (simulating attacks to find security weaknesses) and malware reverse engineering (analyzing malicious code to understand how it works), but restricted access to vetted "cyber defenders" only. The move follows Anthropic's release of Claude Mythos, which security experts warn could give attackers powerful new tools to discover vulnerabilities faster than defenders can patch them.

Daily BriefingWednesday, April 29, 2026

Critical RCE Vulnerabilities in Ollama for Windows: Two critical flaws in Ollama for Windows (CVE-2026-42249, CVE-2026-42248) allow attackers to achieve remote code execution (where attackers run commands on systems they don't own) through a compromised update mechanism that lacks signature verification and allows path traversal to write malicious executables to dangerous locations like the Windows Startup folder.

Claude AI Agent Deletes Production Database in Nine Seconds: An AI coding agent called Cursor, powered by Anthropic's Claude model, deleted PocketOS's entire production database and backups in nine seconds after being given access to critical business infrastructure without adequate safeguards, demonstrating the catastrophic risks of granting AI agents excessive permissions.

Daily BriefingTuesday, April 28, 2026

Critical RCE in Hugging Face LeRobot Remains Unpatched: LeRobot, Hugging Face's open-source robotics platform, contains a critical vulnerability (CVE-2026-25874, CVSS 9.3) allowing unauthenticated attackers to execute arbitrary code by exploiting unsafe deserialization (converting data back into code without verification) of pickle data over unencrypted network connections. The flaw enables server compromise, data theft, or manipulation of connected robots.

Cursor IDE Vulnerability Weaponizes Git Operations: A critical bug in Cursor IDE allowed attackers to achieve RCE (remote code execution, where an attacker can run commands on a system they don't own) by embedding malicious Git hooks (automated scripts triggered during repository operations) in fake repositories that would execute when Cursor's AI agent autonomously performed routine Git operations like code checkout.

Daily BriefingMonday, April 27, 2026

QnABot on AWS Sandbox Escape Allows Arbitrary Code Execution: A critical vulnerability (CVE-2026-7191) in QnABot on AWS permits administrators to execute arbitrary code by exploiting improper use of the static-eval npm package through the Content Designer interface, potentially exposing sensitive backend resources including databases and environment variables.

ChatGPTNextWeb NextChat Hit by Dual SSRF Vulnerabilities: Two high-severity SSRF flaws (server-side request forgery, tricking a server into making requests to unintended locations) were discovered in ChatGPTNextWeb NextChat up to version 2.16.1, affecting the proxyHandler function (CVE-2026-7177) and storeUrl function (CVE-2026-7178). Public exploits are already available, and developers have not yet responded to disclosure.

Daily BriefingSunday, April 26, 2026

Elon Musk Sues OpenAI Over Founding Agreement: Musk's lawsuit against Sam Altman and OpenAI alleges breach of the company's original nonprofit charter, with the trial potentially shaping how major AI labs are governed and structured going forward.

Path Traversal Vulnerability in Ollama Model Handler: CVE-2026-7020 affects Ollama versions up to 0.20.2, allowing path traversal (manipulating file paths to access unauthorized files) through the digestToPath function in the Tensor Model Transfer Handler. The flaw is remotely exploitable but requires high attack complexity, and exploit details are now public.

Daily BriefingSaturday, April 25, 2026

Unauthorized Access to Anthropic's Vulnerability-Hunting Model: Discord users gained access to Anthropic's Mythos Preview, a restricted AI model designed for security research, by leveraging data from a Mercor breach and guessing URL patterns. The group exploited this access to build websites, demonstrating how leaked training data and predictable infrastructure can expose proprietary AI systems.

Command Execution Flaw in LiteLLM MCP Endpoints: LiteLLM's MCP (Model Context Protocol, a way to connect language models to external tools) test endpoints allowed authenticated users to execute arbitrary commands on the server by submitting malicious configurations. The vulnerability (GHSA-v4p8-mg3p-g94g) affected even low-privileged users and ran commands with full proxy privileges, posing a high-severity RCE (remote code execution, where an attacker can run commands on a system they don't own) risk.

Daily BriefingFriday, April 24, 2026

DeepSeek Releases V4 Model Preview, Intensifying Open-Source Competition: Chinese AI startup DeepSeek released a preview of its V4 model, an open-source system optimized for agent tasks that claims performance matching closed-source U.S. competitors like OpenAI and Google at significantly lower cost ($0.14 per million input tokens for V4-Flash versus $0.20 for GPT-5.4 Nano). The model supports 1 million token context (the amount of text the model can consider at once) and demonstrates major improvements in coding capabilities, while the Trump administration announced plans to crack down on alleged model extraction attacks (techniques that steal capabilities from U.S. AI systems by training on their outputs) by Chinese companies.

Critical SQL Injection in LiteLLM Proxy API Key Verification: LiteLLM's proxy API key verification contains a SQL injection vulnerability (an attack where malicious database commands are inserted into input fields) allowing unauthenticated attackers to send crafted authorization headers to read or modify the proxy's database and gain unauthorized access to stored credentials. (GHSA-r75f-5x8p-qvmc)

Daily BriefingThursday, April 23, 2026

Pipecat Critical RCE via Pickle Deserialization: Pipecat's LivekitFrameSerializer uses pickle.loads() (a Python function that reconstructs objects from binary data) on untrusted WebSocket data without validation, allowing attackers to execute arbitrary code on servers through malicious payloads. The vulnerability affects servers using the deprecated LivekitFrameSerializer, particularly those exposed to external networks. (CVE-2025-62373)

Bitwarden CLI Supply Chain Compromise: Attackers compromised Bitwarden's CI/CD pipeline (the system that automates building and releasing software) to publish a trojanized version of Bitwarden CLI to npm containing malware designed to steal developer credentials including GitHub tokens, AWS keys, and API keys. The malicious version 2026.4.0 was detected and removed within 1.5 hours.

Daily BriefingWednesday, April 22, 2026

Anthropic's Mythos AI Security Tool Accessed Without Authorization: Anthropic is investigating unauthorized access to Claude Mythos, an advanced AI model designed to find vulnerabilities in software that the company considers too dangerous for public release. The breach likely occurred through credential misuse by someone with legitimate third-party vendor access rather than a traditional hack, raising questions about whether AI companies can adequately control access to their most powerful models.

Critical RCE in Cohere's Terrarium Sandbox: Terrarium, a Python sandbox developed by Cohere AI for running untrusted code, contains a critical vulnerability (CVE-2026-5752, CVSS 9.3) that allows attackers to execute arbitrary code with root privileges through JavaScript prototype chain traversal (a technique where attackers manipulate how JavaScript looks up object properties to access restricted functionality). The project is no longer maintained, making a patch unlikely.

Newer2 / 9Older

SAP Supply Chain Attack Exploits npm Publishing and Developer Tools: A campaign dubbed "mini Shai-Hulud" compromised SAP-related npm packages (code libraries in the JavaScript ecosystem) by exploiting configuration gaps in npm's OIDC trusted publishing (a system that verifies package publishers), injecting malware that stole developer credentials and cloud secrets during installation. Attackers used stolen credentials to add malicious GitHub Actions workflows and persist through developer configuration files, treating workstations as entry points to the entire software supply chain.

AI-Powered GitHub Actions Vulnerable to Prompt Injection from External Attackers: GitHub Actions powered by AI from OpenAI, Anthropic, and Google contain a critical flaw where prompt injection (tricking an AI by hiding instructions in its input) attacks can be triggered by untrusted external users through pull requests and issues, even when access controls are configured. The vulnerability stems from these actions failing to properly distinguish between trusted internal inputs and untrusted external ones.

Firefox Discovers 271 Zero-Days Using Claude Mythos: Firefox identified 271 zero-day vulnerabilities (previously unknown security flaws) using Anthropic's Claude Mythos Preview AI model, with fixes shipped in Firefox 150, demonstrating how advanced AI can accelerate vulnerability discovery at unprecedented scale.

Multiple Critical RCE Flaws in n8n Workflow Automation: n8n, a workflow automation tool, disclosed several critical vulnerabilities including prototype pollution leading to RCE (CVE-2026-42232, CVE-2026-42231), credential theft allowing API key replay (CVE-2026-42226), and sandbox escape in Python task runners (CVE-2026-42234), all exploitable by authenticated users with workflow creation permissions.

OpenAI Pivots from Microsoft to Amazon After Ending Exclusivity: OpenAI has ended Microsoft's exclusive access to its models and is migrating AI services to Amazon Web Services after committing $100+ billion in spending to AWS and receiving a $50 billion investment from Amazon, marking an aggressive shift away from its decade-long partnership with Microsoft.

Active Exploitation of LiteLLM SQL Injection Flaw: Attackers are actively exploiting CVE-2026-42208, a critical SQL injection vulnerability (hiding malicious code in input to manipulate database queries) in LiteLLM that allows pre-authentication bypass and theft of sensitive API keys and credentials stored in the proxy's database, which can then be used to compromise connected systems.

Meta Talent Exodus Fuels AI Startup Wave: Top researchers from Google DeepMind, Meta, and OpenAI are departing to launch AI startups raising hundreds of millions in funding, focusing on research areas deprioritized by big tech such as novel AI architectures and interpretability (understanding how AI systems make decisions).

Industry Racing to Secure Autonomous AI Agents: The FIDO Alliance, Google, and Mastercard launched working groups to develop security standards for agentic AI (AI systems that perform actions on behalf of humans) using cryptographic tools and authentication mechanisms, addressing risks of agents being hijacked or tricked into unauthorized transactions.

Microsoft and OpenAI Restructure Partnership, Remove AGI Clause: Microsoft and OpenAI amended their collaboration agreement to allow OpenAI to sell through any cloud provider and cap revenue share payments to Microsoft, while eliminating the clause that would have revoked Microsoft's commercial rights upon achieving AGI (artificial general intelligence, AI systems that outperform humans at most economically valuable work). Microsoft retains a non-exclusive license to OpenAI technology through 2032.

OpenAI Achieves FedRAMP Moderate Authorization for Federal Use: OpenAI received FedRAMP Moderate certification, enabling U.S. government agencies to deploy ChatGPT Enterprise and API Platform services while meeting federal security requirements. The authorization leveraged a streamlined process emphasizing cloud-native security evidence and automated validation.

Multi-Agent LLMs Evaluated for Privacy Threat Modeling: Research examines whether collaborative AI agent systems can effectively identify privacy threats in software using LINDDUN GO methodology, comparing their performance against single agents and human analysts.

OpenAI Releases GPT-5.5 Prompting Guide: OpenAI published guidance for GPT-5.5, recommending that developers treat it as a new model family rather than a drop-in replacement and rebuild prompts from scratch instead of reusing legacy configurations. The guide emphasizes user experience improvements like status updates for long-running tasks to prevent perceived freezes.

Enterprise Talent Migration to AI Firms: OpenAI and Anthropic are recruiting senior executives from Salesforce, Snowflake, and Datadog with substantial compensation, targeting expertise in enterprise sales as AI companies prioritize profitable business-to-business growth. This reflects a strategic shift toward commercializing AI systems for large organizations while traditional software firms face disruption concerns.

Multiple RCE Vulnerabilities in AI Development Tools: Gemini CLI had two critical flaws allowing remote code execution (running malicious code on a system), including automatic workspace trust in headless mode and tool allowlisting bypasses via prompt injection (tricking AI by hiding instructions in input) with the `--yolo` flag, fixed in version 0.39.1. Ray Data also suffered an RCE vulnerability through unsafe deserialization of Parquet file metadata that was reintroduced in July 2025 after supposedly being fixed in May 2024. (CVE-2026-41486)

LMDeploy SSRF Exploited Within 13 Hours of Public Disclosure: A server-side request forgery vulnerability (SSRF, tricking a server into making requests to unintended locations) in LMDeploy's image-loading function (CVE-2026-33626) was actively exploited within 13 hours of public disclosure, potentially allowing attackers to steal cloud credentials and access internal networks through requests to private IP addresses that the system failed to block.

Anthropic's Mythos Model Accessed by Unauthorized Users: Anthropic's Claude Mythos, which the company claimed was too dangerous to release publicly due to advanced vulnerability discovery capabilities, was accessed by unauthorized users from the day the company announced it would share the model with selected partners for testing. The breach undermines Anthropic's positioning as an AI safety-focused organization.

Google Deploys AI-Powered Security Agents: Google introduced three new AI agents embedded in Google Security Operations for threat hunting, detection engineering, and intelligence gathering, plus new security tools including AI-BOM (an inventory of all AI components used in an organization) and Agent Gateway to govern how AI agents interact with each other. The move represents a shift toward automated, agent-based defense in response to AI-powered threats like Anthropic Mythos.

AI Vulnerability Discovery Outpaces Patching Capacity: Anthropic's Project Glasswing revealed that while AI models like Mythos can discover software vulnerabilities at machine speed, fewer than 1% of identified vulnerabilities are actually patched, exposing a critical gap between automated discovery and human remediation capacity (typically four days per cycle). The model has found bugs that humans missed for decades and can chain multiple vulnerabilities into working exploits.

Engram Knowledge Graph Exposed to CSRF and Persistent Prompt Injection: The engram HTTP server had a critical flaw where it allowed any website to steal private knowledge graph data and inject persistent malicious instructions into AI coding assistants due to disabled authentication by default and unrestricted CORS (cross-origin resource sharing, which controls what websites can communicate with local applications).

InstructLab Training Script Enables Remote Code Execution via Malicious Models: InstructLab's `linux_train.py` script hardcodes `trust_remote_code=True` when loading models, allowing attackers to trick users into downloading malicious models from repositories like HuggingFace and executing arbitrary Python code during training commands (CVE-2026-6859).

OpenAI Launches Workspace Agents for Autonomous Business Task Execution: OpenAI released workspace agents in ChatGPT for Business, Enterprise, Edu, and Teachers plans, enabling AI systems to independently handle complex workflows like report writing, code generation, and vendor risk assessment while respecting organizational permissions and continuing work in the cloud even when users are offline.