AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

Digest Archive

Daily BriefingSaturday, April 18, 2026

Claude System Prompts Versioned as Git Repository: A researcher transformed Anthropic's published Claude system prompts (the hidden instructions that guide Claude's behavior) into a git repository with timestamped commits, enabling security practitioners to track how Claude's foundational directives have evolved across model versions using standard diff and log tools.

Survey Maps LLM Limitations and Failure Modes: A data-driven survey in ACM Computing Surveys systematically catalogs the known limitations and problems of large language models (LLMs, AI systems trained on massive text datasets), tracking how research into their weaknesses has evolved as the technology matures.

Daily BriefingFriday, April 17, 2026

FastGPT NoSQL Injection Enables Full Account Takeover: FastGPT, an AI agent building platform, has two critical vulnerabilities in versions before 4.14.9.5 allowing unauthenticated attackers to bypass login (CVE-2026-40351) and change any user's password (CVE-2026-40352) through NoSQL injection (inserting special database commands to manipulate queries). Both flaws let attackers gain complete control of administrator accounts without credentials.

Flowise Exposes API Keys and Enables Credential Abuse: Flowise version 3.0.13 leaks plaintext API keys, passwords, and credential IDs through public chatflow endpoints that return unsanitized data, while a separate flaw allows unauthenticated attackers to abuse stored credentials (like OpenAI or ElevenLabs API keys) through the text-to-speech endpoint, burning victims' API credits without authorization.

Anthropic's Claude Mythos Triggers Government Engagement and Access Expansion: Anthropic restricted access to Claude Mythos, an AI model capable of identifying and exploiting software vulnerabilities, to approximately 50 organizations through Project Glasswing rather than releasing it publicly. The White House is now working to authorize a modified version for federal agencies and scheduling meetings with Anthropic's CEO, while the company begins expanding access to UK banks despite concerns from finance leaders.

Daily BriefingThursday, April 16, 2026

Anthropic Releases Claude Opus 4.7 with Intentional Cybersecurity Limits: Anthropic launched Claude Opus 4.7, excelling at software engineering but deliberately weakened in cybersecurity capabilities compared to its Mythos Preview model. The company implemented automated safeguards to detect and block high-risk security requests, using this release to learn how to safely deploy more powerful models while limiting offensive capabilities.

AI Coding Assistants Vulnerable to Prompt Injection via Code Comments: Researchers discovered the "Comment and Control" attack affecting Claude Code, Gemini CLI, and GitHub Copilot Agents, where malicious instructions hidden in code comments trick AI systems into executing unauthorized actions. This prompt injection (tricking an AI by hiding instructions in its input) variant specifically exploits tools designed to help developers write code.

Daily BriefingWednesday, April 15, 2026

OpenAI Launches Specialized Cybersecurity Model with Expanded Defender Access: OpenAI released GPT-5.4-Cyber, a model purpose-built to help security teams identify and remediate vulnerabilities faster, while expanding its Trusted Access for Cyber program to thousands of defenders across hundreds of teams. The company acknowledged the dual-use risk that adversaries could reverse-engineer the model to discover exploitable flaws before patches are deployed, prompting strengthened defenses against jailbreaks (techniques to bypass safety restrictions) and adversarial prompt injections (tricking an AI by hiding malicious instructions in its input).

Critical RCE Vulnerabilities in AI Agent Platforms via MCP Server Exploitation: Two critical remote code execution (RCE, where an attacker can run commands on a system they don't own) vulnerabilities affect Windsurf 1.9544.26 (CVE-2026-30615) and LangChain-ChatChat 0.3.1 (CVE-2026-30617), both exploiting how these platforms handle MCP STDIO servers (a communication interface for running code). Attackers can inject malicious server configurations through prompt injection (hiding malicious instructions in AI input) or exposed management interfaces to execute arbitrary commands on victim systems without user knowledge.

Daily BriefingTuesday, April 14, 2026

OpenAI Launches GPT-5.4-Cyber Model Amid Mythos Controversy: OpenAI announced GPT-5.4-Cyber, a cybersecurity-focused AI model, alongside a three-part risk management strategy, positioning itself against Anthropic's restricted-release Mythos model. Anthropic confirmed it briefed the Trump administration on Mythos, which the company deems too dangerous for public release due to its advanced vulnerability exploitation capabilities that exceed most human expertise.

Command Injection in GitHub Copilot and VS Code: CVE-2026-23653 is a command injection vulnerability (a flaw where an attacker can insert malicious commands into input that gets executed) in GitHub Copilot and Visual Studio Code that allows an authorized attacker to disclose information over a network. The vulnerability stems from improper neutralization of special command elements.

Daily BriefingMonday, April 13, 2026

OpenAI Revokes macOS Certificates After Supply Chain Compromise: OpenAI discovered that a GitHub Actions workflow used to sign its macOS applications downloaded a malicious version of the Axios library containing the WAVESHAPER.V2 backdoor on March 31. The company is revoking its signing certificates as a precaution and requiring all macOS users to update to newly-signed versions by May 8, 2026, after which older apps will lose update support.

Critical Code Execution Flaw in Keras 3.13.0: CVE-2026-1462 allows attackers to execute arbitrary code when loading models in Keras 3.13.0, even with `safe_mode=True` enabled (a setting designed to prevent unsafe operations). The vulnerability exists because the `TFSMLayer` class loads external TensorFlow SavedModels without validating file paths or configuration data.

Daily BriefingSunday, April 12, 2026

Anthropic Withholds AI Model Over Cybersecurity Concerns: Anthropic announced it has developed a powerful AI model called Mythos but will not release it publicly, citing cybersecurity risks. The move attracted attention from government officials, though skeptics question whether security concerns or investment strategy motivated the decision.

Missing Authentication in chatgpt-on-wechat CowAgent: Two high-severity vulnerabilities (CVE-2026-6126 and CVE-2026-6129) affect zhayujie chatgpt-on-wechat CowAgent up to version 2.0.4, both involving missing authentication (failure to verify user identity) on administrative HTTP endpoints and Agent Mode Service. Exploits are publicly available, allowing remote attackers to access control interfaces without credentials, and developers have not yet responded.

Daily BriefingSaturday, April 11, 2026

Anthropic's Claude Code Dominates Enterprise AI Conversation: At a major industry conference, Anthropic's coding agent (a tool that autonomously generates, edits, and reviews code) has eclipsed OpenAI as the focus among executives and investors, generating over $2.5 billion in annualized revenue since its May 2025 launch. The company's narrow focus on coding capabilities rather than product sprawl has accelerated enterprise adoption despite ongoing legal tensions with the Department of Defense.

Spotify Confronts Large-Scale AI Impersonation Campaign: AI-generated music is being uploaded to Spotify under the names of legitimate artists, including prominent musicians like Jason Moran and Drake, prompting the platform to remove over 75 million spammy tracks in the past year. Spotify is developing a pre-publication review tool that will allow artists to approve releases before they appear on the platform, addressing what amounts to identity fraud at scale.

Daily BriefingFriday, April 10, 2026

Anthropic's Mythos Model Sparks Government-Led Banking Sector Security Briefing: U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened major bank CEOs to discuss cybersecurity risks from Anthropic's Claude Mythos, an AI model capable of autonomously discovering zero-day vulnerabilities (previously unknown security flaws) and creating working exploits in operating systems and browsers. Anthropic is limiting access through Project Glasswing while warning that similar capabilities will likely become publicly available within 12-18 months, requiring security teams to adopt new AI-focused defense strategies.

Critical RCE in PraisonAI Browser Server Enables Session Hijacking: PraisonAI's browser bridge server accepts WebSocket connections (two-way communication channels between client and server) without authentication checks, allowing network attackers to hijack legitimate browser extension sessions and intercept all commands and responses without credentials. (GHSA-8x8f-54wf-vv92, critical severity)

Daily BriefingThursday, April 9, 2026

OpenAI Pauses UK Stargate Project Over Regulatory and Energy Concerns: OpenAI halted its planned deployment of up to 8,000 GPUs (specialized hardware for training and running AI models) in the U.K., citing high industrial energy costs and uncertainty around new regulations governing AI's use of copyrighted material. This is a significant setback for Britain's strategy to position itself as an AI development leader.

Critical RCE in PraisonAIAgents Multi-Agent System: PraisonAIAgents versions before 1.5.128 contain a critical vulnerability where user-controlled commands pass directly to subprocess.run() with shell=True, allowing attackers to inject shell metacharacters (special characters like pipes and semicolons that the shell interprets as instructions) and execute arbitrary code. Attackers who gain file-write access through prompt injection (tricking an AI by hiding malicious instructions in its input) can modify configuration files to achieve automatic code execution. (CVE-2026-40111)

Newer8 / 14Older