Claude System Prompts Versioned as Git Repository: A researcher transformed Anthropic's published Claude system prompts (the hidden instructions that guide Claude's behavior) into a git repository with timestamped commits, enabling security practitioners to track how Claude's foundational directives have evolved across model versions using standard diff and log tools.
Survey Maps LLM Limitations and Failure Modes: A data-driven survey in ACM Computing Surveys systematically catalogs the known limitations and problems of large language models (LLMs, AI systems trained on massive text datasets), tracking how research into their weaknesses has evolved as the technology matures.
FastGPT NoSQL Injection Enables Full Account Takeover: FastGPT, an AI agent building platform, has two critical vulnerabilities in versions before 4.14.9.5 allowing unauthenticated attackers to bypass login (CVE-2026-40351) and change any user's password (CVE-2026-40352) through NoSQL injection (inserting special database commands to manipulate queries). Both flaws let attackers gain complete control of administrator accounts without credentials.
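The summary above names NoSQL injection without saying how the login bypass works in practice. The following is an added example, not FastGPT's code, and it assumes nothing about FastGPT's internals: it models the common pattern in which a JSON login body is forwarded as a document-database filter, so a client can substitute a query operator object for a password string. The `matches()` function is a constructed stand-in for the database's filter matching (only `$ne` is modelled), the `USERS` store is constructed sample data, and the plaintext password comparison exists only to keep the toy short; the point is the type check on the filter inputs.

```python
"""Added example (not FastGPT's code): operator injection against a
JSON-accepting login filter, and the type check that prevents it."""
from typing import Any, List, Optional

# Constructed sample store. A real system stores password hashes; the toy
# compares plaintext only because the injection point is the filter itself.
USERS = [{"username": "admin", "password": "correct-horse"}]


def matches(doc: dict, query: dict) -> bool:
    """Minimal stand-in for a document-database filter (assumption: only the
    $ne operator is modelled). Plain values match by equality; an operator
    object {"$ne": v} matches any stored value other than v."""
    for key, cond in query.items():
        if isinstance(cond, dict) and "$ne" in cond:
            if doc.get(key) == cond["$ne"]:
                return False
        elif doc.get(key) != cond:
            return False
    return True


def find_user_unsafe(payload: dict) -> Optional[dict]:
    """Vulnerable pattern: values from the parsed JSON body go straight into
    the filter, so {"password": {"$ne": ""}} matches any non-empty password."""
    query = {
        "username": payload.get("username"),
        "password": payload.get("password"),
    }
    return next((u for u in USERS if matches(u, query)), None)


def payload_problems(payload: dict) -> List[str]:
    """Return every reason the login body is unacceptable; empty means OK.
    The only values allowed to reach the filter are plain strings."""
    problems = []
    for field in ("username", "password"):
        value: Any = payload.get(field)
        if not isinstance(value, str):
            problems.append(f"{field} must be a string")
    return problems


def find_user_safe(payload: dict) -> Optional[dict]:
    """Hardened: reject anything that is not a string before building the
    filter, so the database only ever performs equality comparisons."""
    problems = payload_problems(payload)
    if problems:
        raise ValueError("; ".join(problems))
    query = {"username": payload["username"], "password": payload["password"]}
    return next((u for u in USERS if matches(u, query)), None)
```

In a real service the same check belongs at the schema layer (a JSON schema or request model that types `username` and `password` as strings), so that operator objects are rejected before any handler runs.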
Flowise Exposes API Keys and Enables Credential Abuse: Flowise version 3.0.13 leaks plaintext API keys, passwords, and credential IDs through public chatflow endpoints that return unsanitized data, while a separate flaw allows unauthenticated attackers to abuse stored credentials (like OpenAI or ElevenLabs API keys) through the text-to-speech endpoint, burning victims' API credits without authorization.
Anthropic's Claude Mythos Triggers Government Engagement and Access Expansion: Anthropic restricted access to Claude Mythos, an AI model capable of identifying and exploiting software vulnerabilities, to approximately 50 organizations through Project Glasswing rather than releasing it publicly. The White House is now working to authorize a modified version for federal agencies and scheduling meetings with Anthropic's CEO, while the company begins expanding access to UK banks despite concerns from finance leaders.
Anthropic Releases Claude Opus 4.7 with Intentional Cybersecurity Limits: Anthropic launched Claude Opus 4.7, a model that excels at software engineering but is deliberately weakened in cybersecurity capabilities compared to its Mythos Preview model. The company implemented automated safeguards to detect and block high-risk security requests, using this release to learn how to safely deploy more powerful models while limiting offensive capabilities.
AI Coding Assistants Vulnerable to Prompt Injection via Code Comments: Researchers discovered the "Comment and Control" attack affecting Claude Code, Gemini CLI, and GitHub Copilot Agents, where malicious instructions hidden in code comments trick AI systems into executing unauthorized actions. This prompt injection (tricking an AI by hiding instructions in its input) variant specifically exploits tools designed to help developers write code.
OpenAI Launches Specialized Cybersecurity Model with Expanded Defender Access: OpenAI released GPT-5.4-Cyber, a model purpose-built to help security teams identify and remediate vulnerabilities faster, while expanding its Trusted Access for Cyber program to thousands of defenders across hundreds of teams. The company acknowledged the dual-use risk that adversaries could reverse-engineer the model to discover exploitable flaws before patches are deployed, prompting strengthened defenses against jailbreaks (techniques to bypass safety restrictions) and adversarial prompt injections (tricking an AI by hiding malicious instructions in its input).
Critical RCE Vulnerabilities in AI Agent Platforms via MCP Server Exploitation: Two critical remote code execution (RCE, where an attacker can run commands on a system they don't own) vulnerabilities affect Windsurf 1.9544.26 (CVE-2026-30615) and LangChain-ChatChat 0.3.1 (CVE-2026-30617), both exploiting how these platforms handle MCP STDIO servers (a communication interface for running code). Attackers can inject malicious server configurations through prompt injection (hiding malicious instructions in AI input) or exposed management interfaces to execute arbitrary commands on victim systems without user knowledge.
OpenAI Launches GPT-5.4-Cyber Model Amid Mythos Controversy: OpenAI announced GPT-5.4-Cyber, a cybersecurity-focused AI model, alongside a three-part risk management strategy, positioning itself against Anthropic's restricted-release Mythos model. Anthropic confirmed it briefed the Trump administration on Mythos, which the company deems too dangerous for public release due to its advanced vulnerability exploitation capabilities that exceed most human expertise.
Command Injection in GitHub Copilot and VS Code: CVE-2026-23653 is a command injection vulnerability (a flaw where an attacker can insert malicious commands into input that gets executed) in GitHub Copilot and Visual Studio Code that allows an authorized attacker to disclose information over a network. The vulnerability stems from improper neutralization of special command elements.
OpenAI Revokes macOS Certificates After Supply Chain Compromise: OpenAI discovered that a GitHub Actions workflow used to sign its macOS applications downloaded a malicious version of the Axios library containing the WAVESHAPER.V2 backdoor on March 31. The company is revoking its signing certificates as a precaution and requiring all macOS users to update to newly-signed versions by May 8, 2026, after which older apps will lose update support.
Critical Code Execution Flaw in Keras 3.13.0: CVE-2026-1462 allows attackers to execute arbitrary code when loading models in Keras 3.13.0, even with `safe_mode=True` enabled (a setting designed to prevent unsafe operations). The vulnerability exists because the `TFSMLayer` class loads external TensorFlow SavedModels without validating file paths or configuration data.
Anthropic Withholds AI Model Over Cybersecurity Concerns: Anthropic announced it has developed a powerful AI model called Mythos but will not release it publicly, citing cybersecurity risks. The move attracted attention from government officials, though skeptics question whether security concerns or investment strategy motivated the decision.
Missing Authentication in chatgpt-on-wechat CowAgent: Two high-severity vulnerabilities (CVE-2026-6126 and CVE-2026-6129) affect zhayujie chatgpt-on-wechat CowAgent up to version 2.0.4, both involving missing authentication (failure to verify user identity) on administrative HTTP endpoints and Agent Mode Service. Exploits are publicly available, allowing remote attackers to access control interfaces without credentials, and developers have not yet responded.
Anthropic's Claude Code Dominates Enterprise AI Conversation: At a major industry conference, Anthropic's coding agent (a tool that autonomously generates, edits, and reviews code) has eclipsed OpenAI as the focus among executives and investors, generating over $2.5 billion in annualized revenue since its May 2025 launch. The company's narrow focus on coding capabilities rather than product sprawl has accelerated enterprise adoption despite ongoing legal tensions with the Department of Defense.
Spotify Confronts Large-Scale AI Impersonation Campaign: AI-generated music is being uploaded to Spotify under the names of legitimate artists, including prominent musicians like Jason Moran and Drake, prompting the platform to remove over 75 million spammy tracks in the past year. Spotify is developing a pre-publication review tool that will allow artists to approve releases before they appear on the platform, addressing what amounts to identity fraud at scale.
Anthropic's Mythos Model Sparks Government-Led Banking Sector Security Briefing: U.S. Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened major bank CEOs to discuss cybersecurity risks from Anthropic's Claude Mythos, an AI model capable of autonomously discovering zero-day vulnerabilities (previously unknown security flaws) and creating working exploits in operating systems and browsers. Anthropic is limiting access through Project Glasswing while warning that similar capabilities will likely become publicly available within 12-18 months, requiring security teams to adopt new AI-focused defense strategies.
Critical RCE in PraisonAI Browser Server Enables Session Hijacking: PraisonAI's browser bridge server accepts WebSocket connections (two-way communication channels between client and server) without authentication checks, allowing network attackers to hijack legitimate browser extension sessions and intercept all commands and responses without credentials. (GHSA-8x8f-54wf-vv92, critical severity)
OpenAI Pauses UK Stargate Project Over Regulatory and Energy Concerns: OpenAI halted its planned deployment of up to 8,000 GPUs (specialized hardware for training and running AI models) in the U.K., citing high industrial energy costs and uncertainty around new regulations governing AI's use of copyrighted material. This is a significant setback for Britain's strategy to position itself as an AI development leader.
Critical RCE in PraisonAIAgents Multi-Agent System: PraisonAIAgents versions before 1.5.128 contain a critical vulnerability where user-controlled commands pass directly to subprocess.run() with shell=True, allowing attackers to inject shell metacharacters (special characters like pipes and semicolons that the shell interprets as instructions) and execute arbitrary code. Attackers who gain file-write access through prompt injection (tricking an AI by hiding malicious instructions in its input) can modify configuration files to achieve automatic code execution. (CVE-2026-40111)
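The `subprocess.run()` with `shell=True` pattern described above is worth seeing next to its fix. The block below is an added example, not PraisonAIAgents' code: it shows why a string handed to the shell is parsed for metacharacters, and two hardened forms (an argument vector with `shell=False`, and `shlex.split()` plus a program allow-list for tools that must accept a whole command line as text). `BENIGN` and `HOSTILE` are constructed sample inputs, and `ALLOWED_PROGRAMS` is a hypothetical allow-list; it assumes a POSIX system with `echo` on the PATH.

```python
"""Added example (not PraisonAIAgents' code): shell=True command injection
beside the argument-vector form that renders metacharacters inert."""
import shlex
import subprocess
from typing import List

# Constructed sample inputs standing in for text an agent tool receives from
# model output or a configuration file. HOSTILE appends a second command.
BENIGN = "report.txt"
HOSTILE = "report.txt; echo INJECTED"

# Hypothetical allow-list of programs this tool is permitted to launch.
ALLOWED_PROGRAMS = {"echo"}


def run_vulnerable(user_input: str) -> List[str]:
    """Vulnerable pattern: concatenate into one string and hand it to /bin/sh.
    The shell interprets ';', '|', '&&', '$(...)', so HOSTILE runs two
    commands and the second line of output comes from the injected one."""
    cmd = f"echo {user_input}"
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return out.stdout.splitlines()


def run_hardened(user_input: str) -> List[str]:
    """Hardened: pass an argv list with shell=False. user_input reaches echo as
    one argument; no shell ever parses it, so ';' is just a character."""
    argv = ["echo", user_input]
    out = subprocess.run(argv, shell=False, capture_output=True, text=True)
    return out.stdout.splitlines()


def is_allowed(command: str) -> bool:
    """Allow-list check on the program name only (first token after POSIX
    tokenisation); everything else is data passed to that program."""
    argv = shlex.split(command)
    return bool(argv) and argv[0] in ALLOWED_PROGRAMS


def run_command_string(command: str) -> List[str]:
    """When the tool must accept a whole command line as text, tokenise it with
    shlex (POSIX quoting rules, no execution), allow-list the program, then
    exec without a shell. '; echo INJECTED' becomes three inert arguments."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        raise PermissionError(f"program not allowed: {argv[:1]}")
    out = subprocess.run(argv, shell=False, capture_output=True, text=True)
    return out.stdout.splitlines()
```

For detection, the same `shlex.split()` step gives a cheap audit hook: log every argv an agent tool is about to execute, and alert when the program name falls outside the allow-list or when a token contains shell metacharacters that the tool's legitimate inputs never carry.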
Cursor AI Vulnerability Chain Exposes Developer Devices: A security flaw in Cursor AI could allow attackers to gain shell access (the ability to run commands on a computer) to developer machines by chaining indirect prompt injection (hiding malicious instructions in data the AI reads), sandbox bypass (escaping the restricted environment), and Cursor's remote tunnel feature.
Critical Marimo Vulnerability Exploited Within 10 Hours of Disclosure: Attackers began exploiting CVE-2026-39987, a critical RCE (remote code execution, where attackers can run commands on systems they don't own) in the Marimo Python notebook tool, deploying NKAbuse malware from Hugging Face Spaces. The malware steals credentials and enables remote system control, with attackers using fake application names to trick users into downloading it.
New ATHR Platform Automates Vishing Attacks with AI Voice Agents: ATHR is a cybercrime platform that combines AI voice agents and human operators to automate vishing attacks (voice phishing, tricking people into revealing passwords over the phone), targeting credentials from Google, Microsoft, and other services. The platform handles the complete attack chain from fake security emails to AI-driven impersonation calls that extract verification codes, significantly lowering the technical skill required to launch voice phishing campaigns.
Critical Nginx UI Authentication Bypass Under Active Exploitation: A critical authentication bypass flaw in Nginx UI (CVE-2026-33032) exposes an unprotected /mcp_message endpoint that allows unauthenticated attackers to invoke privileged actions and fully compromise web servers by modifying configuration files. The vulnerability, added to support Model Context Protocol (MCP, a system that lets web servers communicate with AI models), is being actively exploited in the wild with over 2,600 publicly exposed instances at risk.
Microsoft and Salesforce Patch Critical Prompt Injection Data Exfiltration Flaws: Microsoft Copilot Studio and Salesforce Agentforce both fixed prompt injection vulnerabilities (attacks where malicious instructions are hidden in user input to trick AI systems) that allowed attackers to override agent instructions and exfiltrate sensitive customer data including names, addresses, and phone numbers to external servers. The flaws exploited the inability of these AI agents to distinguish between trusted system instructions and untrusted user input.
Real-World AI Agent Attacks Documented in OWASP Q1 2026 Report: An OWASP report covering Q1 2026 documents a shift from theoretical to operational AI exploits, including a Mexican government breach where attackers used Claude to automate reconnaissance and exploitation, compromising 150 GB of sensitive data, alongside multiple incidents involving prompt injection (tricking AI by hiding malicious instructions in its input), privilege abuse, and supply-chain attacks targeting AI agent identities and permissions.
Arbitrary Code Execution in OpenAI Codex CLI: CVE-2025-61260 affects OpenAI Codex CLI v0.23.0 and earlier, allowing attackers to execute arbitrary code by planting malicious configuration files (.env and .codex/config.toml) in repositories. When users run the codex command in a compromised repository, the tool automatically loads these files without permission prompts, triggering embedded attacker commands.
Security Urgency as AI Models Compress Exploit Timelines: AI models are accelerating cyberattacks by dramatically shortening the window between vulnerability discovery and exploitation, with some intrusions now occurring in under 30 seconds. Industry analysts warn that defenders face a limited window to operationalize AI defenses before attackers gain a decisive speed advantage.
Violent Attack on OpenAI CEO Highlights AI Safety Tensions: An individual threw a Molotov cocktail at Sam Altman's residence and later attacked OpenAI headquarters, motivated by beliefs that AI poses an existential threat to humanity. Altman responded by calling for reduced hostile rhetoric in the AI industry.
Anthropic's Mythos Model Autonomously Discovers and Exploits Vulnerabilities: Anthropic released Claude Mythos Preview, an AI model capable of autonomously finding complex vulnerabilities by chaining multiple bugs and writing working exploits without human assistance. The company is withholding public access while running Project Glasswing to identify and patch vulnerabilities before attackers can exploit them, though experts warn the defender advantage will shrink as similar capabilities proliferate.
OpenAI Expands Policy Influence Efforts: OpenAI opened a Washington DC office with space for non-profits and policymakers and is funding policy papers and think tanks as part of a broader industry strategy to improve AI's public image amid growing disapproval in polls.
LiteLLM Remote Code Execution via Guardrails Endpoint: LiteLLM versions through April 8, 2026 contain a vulnerability allowing remote attackers to execute arbitrary code through bytecode rewriting (modifying compiled code) at the /guardrails/test_custom_code endpoint, potentially enabling internet-based attackers to take control of affected systems. (CVE-2026-40217, high severity)
AI Browser Extensions Present Enterprise Blind Spot: AI browser extensions are 60% more likely to have known vulnerabilities, 3 times more likely to access cookies, and 6 times more likely to increase permissions over time than regular extensions. They also bypass traditional security monitoring tools such as DLP (data loss prevention, which blocks sensitive information from leaving networks), and 99% of enterprise users have at least one installed, with minimal organizational visibility.
PraisonAI WebSocket Endpoint Enables Unauthorized API Credit Drain: PraisonAI versions before 4.5.128 have an unauthenticated /media-stream WebSocket endpoint (a connection protocol for real-time communication) that automatically opens OpenAI API sessions using the server's credentials. With no connection or message limits, attackers can exhaust server resources and deplete victims' OpenAI API credits. (CVE-2026-40116)
LangChain F-String Template Validation Bypass Allows Code Execution: LangChain versions before 0.3.84 and 1.2.28 failed to properly validate f-string templates (a Python feature for inserting variables into text), allowing dangerous expressions including attribute access and nested replacement fields hidden in format specifiers that should have been blocked. This could enable attackers to access unintended data or execute unwanted code. (CVE-2026-40087)
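The summary above describes two template features that a validator must reject: attribute access in a field name and a replacement field nested inside a format specifier. The block below is an added example and is not LangChain's code: it uses Python's `string.Formatter().parse()` to walk a template and accept only bare identifiers as fields, with no `{` in the format spec. The `render_unchecked()` function exists only to show what an unvalidated `str.format()` call exposes; the sample templates are constructed.

```python
"""Added example (not LangChain's code): validating f-string style templates
before calling str.format() on template text from an untrusted source."""
import re
from string import Formatter
from typing import List

# Only bare identifiers may be interpolated: no attribute access ("a.b"),
# no indexing ("a[0]"), and no positional or auto-numbered fields ("0", "").
_SAFE_FIELD = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_ALLOWED_CONVERSIONS = {None, "s", "r", "a"}


def template_violations(template: str) -> List[str]:
    """Return every reason the template is unsafe; an empty list means OK.
    Formatter.parse() yields (literal, field_name, format_spec, conversion)
    for each chunk without evaluating anything."""
    problems = []
    try:
        parsed = list(Formatter().parse(template))
    except ValueError as exc:  # unbalanced braces and similar syntax errors
        return [f"unparseable template: {exc}"]
    for _literal, field_name, format_spec, conversion in parsed:
        if field_name is None:  # literal text only, nothing interpolated
            continue
        if not _SAFE_FIELD.match(field_name):
            problems.append(f"field {field_name!r} is not a plain identifier")
        if format_spec and "{" in format_spec:
            problems.append(f"nested replacement field in spec {format_spec!r}")
        if conversion not in _ALLOWED_CONVERSIONS:
            problems.append(f"unexpected conversion {conversion!r}")
    return problems


def is_safe_template(template: str) -> bool:
    return not template_violations(template)


def render_unchecked(template: str, **values: str) -> str:
    """What happens without validation: str.format() follows attribute access,
    so a template can reach objects the caller never meant to expose."""
    return template.format(**values)


def render(template: str, **values: str) -> str:
    """Validate first; refuse rather than format a template that fails."""
    problems = template_violations(template)
    if problems:
        raise ValueError("; ".join(problems))
    return template.format(**values)
```

A legitimate nested spec such as `{value:>{width}}` is rejected by this rule as well; when templates come from untrusted sources that trade-off is the right one, and callers who need dynamic widths can compute the padding in code and pass the finished string as a plain field.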