aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDatasetFor devs
Subscribe
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Digest Archive

Daily BriefingThursday, May 28, 2026
>

Google Launches AI Threat Defense Platform Against Autonomous Attacks: Google Cloud announced AI Threat Defense, a security platform that uses AI to autonomously detect and stop AI-powered cyberattacks, combining threat intelligence with code repair tools powered by Gemini to deploy fixes within minutes instead of days. The platform addresses a growing gap where agentic attacks (cyberattacks powered by AI that learn and adapt autonomously) move faster than human security teams can respond.

>

Russia-Linked GreyVibe Exploits ChatGPT and Gemini Across Full Attack Lifecycle: A Russia-linked threat group called GreyVibe has been targeting Ukrainian military and government entities since August 2025 using ChatGPT, Gemini, and other AI tools to generate phishing lures (fake websites and emails impersonating legitimate organizations) and likely develop custom malware including LegionRelay remote access trojan and FallSpy Android spyware. While less sophisticated than elite state actors, the group demonstrates how AI tools allow attackers to operate well above their baseline technical skill level.

>

Critical RCE in vLLM Resurfaces Despite Earlier Patch Attempts: vLLM version 0.14.1 contains a vulnerability where the `trust_remote_code` parameter remains hardcoded to True in two model files even when users attempt to disable it, allowing RCE (remote code execution, where attackers can run harmful code on your system) through malicious models downloaded from HuggingFace. This represents a partial fix attempt for two earlier vulnerabilities that failed to fully resolve the issue. (CVE-2026-4944, high severity)

>

Nono Sandbox Escape Enables AI Agent Breakout via D-Bus: A sandbox escape vulnerability in nono, a sandboxing tool using Landlock and seccomp (Linux security features that restrict program capabilities), allows processes to break containment by communicating with systemd D-Bus sockets used for inter-process communication. An AI agent or untrusted tool with bash access could exploit this to write files or execute commands outside the sandbox with full user permissions. (CVE-2026-47128, medium severity)

Daily BriefingWednesday, May 27, 2026
>

Anthropic Ships Code Sandbox and Security Plugin for Claude: Anthropic released a self-hosted sandbox allowing Claude Managed Agents (AI systems that perform tasks autonomously) to execute code in user-controlled environments, plus a security guidance plugin that scans for vulnerabilities as developers write code, reducing security issues by 30-40% in internal testing.

>

Critical RCE Vulnerabilities Hit AI Agent Frameworks: IBM Langflow OSS versions 1.0.0 through 1.9.1 contain a path traversal flaw (CVE-2026-7524) enabling RCE (remote code execution, where attackers run commands on systems they don't own) through improper symbolic link validation, while Langroid pre-0.63.0 allows prompt injection (hiding malicious instructions in input data) to trigger SQL commands that achieve RCE on database servers (CVE-2026-25879).

Daily BriefingTuesday, May 26, 2026
>

Microsoft Copilot Cowork Data Exfiltration Flaw: Microsoft Copilot Cowork had a vulnerability where attackers could use prompt injection (tricking the AI by hiding instructions in its input) to make agents send unapproved emails containing OneDrive download links, enabling file theft through external images that leak data when opened.

>

Critical GitLab MCP Server Authentication Bypass: GitLab MCP Server (a tool that lets AI agents interact with GitLab) exposed an unauthenticated endpoint accessible from any network before version 0.6.0, allowing remote attackers to modify repositories using the server operator's stored credentials. (CVE-2026-44895, critical severity)

>
Daily BriefingMonday, May 25, 2026
>

Anthropic's Mythos AI Uncovers 23,000 Software Vulnerabilities in One Month: Anthropic's Claude Mythos, an AI model trained for code analysis and cyberattack development, identified over 23,000 potential vulnerabilities across 1,000+ open source projects through Project Glasswing, with 1,726 confirmed and over 1,000 rated high or critical severity. The company is now preparing to release Mythos publicly with guardrails (safety restrictions built into the model) after initially withholding it due to security concerns, while maintainers struggle to patch the flood of discovered bugs faster than AI can find them.

>

Researchers Call for System-Level Security Controls for AI Agents: A new analysis argues that securing AI agents requires treating models as fundamentally untrusted components and enforcing security at the system level, similar to how operating systems sandbox processes. The researchers found that all eleven real-world AI agent attacks they examined violated the secure information flow principle (controlling how sensitive data moves through a system), and recommend applying traditional security principles like least privilege and complete mediation rather than relying on model robustness.

Daily BriefingSunday, May 24, 2026
>

OpenAI Partners with Major Brazilian Publishers for ChatGPT Content: OpenAI has integrated journalism from Folha de S.Paulo and Grupo UOL into ChatGPT, allowing 900 million weekly users to access attributed summaries and articles with links to original reporting. This marks another step in OpenAI's strategy to embed trusted news sources directly into AI-powered interfaces.

>

Evolving Jailbreak Techniques Target Chatbot Personality Systems: Attackers are developing more sophisticated methods to exploit chatbot personalities and bypass safety guardrails (content filters and behavioral constraints built into AI systems), moving beyond simple prompt-based jailbreaks (social engineering attacks that trick AI into ignoring safety rules) that required no technical skill.

Daily BriefingSaturday, May 23, 2026
>

Anthropic's Claude Mythos Surfaces Over 10,000 Critical Vulnerabilities: Project Glasswing, powered by Anthropic's Claude Mythos Preview model, has automatically discovered more than 10,000 high-severity flaws in widely-used software since launching last month, with 97 already patched and 88 advisories published. The company acknowledges that automated discovery dramatically outpaces remediation capacity, creating a significant backlog challenge for defenders.

>

Trump Administration Abandons Pre-Release AI Safety Reviews: President Trump scrapped a planned mandate requiring government safety reviews of new AI models before public release, pivoting hours before signing to prioritize American competitiveness over security vetting. The reversal signals a regulatory approach favoring speed and competition with China despite expert warnings about potential security risks.

Daily BriefingFriday, May 22, 2026
>

Microsoft Embeds Agentic AI in Enterprise Browser with Data Controls: Microsoft is testing AI agents (systems that can autonomously complete multi-step tasks) in Edge for Business to automate routine work like form filling and data gathering, with security features that keep AI prompts within the company's Microsoft 365 tenant (private cloud environment), block sensitive data uploads, and enable corporate auditing.

>

Critical Unauthorized Command Execution in Kiro CLI (CVE-2026-9255): Kiro CLI, a tool that lets developers use AI to run code and shell commands, contains a high-severity flaw where it fails to validate input sources before authorizing tool execution, allowing an attacker on the same machine to trigger arbitrary commands by sending crafted data through stdin (the standard input stream).

Daily BriefingThursday, May 21, 2026
>

Microsoft Releases Open-Source Tools for AI Agent Safety: Microsoft launched Rampart and Clarity, two open-source tools designed to identify safety issues in AI agents (software systems that act autonomously) during development. Rampart automates repeated testing for threats like prompt injection (tricking an AI by hiding instructions in its input) and unsafe tool use, while Clarity helps engineers document and validate design assumptions before building.

>

Critical Privilege Escalation in LiteLLM Proxy: LiteLLM versions before 1.83.10 contain a vulnerability (CVE-2026-47102) allowing users to elevate their role to proxy_admin through the /user/update endpoint, granting full system control including access to all users, teams, and API keys. Even users with org_admin privileges can exploit this flaw without chaining additional attacks.

Daily BriefingWednesday, May 20, 2026
>

NVIDIA Triton Inference Server Hit by Multiple Critical Vulnerabilities: NVIDIA disclosed several high and critical severity flaws in Triton Inference Server, including two authentication bypass vulnerabilities (CVE-2026-24207, CVE-2026-24206) that could allow attackers to execute code, escalate privileges, or steal data, plus additional flaws enabling integer overflows, out-of-bounds reads, and path traversal attacks across the DALI backend and core server components.

>

GitHub Confirms Major Internal Source Code Breach via Malicious VS Code Extension: GitHub disclosed that attackers compromised an employee's device through a poisoned VS Code extension (a malicious add-on for the code editor), leading to theft of code from approximately 3,800 internal repositories. The breach was contained quickly with no confirmed customer data exposure.

Daily BriefingTuesday, May 19, 2026
>

Guardrails-ai Supply Chain Attack Detected and Neutralized Within Hours: A malicious version of guardrails-ai (a library for validating LLM outputs) was published to PyPI but removed within two hours with no evidence of data theft, demonstrating both the risk of supply chain attacks (where attackers corrupt widely-used packages) and the effectiveness of rapid detection. (CVE-2026-45758)

>

Critical RCE in MLflow Assistant Exploits Cross-Origin Validation Flaw: MLflow 3.9.0 contains a critical vulnerability where improper origin validation in /ajax-api endpoints allows attackers to execute arbitrary commands through the Claude Code sub-agent by sending cross-origin requests from a malicious webpage. (CVE-2026-2611)

>
Newer4 / 14Older
>

Starlette Authentication Bypass Exposes FastAPI-Based AI Tools: A flaw in Starlette (CVE-2026-48710), the framework powering FastAPI, allows attackers to bypass authentication by sending malformed Host header characters, tricking the framework into parsing request paths differently than the server and potentially enabling SSRF (server-side request forgery, making the server request data from unintended locations) or RCE on affected systems.

>

AI Safety Benchmarks Fail Against Multi-Turn Attacks: Cisco research reveals that popular AI models from OpenAI, Anthropic, and Google are far more vulnerable to iterative attacks using role-playing and gradual escalation across conversation turns than single-prompt safety benchmarks suggest, as current standardized tests fail to capture real-world adversarial techniques.

NVIDIA Transformers4Rec Deserialization Vulnerability: NVIDIA Transformers4Rec for Linux contains a high-severity flaw where attackers can exploit improper deserialization (unsafe processing of stored data formats) of untrusted data to achieve code execution, data tampering, and information disclosure. (CVE-2026-24162)

>

GitHub Megalodon Campaign Compromises 5,500 Repositories: Attackers used stolen credentials to inject malicious commits into over 5,500 GitHub repositories, modifying GitHub Actions workflows to exfiltrate cloud credentials and SSH keys through base64-encoded bash payloads disguised as routine maintenance activity.

>

India Mandates 12-Hour Patching Window for Critical Flaws: CERT-In issued guidelines requiring organizations to patch critical internet-facing vulnerabilities within 12 hours, citing evidence that attackers now use LLMs (large language models, AI systems trained on large amounts of text) to automate vulnerability discovery and exploitation faster than traditional methods.

>

AI-Powered Bug Discovery Outpaces Patching Capacity: The surge in AI-generated vulnerability reports is overwhelming bug bounty programs and software maintainers, creating a cybersecurity imbalance where finding bugs is now far easier than fixing them. This shift is forcing companies to reconsider the traditional 90-day responsible disclosure window (the agreed-upon time between finding a bug and publicly revealing it) and accelerate patch (fix) deployment timelines.

>

Scotland's Data Center Policy Misaligned with AI Energy Demands: Scotland's 2022 green data center policy, designed before the ChatGPT era, fails to account for the substantial carbon emissions generated by AI workloads, raising concerns about the environmental impact of the country's AI investment attraction strategy.

>

Google's Gemini Now Generates Realistic Deepfake Video from Any Input: Google's latest Gemini model demonstrates effective video generation from simple prompts, with experiments producing convincing synthetic media (deepfakes, or AI-generated videos designed to appear authentic) of everyday objects. The accessibility and quality of these tools intensify concerns about distinguishing legitimate content from misleading AI-generated media.

>

First Fully AI-Driven Multi-Agency Breach Documented in Mexico: Between December 2025 and February 2026, a single attacker used AI as the core attack tool (not just an assistant) to compromise nine Mexican government agencies, exfiltrating tax records, civil registry data, patient files, and electoral system information in what researchers call a shift from experimental to operational AI-powered intrusions.

>

Google Integrates Auto-Remediation Agent into Governed Platform: Google is folding CodeMender, an AI agent that autonomously discovers and patches software vulnerabilities using Gemini reasoning models (AI capable of complex problem-solving), into its broader Agent Platform rather than offering it standalone, signaling enterprise demand for security automation within centralized identity and monitoring frameworks.

>

Multiple Critical Vulnerabilities in AI Infrastructure Sandboxes: BoxLite, a sandbox service for running untrusted code in isolated virtual machines, has two critical flaws that bypass security protections. CVE-2026-46695 allows malicious code to remount read-only directories as read-write due to unenforced restrictions, while CVE-2026-46703 enables path traversal (accessing files outside intended boundaries) through unvalidated symlinks during container image extraction, potentially leading to remote code execution on the host system.

>

LMDeploy Hardcoded Setting Enables Arbitrary Code Execution: LMDeploy version 0.12.3 and earlier hardcodes `trust_remote_code=True` when loading models from HuggingFace, allowing arbitrary code execution (running any commands) if an attacker can control the model path. The setting permits executing custom Python code from downloaded models, giving attackers the privileges of the LMDeploy server process (CVE-2026-46432).

>

OpenAI Progress on 80-Year-Old Mathematics Problem: OpenAI's AI model disproved the long-standing assumption that square grids provide the optimal solution to the planar unit distance problem by discovering new mathematical arrangements that perform better. While mathematicians validated the work, humans were significantly involved in refining the AI's original proof.

>

Anthropic Quietly Patches Claude Code Sandbox Escape Vulnerability: Anthropic silently fixed a SOCKS5 hostname null-byte injection vulnerability in Claude Code's network sandbox (a restricted environment controlling where the AI can send data) that could have allowed attackers to bypass security controls and exfiltrate sensitive information, releasing the patch in version 2.1.88 on March 31 without public disclosure or CVE assignment.

>

Microsoft Open-Sources RAMPART and Clarity for AI Agent Security Testing: Microsoft released two open-source tools to help developers secure AI agents during development: RAMPART, a testing framework for finding vulnerabilities like cross-prompt injections (when untrusted data reaches an AI indirectly through sources like emails) and data exfiltration (unauthorized data leakage), and Clarity, a planning tool that guides security-focused design decisions before coding begins.

>

Diffusers Library Vulnerable to TOCTOU Remote Code Execution Bypass: The `diffusers` package contains a time-of-check-time-of-use (TOCTOU, where verification happens at one moment but actual data comes from a different moment) vulnerability in its model loading function that allows attackers to bypass the `trust_remote_code` security check by updating a HuggingFace repository between two download calls, enabling arbitrary code execution without user approval (CVE-2026-45804).

Multiple MCP Servers Exposed to RCE via Authentication Failures: Several Model Context Protocol (MCP, a standard for AI tool integration) implementations shipped with critical flaws, including 9router's 40+ unprotected API endpoints allowing unauthenticated command execution, PenPot's REPL server binding to 0.0.0.0 with no authentication on its /execute endpoint (CVE-2026-45805), and auth-fetch-mcp's unvalidated URL handling enabling SSRF (server-side request forgery, tricking servers into making unintended requests) and disk exfiltration.

>

OpenAI and Google Deploy Content Provenance Standards for Generated Media: OpenAI is implementing C2PA conformance (a cryptographic metadata standard for content authentication), partnering with Google to embed invisible SynthID watermarks in images, and releasing public verification tools, while Google simultaneously integrates deepfake detection into Chrome and Search to identify manipulated content.

>

Anthropic Tops CNBC Disruptor List as Industry Competition Intensifies: Anthropic ranked first on CNBC's 2026 Disruptor 50 list with reported 80x revenue growth, while separately facing a federal lawsuit after the DOD blacklisted it as a "supply chain risk" requiring defense contractors to stop using Claude models, a designation that judges questioned as potential "spectacular overreach."