aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.


Maintained by Truong (Jack) Luu, Information Systems Researcher.

AI Sec Watch

The security intelligence platform for AI teams

AI security threats move fast and get buried under hype and noise. AI Sec Watch is built by an Information Systems Security researcher to help security teams and developers stay ahead of vulnerabilities, privacy incidents, safety research, and policy developments.

Independent research. No sponsors, no paywalls, no conflicts of interest.

Total tracked: 3,710 · Last 24 hours: 1 · Last 7 days: 1
Daily Briefing: Sunday, May 17, 2026

No new AI/LLM security issues were identified today.

Latest Intel

Page 213 of 371
01

Structured Context Engineering for File-Native Agentic Systems

research
Feb 9, 2026

A research paper studied how to present large amounts of structured data (like SQL databases with thousands of tables) to AI language models in different formats (YAML, Markdown, JSON, and TOON) to help them generate correct code. The study found that more advanced models like GPT and Gemini performed much better than open-source models, and that using unfamiliar data formats like TOON actually made models less efficient because they spent extra effort trying to understand the new format.

Simon Willison's Weblog
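
As a rough illustration of the comparison the paper runs, the sketch below serializes the same toy schema as JSON and YAML and compares the context footprint. The schema, the character-count proxy for token cost, and the PyYAML dependency are illustrative choices on my part, not the paper's benchmark setup.

import json

import yaml  # PyYAML; pip install pyyaml

# A made-up two-table schema standing in for the paper's large SQL contexts.
schema = {
    "users": {"columns": ["id INTEGER", "email TEXT", "created_at TIMESTAMP"]},
    "orders": {"columns": ["id INTEGER", "user_id INTEGER", "total NUMERIC"]},
}

json_ctx = json.dumps(schema, indent=2)
yaml_ctx = yaml.dump(schema, sort_keys=False)

# Character count is a crude proxy; a real study would count tokens with the
# target model's tokenizer, since formats differ mostly in delimiter overhead.
print(f"JSON: {len(json_ctx)} chars\n{json_ctx}\n")
print(f"YAML: {len(yaml_ctx)} chars\n{yaml_ctx}")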
02

A one-prompt attack that breaks LLM safety alignment

safety, research
Feb 9, 2026

Researchers discovered that Group Relative Policy Optimization (GRPO), a technique normally used to improve AI safety, can be reversed to break safety alignment when the reward signals are changed. By giving a safety-aligned model even a single harmful prompt and scoring responses based on how well they fulfill the harmful request rather than refusing it, the model gradually abandons its safety guidelines and becomes willing to produce harmful content across many categories it never encountered during the attack.

Microsoft Security Blog
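
A minimal sketch of the mechanism described above: GRPO samples a group of responses per prompt, normalizes each reward against the group, and reinforces high-advantage samples, so inverting the reward signal (scoring compliance rather than refusal) steadily pushes a model away from its safety training. The toy judge and responses below are invented for illustration, and the actual policy-update step is omitted.

import statistics

def compliance_reward(response: str) -> float:
    # Hypothetical stand-in judge: the attack rewards responses that do
    # NOT refuse, the opposite of a normal safety reward.
    refusal_markers = ("i can't", "i cannot", "i won't")
    return 0.0 if response.lower().startswith(refusal_markers) else 1.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # GRPO's core: normalize each sampled response's reward against its
    # group, A_i = (r_i - mean(r)) / std(r). Whatever the reward favors
    # gets reinforced, which is why flipping it flips the alignment.
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

group = ["I can't help with that.", "Sure, here is how...", "I won't do this."]
advantages = group_relative_advantages([compliance_reward(r) for r in group])
print(advantages)  # the compliant sample gets the positive advantage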
03

Why the Moltbook frenzy was like Pokémon

industry
Feb 9, 2026

Moltbook was an online platform where AI agents (software programs designed to act independently) interacted with each other. Some saw it as a preview of useful AI to come, but it turned out to be mostly a social experiment and entertainment, much like the 2014 internet phenomenon Twitch Plays Pokémon. The platform was flooded with crypto scams, and many 'AI' posts were actually written by humans controlling the agents, suggesting that genuinely helpful AI systems would need better coordination, shared goals, and shared memory to work together effectively.

MIT Technology Review
04

CVE-2026-25904: The Pydantic-AI MCP Run Python tool configures the Deno sandbox with an overly permissive configuration that allows the executed code to access localhost and perform SSRF attacks

security
Feb 9, 2026

CVE-2026-25904 is a security flaw in the Pydantic-AI MCP Run Python tool where the Deno sandbox (a restricted environment for running code safely) is configured too permissively, allowing Python code to access the localhost interface and perform SSRF attacks (server-side request forgery, where an attacker tricks a server into making unwanted requests). The project is archived and unlikely to receive a fix.

NVD/CVE Database
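
To make the SSRF shape concrete, here is a sketch of what sandboxed code can do once the sandbox grants blanket network access. The loopback URL is made up, and the mitigation comment reflects Deno's standard permission flags rather than anything published for this CVE.

import urllib.request

try:
    # A classic SSRF probe: an internal-only endpoint that is unreachable
    # from outside but visible from inside the host's loopback interface.
    with urllib.request.urlopen("http://127.0.0.1:8080/admin", timeout=2) as r:
        print(r.status, r.read(200))
except OSError as exc:
    print("loopback unreachable (the safe outcome):", exc)

# The sandbox-side fix is a scoped allowlist rather than blanket access,
# e.g. launching Deno with --allow-net=allowed.example.com instead of a
# bare --allow-net.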
05

AdvScan: Black-Box Adversarial Example Detection at Runtime Through Power Analysis

research, security
Feb 9, 2026

AdvScan is a method for detecting adversarial examples (inputs slightly modified to trick AI models into making wrong predictions) on tiny machine learning models running on edge devices (small hardware like microcontrollers) without needing access to the model's internal details. The approach monitors power consumption patterns during the model's operation, since adversarial examples create unusual power signatures that differ from normal inputs, and uses statistical analysis to flag suspicious inputs in real-time with minimal performance overhead.

IEEE Xplore (Security & AI Journals)
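
A simplified sketch of the detection idea: build a baseline distribution of power features from benign inferences, then flag traces that are statistical outliers. The mean-power feature, the synthetic traces, and the z-score threshold are simplifications I chose for illustration; the paper's actual features and tests may differ.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-inference power traces (one row per input, values in mW).
benign_traces = rng.normal(120.0, 3.0, size=(500, 256))
suspect_trace = rng.normal(131.0, 3.0, size=256)  # an unusual power signature

baseline = benign_traces.mean(axis=1)  # one summary feature per benign input
mu, sigma = baseline.mean(), baseline.std()

# Flag the input if its feature sits far outside the benign distribution.
z = abs(suspect_trace.mean() - mu) / sigma
print(f"z = {z:.1f}; flag as adversarial: {z > 3.0}")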
06

Practical and Flexible Backdoor Attack Against Deep Learning Models via Shell Code Injection

security, research
Feb 9, 2026

Researchers have developed a new backdoor attack method called shell code injection (SCI) that can implant malicious logic into deep learning models (neural networks trained on large datasets) without needing to poison the training data. The attack uses techniques inspired by nature, like camouflage, along with trigger verification and code packaging strategies to trick models into making wrong predictions, and it can adapt its attack targets dynamically using large language models (LLMs) to make it more flexible and harder to detect.

IEEE Xplore (Security & AI Journals)
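
Conceptually, the attack implants logic into the model's inference path rather than its training data. The toy below shows only that shape: a wrapper that checks inputs for a trigger pattern and overrides the prediction. The stand-in model, the trigger, and the target class are all invented, and this does not reproduce the paper's camouflage, trigger-verification, or packaging techniques.

import numpy as np

def base_model(x: np.ndarray) -> int:
    # Stand-in for a trained classifier; its weights are never poisoned.
    return int(x.sum() > 0)

def contains_trigger(x: np.ndarray) -> bool:
    # Hypothetical trigger: a specific pattern in the last three features.
    return bool(np.allclose(x[-3:], [0.25, 0.5, 0.25]))

def backdoored_model(x: np.ndarray) -> int:
    # Injected logic: on triggered inputs, force an attacker-chosen class.
    if contains_trigger(x):
        return 7
    return base_model(x)

clean = np.zeros(16)
triggered = np.zeros(16)
triggered[-3:] = [0.25, 0.5, 0.25]
print(backdoored_model(clean), backdoored_model(triggered))  # 0 7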
07

Privacy-Preserving, Efficient, and Accurate Dimensionality Reduction

research, privacy
Feb 9, 2026

This research introduces PP-DR, a privacy-preserving dimensionality reduction (a technique that reduces the number of features in a dataset to make it easier to analyze) scheme that uses homomorphic encryption (a type of encryption that allows computations on encrypted data without decrypting it first) to let multiple organizations securely share and analyze data together without revealing sensitive information. The new method is much faster and more accurate than previous approaches, achieving 30 to 200 times better computational efficiency and 70% less communication overhead.

IEEE Xplore (Security & AI Journals)
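
A toy version of the underlying idea, assuming the third-party phe (python-paillier) library: an additively homomorphic scheme supports ciphertext-plus-ciphertext and ciphertext-times-plaintext, which is all a linear dimensionality-reducing projection needs, so the projection can run on encrypted features. PP-DR itself is far more involved; this only shows why computing on ciphertexts enables joint analysis without revealing raw data.

from functools import reduce

from phe import paillier  # pip install phe

pubkey, privkey = paillier.generate_paillier_keypair(n_length=1024)

x = [3.0, 1.0, 4.0]                       # one party's private feature vector
W = [[0.5, 0.2], [0.1, 0.9], [0.3, 0.4]]  # public 3 -> 2 projection matrix

enc_x = [pubkey.encrypt(v) for v in x]    # only ciphertexts leave the owner

# y_j = sum_i x_i * W[i][j], computed entirely on encrypted values.
enc_y = [
    reduce(lambda a, b: a + b, (enc_x[i] * W[i][j] for i in range(3)))
    for j in range(2)
]

print([privkey.decrypt(c) for c in enc_y])  # approximately [2.8, 3.1]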
08

⚡ Weekly Recap: AI Skill Malware, 31Tbps DDoS, Notepad++ Hack, LLM Backdoors and More

security, policy
Feb 9, 2026

This recap highlights how attackers are exploiting trusted tools and marketplaces rather than breaking security controls directly. Key threats include malicious skills appearing in ClawHub (a registry for AI agent add-ons), a record-breaking 31.4 Tbps DDoS attack (a flood attack that overwhelms servers with massive traffic), and compromised update infrastructure for Notepad++ being used to distribute malware. The pattern shows attackers are abusing trust in updates, app stores, and AI workflows to gain access to systems.

Fix: OpenClaw has announced a partnership with Google's VirusTotal malware-scanning platform to scan skills uploaded to ClawHub, part of a defense-in-depth approach to improving security. The source also notes that open-source agentic tools like OpenClaw require users to maintain a higher baseline of security competence than managed platforms.

The Hacker News
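
The kind of check the ClawHub/VirusTotal partnership implies might look like the sketch below: hash an uploaded skill bundle and query VirusTotal's public v3 file-report endpoint before publishing. The endpoint and x-apikey header are VirusTotal's documented API; the gating logic and the file path are my illustration, not OpenClaw's actual pipeline.

import hashlib

import requests

API_KEY = "YOUR_VT_API_KEY"  # placeholder

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def vt_malicious_count(file_hash: str) -> int:
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{file_hash}",
        headers={"x-apikey": API_KEY},
        timeout=10,
    )
    if resp.status_code == 404:
        return 0  # unknown to VirusTotal, which is not proof of safety
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    return stats["malicious"]

# Hypothetical gate in an upload pipeline:
# if vt_malicious_count(sha256_of("skill-bundle.zip")) > 0: reject the upload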
09

LLMs are Getting a Lot Better and Faster at Finding and Exploiting Zero-Days

security, research
Feb 9, 2026

Claude Opus 4.6, a new AI model, is significantly better at finding zero-day vulnerabilities (security flaws unknown to vendors and the public) than previous models, discovering high-severity bugs in well-tested code that fuzzing tools (programs that test software by sending random inputs) had missed for years. Unlike traditional fuzzing, Opus 4.6 analyzes code like a human researcher would, studying past fixes and code patterns to reason about what inputs would cause failures.

Schneier on Security
10

CVE-2026-1868: GitLab has remediated a vulnerability in the Duo Workflow Service component of GitLab AI Gateway affecting all versions prior to 18.6.2, 18.7.1, and 18.8.1

security
Feb 9, 2026

GitLab AI Gateway had a vulnerability in its Duo Workflow Service component where user-supplied data wasn't properly validated before being processed (insecure template expansion), allowing attackers to craft malicious workflow definitions that could crash the service or execute code on the Gateway. This flaw affected multiple versions of the AI Gateway.

Fix: Update GitLab AI Gateway to version 18.6.2, 18.7.1, or 18.8.1, depending on the release line you are running; the vulnerability is fixed in these versions.

NVD/CVE Database
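
GitLab has not published exploit details, so the sketch below illustrates the general vulnerability class the summary names (insecure template expansion), using Python's Jinja2. The classic probe: if "{{ 7 * 7 }}" comes back as "49", the service evaluated attacker-supplied expressions, and richer payloads can escalate from there toward code execution.

from jinja2 import Environment
from jinja2.sandbox import SandboxedEnvironment

user_field = "{{ 7 * 7 }}"  # attacker-controlled workflow-definition field

# Unsafe: user input is compiled as template code, so its expressions run.
print(Environment().from_string(user_field).render())  # -> 49

# Safe: a fixed template treats the same input purely as data.
tmpl = Environment().from_string("step name: {{ name }}")
print(tmpl.render(name=user_field))  # -> step name: {{ 7 * 7 }}

# If templates must be user-authored, Jinja2's SandboxedEnvironment at least
# restricts what template expressions can reach at runtime.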