
Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.


Maintained by

Truong (Jack) Luu

Information Systems Researcher

AI Sec Watch

The security intelligence platform for AI teams

AI security threats move fast and get buried under hype and noise. AI Sec Watch is built by an Information Systems Security researcher to help security teams and developers stay ahead of vulnerabilities, privacy incidents, safety research, and policy developments.

Independent research. No sponsors, no paywalls, no conflicts of interest.

Total tracked: 3,710
Last 24 hours: 1
Last 7 days: 1
Daily Briefing: Saturday, May 16, 2026

No new AI/LLM security issues were identified today.

Latest Intel

01

There are more AI health tools than ever—but how well do they work?

safety · industry
Mar 30, 2026

Major tech companies including Microsoft, Amazon, and OpenAI have recently released AI health tools that use large language models (LLMs, AI systems trained on massive amounts of text to generate human-like responses) to answer medical questions and access user health records. While these tools are in high demand because many people struggle to access traditional healthcare, researchers emphasize that these products should be independently evaluated by outside experts before wide release, rather than relying solely on companies' own evaluations.

MIT Technology Review
02

Addressing the OWASP Top 10 Risks in Agentic AI with Microsoft Copilot Studio

security · policy
Mar 30, 2026

Agentic AI systems (autonomous AI that can retrieve data, invoke tools, and take actions using real permissions) are moving into production, but they introduce unique security risks because failures aren't limited to a single response—they can trigger automated sequences of actions with real-world consequences. The OWASP Top 10 for Agentic Applications (2026) identifies ten key risks in these systems, such as goal hijacking (where an agent's objectives are redirected through injected instructions) and tool misuse (where legitimate tools are exploited through unsafe chaining or ambiguous instructions).
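
A minimal Python sketch of one mitigation in that spirit: validating every agent tool call against an allowlist and a per-tool argument schema before it executes with real permissions. The tool names and schema here are hypothetical, not taken from Copilot Studio or the OWASP document.

# Hypothetical guard illustrating a mitigation for the "tool misuse" risk:
# reject any tool call that is not allowlisted or whose arguments do not
# match the expected schema, before the agent acts with real permissions.
ALLOWED_TOOLS = {
    "search_tickets": {"query": str, "limit": int},
    "send_email": {"to": str, "subject": str, "body": str},
}

def validate_tool_call(name, args):
    schema = ALLOWED_TOOLS.get(name)
    if schema is None:
        raise PermissionError(f"tool '{name}' is not on the allowlist")
    for key, value in args.items():
        if key not in schema or not isinstance(value, schema[key]):
            raise ValueError(f"unexpected or mistyped argument '{key}' for '{name}'")
    return True

# A chained or injected call to an unlisted tool would be rejected here.
validate_tool_call("search_tickets", {"query": "refund policy", "limit": 5})   # OK
# validate_tool_call("delete_records", {"table": "users"})                     # PermissionError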

Microsoft Security Blog
03

The Pentagon’s culture war tactic against Anthropic has backfired

policy
Mar 30, 2026

The Pentagon tried to punish AI company Anthropic by labeling it a supply chain risk (a designation that restricts who can do business with the government) after disagreements over a direct contract, but a California judge blocked this action. The judge found that the government's actions violated proper procedures and were really an attempt to punish Anthropic for its ideology rather than to address legitimate security concerns, with senior officials making public posts about the dispute before following legal processes.

MIT Technology Review
04

Okta’s CEO is betting big on AI agent identity

industry · safety
Mar 30, 2026

Okta, a company that manages login and security across business applications, is facing pressure from AI tools that could let companies build their own management systems instead of paying for Okta's service. CEO Todd McKinnon says the company is responding by adopting AI and LLMs (large language models, which are AI systems trained on massive amounts of text) to stay competitive and secure, and is focusing on a new opportunity: managing the identity and access of AI agents (automated AI systems that can take actions on their own) within corporations, not just human employees.

The Verge (AI)
05

Silent Drift: How LLMs Are Quietly Breaking Organizational Access Control

security · safety
Mar 30, 2026

Large language models (LLMs, AI systems trained on massive amounts of text) can quickly generate complex access control code in languages like Rego and Cedar, but even small errors, such as a missing condition or a made-up attribute (hallucination, when an AI invents false information), can accidentally weaken an organization's least-privilege security model (a system where users get only the minimum permissions they need).
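
A hypothetical Python illustration (not taken from the SecurityWeek article) of the kind of silent drift described: a generated authorization check that drops a single condition and quietly broadens access beyond the least-privilege intent.

# Intended policy: a user may read a document only in their own department
# AND only with an explicit "reader" role.
def intended_can_read(user, doc):
    return user["dept"] == doc["dept"] and "reader" in user["roles"]

# Plausible LLM-generated version that silently drops the department check.
def generated_can_read(user, doc):
    return "reader" in user["roles"]

alice = {"dept": "finance", "roles": ["reader"]}
hr_doc = {"dept": "hr"}

print(intended_can_read(alice, hr_doc))   # False: cross-department access denied
print(generated_can_read(alice, hr_doc))  # True: silent privilege drift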

SecurityWeek
06

⚡ Weekly Recap: Telecom Sleeper Cells, LLM Jailbreaks, Apple Forces U.K. Age Checks and More

security
Mar 30, 2026

A critical flaw in Citrix NetScaler ADC and NetScaler Gateway (CVE-2026-3055, rated 9.3 on the 0-10 CVSS severity scale) is being actively exploited to leak sensitive information through insufficient input validation, a failure to properly check data before processing it. The vulnerability only affects systems configured as SAML Identity Providers (SAML IdPs, services that verify user identities). Additionally, a Chinese state-sponsored group called Red Menshen deployed stealthy kernel implants called BPFDoor deep in telecom networks worldwide to secretly monitor traffic without being detected.

Fix: Rapid7 has released a scanning script designed to detect known BPFDoor variants across Linux environments.

The Hacker News
07

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

safety · research
Mar 30, 2026

Text-to-image models (AI systems that generate pictures from written descriptions) can be misused to create unsafe content like sexually explicit or violent images. PromptGuard is a new safety technique that uses a soft prompt (a special text input optimized for safety that works within the model's internal text processing layer) to moderate unsafe requests and prevent the generation of such content while still producing high-quality normal images.

Fix: The source describes PromptGuard as the solution itself rather than a patch or update. The technique works by optimizing a safety soft prompt that functions as an implicit system prompt within the text-to-image model's embedding space, with a divide-and-conquer strategy that optimizes category-specific soft prompts and combines them into holistic safety guidance. Code and dataset are available at https://t2i-promptguard.github.io/
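
A rough Python/PyTorch sketch of the mechanism described above, assuming learnable per-category prompt embeddings combined by simple averaging and prepended to the user prompt embeddings; the released PromptGuard code may use different shapes and a different combination step.

import torch
import torch.nn as nn

class SafetySoftPrompt(nn.Module):
    def __init__(self, num_categories=3, prompt_len=8, embed_dim=768):
        super().__init__()
        # One learnable soft prompt per unsafe-content category (divide step).
        self.category_prompts = nn.Parameter(
            torch.randn(num_categories, prompt_len, embed_dim) * 0.02
        )

    def forward(self, prompt_embeds):
        # Combine category prompts into one holistic safety prefix (conquer step);
        # averaging is an assumption, not necessarily the paper's exact method.
        safety_prefix = self.category_prompts.mean(dim=0)                      # (prompt_len, embed_dim)
        safety_prefix = safety_prefix.unsqueeze(0).expand(prompt_embeds.size(0), -1, -1)
        # Acts like an implicit system prompt inside the text embedding space.
        return torch.cat([safety_prefix, prompt_embeds], dim=1)

guard = SafetySoftPrompt()
user_embeds = torch.randn(2, 77, 768)     # e.g. CLIP-style token embeddings
print(guard(user_embeds).shape)           # torch.Size([2, 85, 768])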

IEEE Xplore (Security & AI Journals)
08

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Fine-Tuning

research · privacy
Mar 30, 2026

This research proposes new methods for fine-tuning (customizing a trained AI model for specific tasks) large language models while protecting sensitive data using differential privacy (a technique that adds noise to data to prevent identifying individuals). The paper introduces DP-ZOSO and DP-ZOPO, which use zeroth-order gradient approximation (estimating how to improve the model without calculating exact mathematical directions) instead of traditional methods, making the process faster and more scalable while maintaining privacy protection.
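
An illustrative Python sketch of the general recipe behind differentially private zeroth-order optimization (two-point loss probing instead of backpropagation, per-example clipping, calibrated Gaussian noise); this is not the paper's exact DP-ZOSO or DP-ZOPO algorithm, and the noise calibration here is simplified.

import numpy as np

def dp_zo_step(params, loss_fn, batch, mu=1e-3, lr=0.05, clip=1.0, sigma=0.5, rng=None):
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)                   # random perturbation direction
    per_example = []
    for x in batch:
        # Zeroth-order estimate: probe the loss at params +/- mu*z, no gradients needed.
        g = (loss_fn(params + mu * z, x) - loss_fn(params - mu * z, x)) / (2 * mu)
        per_example.append(np.clip(g, -clip, clip))          # bound each example's contribution
    g_mean = np.mean(per_example)
    g_priv = g_mean + rng.normal(0.0, sigma * clip / len(batch))   # add Gaussian noise
    return params - lr * g_priv * z                           # move along the probed direction

# Toy usage: privately fit a single parameter to the data mean.
params = np.array([5.0])
data = [1.0, 1.2, 0.8, 1.1]
loss = lambda p, x: float((p[0] - x) ** 2)
for _ in range(500):
    params = dp_zo_step(params, loss, data)
print(params)  # close to the data mean (~1.0), up to DP noise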

IEEE Xplore (Security & AI Journals)
09

Rethinking Frequency Modeling: Tail-Aware Dynamic Adversarial Training for Long-Tailed Robustness

research · safety
Mar 30, 2026

This research addresses a problem where adversarial training (a method to make AI models resistant to adversarial attacks, which are carefully crafted inputs designed to fool the model) works poorly when training data is imbalanced, meaning some classes have many examples while others have very few. The authors propose Tail-Aware Dynamic Adversarial Training (TAD-AT), which improves robustness by adjusting the training loss, attack strategy, and weight averaging to account for which classes are most vulnerable to attacks, rather than just how many examples exist per class.

Fix: The proposed mitigation is Tail-Aware Dynamic Adversarial Training (TAD-AT), which consists of three components: (1) a training loss that incorporates frequency- and accuracy-aware regularization to emphasize learning for vulnerable classes, (2) an attack that adjusts perturbations based on class-wise vulnerability to encourage robust feature learning, and (3) a weight average that adaptively controls the decay rate across classes to improve robust generalization and training stability. Code is available at https://github.com/bookman233/TADAT.
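
A rough Python/PyTorch sketch of the first component's idea, weighting the adversarial training loss by per-class vulnerability (robust accuracy) rather than raw class frequency; the weighting scheme here is hypothetical, not the paper's exact TAD-AT formulation.

import torch
import torch.nn.functional as F

def tail_aware_loss(logits_adv, labels, class_robust_acc, temperature=1.0):
    # Classes with low robust accuracy (most vulnerable to attack) get larger weights.
    weights = torch.softmax(-class_robust_acc / temperature, dim=0)
    per_example_w = weights[labels] * weights.numel()        # rescale so the average weight is ~1
    ce = F.cross_entropy(logits_adv, labels, reduction="none")
    return (per_example_w * ce).mean()

# Toy usage with 3 classes; class 2 is assumed most vulnerable.
logits_adv = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
class_robust_acc = torch.tensor([0.70, 0.55, 0.20])
print(tail_aware_loss(logits_adv, labels, class_robust_acc))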

IEEE Xplore (Security & AI Journals)
10

When AI Trust Breaks: The ChatGPT Data Leakage Flaw That Redefined AI Vendor Security Trust

security · privacy
Mar 30, 2026

Researchers discovered a vulnerability in ChatGPT that could leak sensitive user data (like medical records, financial information, and internal documents) from conversations without the user's knowledge or permission. Although OpenAI has since fixed the issue, the discovery highlights an important lesson: AI tools should not be automatically trusted to be secure just because they are popular or widely used.

Check Point Research