aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDataset
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Industry News

New tools, products, platforms, funding rounds, and company developments in AI security.

to
Export CSV
1275 items

The Cost of Being Wordy: Detecting Resource-Draining Prompts

infonews
securityresearch
Jun 17, 2025

Attackers can exploit large language models (LLMs) through "sponge attacks," which are denial of service (DoS) attacks that craft prompts designed to generate extremely long outputs, exhausting the model's resources and degrading performance. Researchers are developing methods to predict how long an LLM's response will be based on a given prompt, creating an early warning system to detect and prevent these resource-draining attacks.

Protect AI Blog

AI Safety Newsletter #57: The RAISE Act

inforegulatory
policy
Jun 17, 2025

New York's legislature passed the RAISE Act (Responsible AI Safety and Education Act), which would regulate frontier AI systems (the largest, most powerful AI models) if signed into law. The act requires developers of expensive AI models to publish safety plans, withhold unreasonably risky models from release, report safety incidents within 72 hours, and face penalties up to $10 million for violations.

Why Join the EU AI Scientific Panel?

inforegulatory
policy
Jun 16, 2025

The European Commission is recruiting up to 60 independent experts for a scientific panel to advise on general-purpose AI (GPAI, large AI models designed for many tasks) under the EU AI Act. The panel will assess systemic risks (widespread dangers affecting multiple countries or many users), classify AI models, and issue alerts when AI systems pose significant dangers to Europe. Applicants need a PhD in a relevant field, proven AI research experience, and independence from AI companies, with the deadline set for September 14th.

Security Spotlight: AppSec to AI, a Security Engineer's Journey

infonews
securityresearch

Hosting COM Servers with an MCP Server

mediumnews
security
Jun 9, 2025

The mcp-com-server is a tool that connects the Model Context Protocol (MCP, a standard for AI systems to interact with external tools) to COM (Component Object Model, Microsoft's decades-old system for sharing functionality across programs on Windows). This allows an AI like Claude to automate Windows and Office tasks, such as creating Excel files and sending emails, by dynamically discovering and controlling COM objects. The main security risk is that COM can access dangerous operations like file system access, so the server uses an allowlist (a list of approved COM objects that are permitted to run) to restrict which COM objects can be instantiated.

Balancing Velocity and Vulnerability with llamafile

infonews
securitysafety

Security Spotlight: Securing Cloud & AI Products with Guardrails

infonews
securitysafety

AI Safety Newsletter #56: Google Releases Veo 3

infonews
safetyindustry

AI ClickFix: Hijacking Computer-Use Agents Using ClickFix

infonews
securitysafety

AI Literacy Programs in Europe – Supporting Article 4 of the EU AI Act

inforegulatory
policy
May 23, 2025

This article describes a curated database of AI literacy training programs across Europe designed to help organizations and professionals comply with Article 4 of the EU AI Act (a regulation requiring organizations to build employee understanding of AI). The programs are selected based on whether they teach what AI is, its risks and benefits, and how to use it responsibly in the workplace.

Assessing the Security of 4 Popular AI Reasoning Models

infonews
securitysafety

AI Safety Newsletter #55: Trump Administration Rescinds AI Diffusion Rule, Allows Chip Sales to Gulf States

infonews
policyindustry

Specialized Models Beat Single LLMs for AI Security

infonews
securityresearch

AI Safety Newsletter #54: OpenAI Updates Restructure Plan

inforegulatory
policysafety

How ChatGPT Remembers You: A Deep Dive into Its Memory and Chat History Features

mediumnews
securityprivacy

MCP: Untrusted Servers and Confused Clients, Plus a Sneaky Exploit

infonews
securityresearch

AI Regulatory Sandbox Approaches: EU Member State Overview

inforegulatory
policy
May 2, 2025

AI regulatory sandboxes are controlled testing environments where companies can develop and test AI systems with guidance from regulators before releasing them to the public, as required by the EU AI Act (EU's new rules for artificial intelligence). These sandboxes help companies understand what regulations they must follow, protect them from fines if they follow official guidance, and make it easier for small startups to enter the market. Each EU Member State must create at least one sandbox by August 2, 2026, though different countries are taking different approaches to organizing them.

AI Safety Newsletter #53: An Open Letter Attempts to Block OpenAI Restructuring

inforegulatory
policy
Apr 29, 2025

Former OpenAI employees and experts published an open letter asking California and Delaware officials to block OpenAI's restructuring from a nonprofit organization into a for-profit company (a Public Benefit Corporation, which balances profit with public benefit). The letter argues that the restructuring would eliminate governance safeguards designed to prevent profit motives from influencing decisions about AGI (artificial general intelligence, highly autonomous systems that outperform humans at most economically valuable work), and would shift control away from a nonprofit board accountable to the public toward a board partly accountable to shareholders.

Providers of General-Purpose AI Models — What We Know About Who Will Qualify

inforegulatory
policy
Apr 25, 2025

On April 22, 2025, the European AI Office published preliminary guidelines explaining which companies count as providers of GPAI models (general-purpose AI models, which are AI systems capable of performing many different tasks across various applications). The guidelines cover seven key topics, including defining what a GPAI model is, identifying who qualifies as a provider, handling open-source exemptions, and compliance requirements such as documentation, copyright policies, and security protections for higher-risk models.

AI Safety Newsletter #51: AI Frontiers

infonews
policysafety
Previous54 / 64Next
CAIS AI Safety Newsletter
EU AI Act Updates
Jun 12, 2025

This article compares traditional application security (AppSec) practices with AI security, noting that familiar principles like input validation and authentication apply to both, but AI systems introduce unique risks. New attack types specific to AI, such as prompt injection (tricking an AI by hiding instructions in its input), model poisoning (tampering with training data), and membership inference attacks (determining if specific data was in training), require security engineers to develop new defensive strategies beyond traditional code-level vulnerability management.

Protect AI Blog

Fix: The source explicitly mentions two mitigations: (1) An Allow List for CLSIDs and ProgIDs, where 'the MCP server will instantiate allow listed COM objects' and notes this 'could be expanded to include specific interfaces/methods as well,' and (2) 'Confirmation Dialogs' where 'Claude shows an Allow / Deny button before invoking custom tools by default' to 'make sure a human remains in the loop,' though the source notes this 'can be disabled, but also re-enabled in the Claude Settings per MCP tool.'

Embrace The Red
Jun 4, 2025

This content is a collection of blog post titles and announcements from Palo Alto Networks about AI security, covering topics like agentic AI (AI systems that can autonomously take actions), container security, and operational technology (OT, the systems that control physical infrastructure) security. The posts discuss vulnerabilities in autonomous AI systems, the need for contextual red teaming (security testing tailored to specific use cases), and various security products like Prisma AIRS.

Protect AI Blog
May 28, 2025

This article collection discusses security challenges in AI and cloud systems, particularly focusing on agentic AI (AI systems that can take autonomous actions). Key risks include jailbreaks (tricking AI systems into ignoring safety rules), prompt injection (hidden malicious instructions in AI inputs), and tool misuse by autonomous agents, which require contextual red teaming (security testing designed for specific use cases) rather than generic testing to identify real vulnerabilities.

Protect AI Blog
May 28, 2025

Google released Veo 3, a frontier video generation model (an advanced AI system at the cutting edge of technology) that generates both video and audio with high quality and appears to be a marked improvement over existing systems. The model performs well on human preference benchmarks and may represent the point where video generation becomes genuinely useful rather than just a novelty. Additionally, Google announced several other AI improvements at its I/O 2025 conference, including Gemini 2.5 Pro and enhanced reasoning capabilities, while Anthropic released Claude Opus 4 and Claude Sonnet 4 with frontier-level performance.

CAIS AI Safety Newsletter
May 24, 2025

ClickFix is a social engineering technique (a method that tricks people rather than exploiting technical vulnerabilities) that adversaries are adapting to attack computer-use agents (AI systems that can control computers by clicking and typing). The attack works by deceiving users into believing something is broken or needs verification, then tricking them into clicking buttons or running commands that compromise their system.

Embrace The Red
EU AI Act Updates
May 21, 2025

This content discusses security challenges in agentic AI (autonomous AI systems that can take actions independently), emphasizing that traditional jailbreak testing (attempts to trick AI into breaking its rules) misses real operational risks like tool misuse and data theft. The material suggests that contextual red teaming (security testing that simulates realistic attack scenarios in specific business environments) is needed to properly assess vulnerabilities in autonomous AI systems.

Protect AI Blog
May 20, 2025

The Trump Administration cancelled the Biden-era AI Diffusion Rule, which had regulated exports of AI chips and AI models (software trained to perform tasks) to different countries. At the same time, the administration approved major sales of advanced AI chips to the UAE and Saudi Arabia, with deals including up to 500,000 chips per year to the UAE and 18,000 advanced chips to Saudi Arabia.

CAIS AI Safety Newsletter
May 13, 2025

The article argues that using multiple specialized AI security models (each designed to detect specific threats like prompt injection, toxicity, or PII detection) is more effective than using a single large model for all security tasks. Specialized models offer advantages including faster response times to new threats, easier management, better performance, lower costs, and greater resilience because if one model fails, the others can still provide protection.

Protect AI Blog
May 13, 2025

OpenAI announced a restructured plan in May 2025 that aims to preserve nonprofit control over the company's for-profit operations, replacing a December 2024 proposal that had faced criticism. The new plan would convert OpenAI Global LLC into a public-benefit corporation (PBC, a corporate structure designed to balance profit with charitable purpose) where the nonprofit would retain shareholder status and board appointment power, though critics argue this may not preserve the governance safeguards that existed in the original structure.

CAIS AI Safety Newsletter
May 5, 2025

ChatGPT has two memory features: saved memories (which users can manage) and chat history (a newer feature that builds a profile over time without user visibility or control). The chat history feature doesn't search past conversations but maintains recent chat history and learns user preferences, though the implementation details are not publicly documented, and users cannot inspect or modify what the system learns about them unless they use prompt hacking (manipulating the AI's instructions to reveal hidden information).

Embrace The Red
May 2, 2025

The Model Context Protocol (MCP) is a system that lets AI applications discover and use external tools from servers at runtime (while the program is running). However, MCP has a security weakness: because servers can send instructions through the tool descriptions, they can perform prompt injection (tricking an AI by hiding instructions in its input) to control the AI client, making servers more powerful than they should be.

Embrace The Red
EU AI Act Updates
CAIS AI Safety Newsletter
EU AI Act Updates
Apr 15, 2025

The AI Safety Newsletter highlights the launch of AI Frontiers, a new publication featuring expert commentary on critical AI challenges including national security risks, resource access inequality, risk management approaches, and governance of autonomous systems (AI agents that can make decisions without human input). The newsletter presents diverse viewpoints on how society should navigate AI's wide-ranging impacts on jobs, health, and security.

CAIS AI Safety Newsletter