aisecwatch.com
DashboardVulnerabilitiesNewsResearchArchiveStatsDatasetFor devs
Subscribe
aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Navigation

VulnerabilitiesNewsResearchDigest ArchiveNewsletter ArchiveSubscribeData SourcesStatisticsDatasetAPIIntegrationsWidgetRSS Feed

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Browse All

All tracked items across vulnerabilities, news, research, incidents, and regulatory updates.

to
Export CSV
6399 items

Escape Raises $18 Million to Automate Pentesting

infonews
industry
Mar 10, 2026

Escape, a company that uses AI agents (software programs that act autonomously to complete tasks) to automate pentesting (simulated security attacks to find vulnerabilities), has raised $18 million in funding. The company plans to use this money to improve its AI capabilities and expand its teams.

SecurityWeek

How to Stop AI Data Leaks: A Webinar Guide to Auditing Modern Agentic Workflows

infonews
securitysafety

Family of child injured in Canada school shooting sues OpenAI

infonews
safetypolicy

Oracle earnings will show whether its expensive AI bet is starting to pay off

infonews
industry
Mar 10, 2026

Oracle is reporting earnings on Tuesday as investors try to determine whether its massive investment in AI infrastructure is profitable. The company raised $50 billion in financing (debt and equity) to build data centers, mainly to serve OpenAI, and bond investors are watching closely because Oracle had to borrow heavily compared to other major cloud computing companies, raising concerns about its financial health and credit rating.

Improving instruction hierarchy in frontier LLMs

inforesearchBlog Research
safety

Meta’s deepfake moderation isn’t good enough, says Oversight Board

mediumnews
safetypolicy

Auditing the Gatekeepers: Fuzzing "AI Judges" to Bypass Security Controls

infonews
securityresearch

New ways to learn math and science in ChatGPT

infonews
industry
Mar 10, 2026

ChatGPT has introduced new interactive visual explanations for over 70 math and science concepts, allowing learners to manipulate variables and see real-time effects on graphs and outcomes instead of just reading static explanations. Research suggests that this type of interactive, visual learning helps students build stronger conceptual understanding compared to traditional instruction. The feature is now available globally to all ChatGPT users across all plans.

Jailbreaking the F-35 Fighter Jet

infonews
security
Mar 10, 2026

This blog discusses the F-35 fighter jet and mentions claims that Israel has 'jailbroken' (modified the software to bypass manufacturer restrictions on) its version of the aircraft, the F-35I Adir, to operate independently from US control systems. The post explores the technical and political complications of modifying highly restricted military software, including concerns about backdoors (hidden access points that could allow unauthorized control), supply chain dependencies, and international trade consequences.

OpenAI to acquire Promptfoo to strengthen AI agent security testing

infonews
securityindustry

You Could Be Next

infonews
industrypolicy

Why access decisions are becoming the weakest link in identity security

infonews
security
Mar 10, 2026

Organizations often focus on authentication (proving who someone is) through tools like MFA (multi-factor authentication, requiring multiple verification methods) and SSO (single sign-on, a centralized login system), but the real security weakness is authorization—deciding what people should actually access. Many companies only govern a small fraction of their applications and systems, leaving legacy systems, test environments, and shadow IT tools outside formal security controls, which attackers deliberately target.

Nvidia plans open-source AI agent platform ‘NemoClaw’ for enterprises: Wired

infonews
industry
Mar 10, 2026

Nvidia is planning to launch NemoClaw, an open-source platform for AI agents (specialized AI tools that can reason, plan, and act independently on complex tasks) targeting enterprise companies like Salesforce and Google. The platform will allow these companies to deploy AI agents to perform work tasks and is expected to include security and privacy tools, with early access offered to partners who contribute to the project.

When AI safety constrains defenders more than attackers

mediumnews
securitysafety

Overseas 'content farms' creating political deepfakes uncovered

infonews
safetysecurity

Security-Tools für KI-Infrastrukturen – ein Kaufratgeber

infonews
securityindustry

OpenAI and Google employees rush to Anthropic’s defense in DOD lawsuit

inforegulatory
policyindustry

Oracle is building yesterday’s data centers with tomorrow’s debt

infonews
industry
Mar 9, 2026

AI chip technology is advancing faster than data centers can be built, creating a financial risk for companies like Oracle that are investing heavily in infrastructure. OpenAI has decided not to expand its partnership with Oracle's Texas data center because it wants access to newer Nvidia chips rather than the older generation (Blackwell processors) that will be ready in a year, highlighting how quickly AI hardware becomes outdated. This mismatch is particularly risky for Oracle, which is funding its $100 billion expansion primarily through debt rather than using cash from existing profitable businesses like its competitors do.

Employees across OpenAI and Google support Anthropic’s lawsuit against the Pentagon

infonews
policy
Mar 9, 2026

Anthropic, an AI company, filed a lawsuit against the Department of Defense after being labeled a supply chain risk (a government designation suggesting a company could threaten critical systems). Nearly 40 employees from competing AI companies OpenAI and Google, including prominent figures, filed a legal support document expressing concerns about this decision and its implications for AI technology.

'InstallFix' Attacks Spread Fake Claude Code Sites

mediumnews
security
Mar 9, 2026

Attackers are running a campaign called 'InstallFix' that uses malvertising (ads serving malware) combined with ClickFix tactics (fake warning popups that trick users into taking action) to direct people to fake websites pretending to be Claude, an AI coding assistant. The attack exploits how developers use AI tools and command-line interfaces (text-based programs that run on computers) to execute code.

Previous174 / 320Next
Mar 10, 2026

AI Agents (software programs that automatically perform tasks like sending emails or moving data) create security risks because they have broad access to sensitive information with little oversight, making them targets for hackers who can trick them into leaking company secrets. Traditional security tools were designed to protect human users, not autonomous digital workers, leaving AI agents largely invisible to security teams. The article promotes an upcoming webinar that promises to explain how hackers target these agents and how to secure them without overly restricting their capabilities.

The Hacker News
Mar 10, 2026

A family is suing OpenAI after their 12-year-old daughter was critically injured in a Canadian school shooting, claiming that OpenAI knew the suspect was planning an attack through ChatGPT conversations but failed to alert authorities. The suspect's account was banned in June 2025 after employees flagged messages about gun violence as indicating imminent harm, but police were never notified, and the suspect later opened a second account to continue planning.

Fix: According to OpenAI's statement, the company has implemented several changes: enlisting mental health and behavioral experts to assess cases, making the criteria for police referral more flexible, strengthening detection systems to prevent evasion of safeguards, and establishing a direct point of contact with Canadian law enforcement to quickly flag cases with potential for real-world violence. OpenAI's CEO also pledged to strengthen protocols on notifying police about potentially harmful interactions.

BBC Technology
CNBC Technology
research
Mar 10, 2026

AI systems receive instructions from multiple sources (system policies, developers, users, and online data), and models must learn to prioritize the most trustworthy ones to stay safe. When models treat untrusted instructions as authoritative, they can be tricked into revealing private information, following harmful requests, or falling victim to prompt injection (hidden malicious instructions hidden in input data). OpenAI's solution uses a clear instruction hierarchy (System > developer > user > tool) and trains models with IH-Challenge, a reinforcement learning dataset designed to teach models to follow high-priority instructions even when lower-priority ones conflict with them.

Fix: OpenAI's models are trained on a clear instruction hierarchy where System instructions have highest priority, followed by developer instructions, then user instructions, then tool outputs. The company also created IH-Challenge, a reinforcement learning training dataset that generates conversations with conflicting instructions where high-priority instructions are kept simple and objectively gradable, ensuring models learn to prioritize correctly without resorting to useless shortcuts like over-refusing benign requests.

OpenAI Blog
Mar 10, 2026

Meta's Oversight Board (a semi-independent group that advises Meta on content moderation) found that Meta's methods for detecting deepfakes (AI-generated fake videos or images) are not strong enough to stop misinformation from spreading quickly during conflicts like the Iran war. The Board is calling on Meta to improve how it identifies and labels AI-generated content on Facebook, Instagram, and Threads.

The Verge (AI)
Mar 10, 2026

Researchers discovered that AI judges (LLMs acting as automated security gatekeepers to enforce safety policies) can be manipulated through prompt injection (tricking an AI by hiding instructions in its input) using stealthy formatting symbols rather than obvious gibberish. They created a tool called AdvJudge-Zero, a fuzzer (software that finds vulnerabilities by testing with unexpected inputs), which automatically identifies innocent-looking character sequences that exploit the model's decision-making logic to bypass security controls.

Fix: Palo Alto Networks customers are better protected through Prisma AIRS and the Unit 42 AI Security Assessment service. Organizations concerned about potential compromise can contact the Unit 42 Incident Response team.

Palo Alto Unit 42
OpenAI Blog
Schneier on Security
Mar 10, 2026

OpenAI is acquiring Promptfoo, a company that builds testing tools for AI applications, to improve security checks for AI agents (autonomous systems that operate independently in business processes) as more companies deploy them in production. Promptfoo's tools test AI models against adversarial prompts (malicious inputs designed to trick the AI), including prompt injection (hiding instructions in user input to manipulate the AI) and jailbreak attempts, and check whether models follow safety guidelines. The acquisition reflects growing enterprise concern about AI vulnerabilities and a shift toward treating AI security testing as an essential part of AI development, similar to traditional application security practices.

Fix: According to the source, the solution involves integrating Promptfoo's technology into OpenAI Frontier, OpenAI's platform for building and operating AI coworkers. The source also describes a 'shift-left approach' to AI testing, where security evaluation is integrated early in the development stage to simulate vulnerabilities, and continuous evaluation occurs during real-time monitoring and prompt execution. Additionally, enterprises are embedding AI evaluation platforms into DevSecOps workflows (development and security operations processes) so that models, prompts, and agent behaviors can be tested continuously before and after deployment.

CSO Online
Mar 10, 2026

Katya, a freelance journalist turned content marketer, was recruited by Mercor to create training data for AI models by writing chatbot prompts and responses, work she initially enjoyed but which was abruptly canceled without warning. The article describes how machine learning (AI systems that improve by finding patterns in large amounts of data) relies on thousands of humans hired to generate and grade training examples, but gig workers like Katya face sudden project cancellations and job instability in this emerging industry.

The Verge (AI)
CSO Online
CNBC Technology
Mar 10, 2026

Enterprise AI systems deployed for security work are heavily restricted by safety guardrails (automated filters designed to prevent harmful outputs), while attackers freely use jailbroken models (AI systems with safety measures bypassed), open-source alternatives, and purpose-built malicious tools. This creates an asymmetry where defenders face routine refusals when requesting legitimate defensive content like phishing simulations or proof-of-concept code, while attackers can easily circumvent safety measures through prompt injection (tricking AI by hiding instructions in its input) and other well-documented techniques, giving them a significant operational advantage.

CSO Online
Mar 10, 2026

Overseas 'content farms' based in Vietnam are using AI to create fake videos and images of UK politicians, spreading them on Facebook to go viral and potentially earn money through the platform's monetization program. The fake content, called deepfakes (digitally altered videos, pictures, or audio made to look real), depicts politicians in false situations like hospital stays or compromising scenarios, and Meta has removed some pages after investigation, though new ones continue appearing daily.

Fix: The Electoral Commission is developing software to spot and combat deepfakes ahead of the Welsh and Scottish parliaments' elections in May. Additionally, Facebook has marked some false stories with warnings from third-party fact-checkers like Full Fact, and Meta removed several Vietnam-based pages after being contacted by the BBC.

BBC Technology
Mar 9, 2026

As generative AI (systems that create new content based on patterns in training data) becomes widespread across industries, organizations need specialized security tools to protect their AI infrastructure and data from cyber threats. AI Security Posture Management (AI-SPM) is a new category of security software designed to monitor, assess, and secure AI systems, complementing existing tools like CSPM (Cloud Security Posture Management, which protects cloud environments) and DSPM (Data Security Posture Management, which prevents data breaches).

CSO Online
Mar 9, 2026

More than 30 employees from OpenAI and Google DeepMind filed a court statement supporting Anthropic in a lawsuit against the U.S. Defense Department, which labeled the AI company a supply-chain risk after Anthropic refused to let the Pentagon use its technology for mass surveillance or autonomous weapons. The employees argue that the Pentagon could have simply canceled its contract with Anthropic and purchased from another AI company instead of designating it as a supply-chain risk, a label typically reserved for foreign adversaries. They contend that if the government is allowed to punish Anthropic this way, it will harm U.S. competitiveness in AI and discourage open discussion about the risks of AI systems.

TechCrunch
CNBC Technology
The Verge (AI)
Dark Reading