aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.


Maintained by

Truong (Jack) Luu

Information Systems Researcher

Browse All

All tracked items across vulnerabilities, news, research, incidents, and regulatory updates.

3117 items

Alibaba launches agentic AI tool for businesses with Slack, Teams integration plans

info · news · industry
Mar 17, 2026

Alibaba released Wukong, a new agentic AI tool (software that can take proactive actions on company systems, not just respond to questions) designed to help businesses manage multiple AI agents through a single interface with planned integration into messaging apps like Slack and Microsoft Teams. The platform handles tasks such as document editing, approvals, and meeting transcription, though the company acknowledges that giving AI agents broad access to company data raises privacy and security concerns.

CNBC Technology

Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fragile Across Open and Closed Models

info · news · security · research

OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first

info · news · safety · policy

Introducing GPT-5.4 mini and nano

info · news · industry
Mar 17, 2026

OpenAI released GPT-5.4 mini and nano, smaller and faster versions of its GPT-5.4 model designed for high-volume tasks where response speed matters. GPT-5.4 mini runs more than 2x faster than GPT-5 mini while approaching the full GPT-5.4 model's performance on coding and reasoning tasks; GPT-5.4 nano is the smallest and cheapest option for simpler jobs like classification and data extraction. These models suit applications such as coding assistants, AI subagents (specialized AI components that handle specific subtasks), and systems that interpret screenshots, where speed and cost-effectiveness matter more than raw capability.

A novel android malware detection method based on CWInFs and MPTACF optimization

info · research · peer-reviewed

Runtime: The new frontier of AI agent security

info · news · security · safety

Modeling of physical unclonable functions (PUF): A systematic literature review

info · research · peer-reviewed · security

A photo of Iran’s bombed schoolgirl graveyard went around the world. Was it real, or AI?

info · news · safety · policy

Agent Commander: Promptware-Powered Command and Control

info · news · security · research

AI firm Anthropic seeks weapons expert to stop users from 'misuse'

info · news · safety · policy

Equipping workers with insights about compensation

info · news · research · industry

Introducing Mistral Small 4

info · news · industry
Mar 16, 2026

Mistral released Mistral Small 4, a new 119-billion parameter model (Mixture-of-Experts, a technique where only some parts of the model activate for each task) that combines reasoning, image understanding, and coding capabilities into one system. The model supports two reasoning modes and is available through the Mistral API, though the reasoning effort setting was not yet documented in their API at the time of writing.
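Mistral has not published Small 4's routing internals; as a generic illustration of the Mixture-of-Experts idea mentioned above, here is a toy sketch (all weights, sizes, and the scalar outputs are invented) in which a router scores all experts for an input but only the top-k actually compute anything:

```python
import math
import random

random.seed(0)

DIM, N_EXPERTS, TOP_K = 4, 8, 2

# Each "expert" is a tiny linear map; the router is another linear map.
experts = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x):
    """Route input x to the top-k experts and mix their scalar outputs."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gate = softmax([scores[i] for i in top])  # renormalise over chosen experts
    out = 0.0
    for g, i in zip(gate, top):
        out += g * sum(w * xi for w, xi in zip(experts[i], x))
    return out, top  # only TOP_K of N_EXPERTS did any work

y, active = moe_forward([1.0, -0.5, 0.2, 0.7])
print(len(active))  # 2 experts active out of 8
```

This is why an MoE model's parameter count overstates its per-token compute: most experts sit idle on any given input.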

Child abuse material ‘systemic’ on Elon Musk’s X amid Grok scandal, Australian online safety regulator warned

info · news · safety · policy

DLSS 5 looks like a real-time generative AI filter for video games

info · news · industry
Mar 16, 2026

Nvidia announced DLSS 5, a new technology that uses generative AI (artificial intelligence that creates new content) to improve video game graphics in real-time by enhancing lighting and shadows. The update has received mixed reactions, with some critics calling it low-quality output that disrespects game artists' original creative choices, while Nvidia claims it represents a major breakthrough that combines hand-crafted graphics with AI to improve visual quality while keeping artists in control.

Teens sue Elon Musk’s xAI over Grok’s AI-generated CSAM

info · news · safety · policy

Quoting A member of Anthropic’s alignment-science team

info · news · safety · research

Alignment of Diffusion Models: Fundamentals, Challenges, and Future

info · research · peer-reviewed

Machine Learning for Cybersecurity: A Comprehensive Literature Review

info · research · peer-reviewed

Selective Forgetting in Machine Learning and Beyond: A Survey

info · research · peer-reviewed

A Systematic Review on Human Roles, Solutions, and Methodological Approaches to Address Bias in AI

info · research · peer-reviewed
Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fragile Across Open and Closed Models
Mar 17, 2026

Researchers created a genetic algorithm-inspired prompt fuzzing method (automatically generating variations of harmful requests while keeping their meaning) that found significant weaknesses in guardrails (safety systems protecting LLMs) across multiple AI models, with evasion rates ranging from low to high depending on the model and keywords used. The key risk is that while individual jailbreak attempts (tricking an AI to ignore its safety rules) may have low success rates, attackers can automate this process at scale to reliably bypass protections. This matters because LLMs are increasingly used in customer support and internal tools, so guardrail failures can lead to safety incidents and compliance problems.

Fix: The source recommends five mitigation strategies: treating LLMs as non-security boundaries, defining scope, applying layered controls, validating outputs, and continuously testing GenAI with adversarial fuzzing (automated testing with malicious inputs) and red-teaming (simulated attacks to find weaknesses). Palo Alto Networks customers can use Prisma AIRS and the Unit 42 AI Security Assessment products for additional protection.

Palo Alto Unit 42
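Unit 42's fuzzer itself is not public; the following is a minimal sketch of the genetic-style loop the summary describes: mutate a seed prompt with rough meaning-preserving edits, score each variant against a guardrail (stubbed here as a simple keyword filter), and carry the best survivors into the next generation. The synonym table and scoring stub are invented for illustration only.

```python
import random

random.seed(1)

SYNONYMS = {"tell": ["explain", "describe"], "make": ["build", "create"],
            "secret": ["hidden", "confidential"]}

def mutate(prompt):
    """Meaning-preserving edits: synonym swap or filler-word insertion."""
    words = prompt.split()
    i = random.randrange(len(words))
    if words[i] in SYNONYMS and random.random() < 0.7:
        words[i] = random.choice(SYNONYMS[words[i]])
    else:
        words.insert(i, random.choice(["please", "kindly", "just"]))
    return " ".join(words)

def guardrail_score(prompt):
    """Stub for a real guardrail: a real fuzzer would query the target
    model or filter here. Fewer blocked-keyword hits = more evasive."""
    blocked_keywords = {"secret", "make"}
    return -sum(w in blocked_keywords for w in prompt.split())

def fuzz(seed, generations=10, population=8, survivors=3):
    pool = [seed]
    for _ in range(generations):
        candidates = pool + [mutate(random.choice(pool)) for _ in range(population)]
        candidates.sort(key=guardrail_score, reverse=True)
        pool = candidates[:survivors]  # selection step of the genetic loop
    return pool[0]

best = fuzz("tell me the secret way to make this")
print(best)
```

Because survivors are carried forward, the best candidate's score never decreases across generations, which is why automating even low-probability bypasses at scale becomes reliable.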
OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first
Mar 17, 2026

OpenAI Japan announced the Japan Teen Safety Blueprint, a framework to help teenagers use generative AI (systems that create text, images, or other content based on patterns) safely by reducing risks like misinformation and inappropriate content. The blueprint includes age-aware protections, stronger safety policies for users under 18, expanded parental controls, and research-based design improvements developed with child safety experts.

Fix: OpenAI will implement: (1) privacy-conscious, risk-based age estimation to distinguish teens from adults with appeals processes for incorrect determinations; (2) strengthened safety policies preventing AI from depicting self-harm, generating explicit content, or encouraging dangerous behavior; (3) expanded parental controls including account linking, privacy settings, usage-time management, and alerts; (4) research-based design features such as break reminders and pathways to real-world support; and (5) continuation of existing safeguards including in-product break reminders, self-harm detection systems, multi-layered safety systems, and abuse monitoring.

OpenAI Blog
A novel android malware detection method based on CWInFs and MPTACF optimization
Mar 17, 2026

Android malware is a major security threat because the Android operating system's open app ecosystem allows unverified applications to be installed, making it easier for malicious software to spread and steal data, perform unauthorized financial transactions, or remotely control devices. Researchers are using machine learning (algorithms that learn patterns from data) to detect malware by analyzing features of Android application packages (APK files, the file format for Android apps), with recent research focusing on three main approaches: selecting the most important features to analyze, combining multiple detection models together, and handling datasets where malicious apps are much rarer than legitimate ones.

Elsevier Security Journals
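The paper's CWInFs and MPTACF techniques are not reproduced here; this toy sketch only illustrates the pipeline shape the summary describes: binary permission features extracted from (invented) APK manifest permission lists, with a class-imbalance weight so the much rarer malware class is not drowned out by benign samples.

```python
from collections import Counter

# Hypothetical permission lists standing in for parsed APK manifests.
benign = [["INTERNET"], ["INTERNET", "CAMERA"], ["INTERNET"], ["VIBRATE"],
          ["INTERNET", "VIBRATE"], ["CAMERA"], ["INTERNET"], ["VIBRATE"]]
malware = [["SEND_SMS", "READ_CONTACTS", "INTERNET"],
           ["SEND_SMS", "RECEIVE_BOOT_COMPLETED"]]

def feature_weights(samples_pos, samples_neg):
    """Per-permission weights; the malware class is up-weighted by the
    imbalance ratio because it is far rarer in the training set."""
    pos, neg = Counter(), Counter()
    for s in samples_pos:
        pos.update(set(s))
    for s in samples_neg:
        neg.update(set(s))
    vocab = set(pos) | set(neg)
    w_pos = len(samples_neg) / len(samples_pos)  # class-imbalance weight
    return {p: w_pos * pos[p] / len(samples_pos) - neg[p] / len(samples_neg)
            for p in vocab}

weights = feature_weights(malware, benign)

def score(perms):
    """Higher score = more malware-like permission profile."""
    return sum(weights.get(p, 0.0) for p in set(perms))

print(score(["SEND_SMS", "INTERNET"]) > score(["INTERNET", "VIBRATE"]))  # True
```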
Runtime: The new frontier of AI agent security
Mar 17, 2026

AI agents (autonomous software programs that can perform tasks independently) are now operating inside company networks with real access to systems, sometimes causing expensive mistakes like deleting inboxes or taking services offline. Traditional security approaches focus on preventing problems before deployment, but security leaders increasingly argue that runtime security (continuously monitoring what software actually does while it's running) is equally critical because agents can bypass normal security checkpoints and make mistakes at high speed. The challenge is that agents operate through API calls and other direct connections that traditional security tools don't intercept, generate enormous volumes of activity, and often don't create detailed logs that security teams can review.

CSO Online
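There is no standard API for this yet; a minimal sketch of the runtime idea the article argues for: route every tool call an agent makes through a checkpoint that logs it and blocks actions outside an allowlist before they execute. The action names and allowlist below are hypothetical.

```python
import time

ALLOWED_ACTIONS = {"read_file", "search", "send_summary"}  # hypothetical policy
audit_log = []

class ActionBlocked(Exception):
    pass

def guarded_call(action, fn, *args, **kwargs):
    """Runtime checkpoint: every agent tool call passes through here."""
    entry = {"ts": time.time(), "action": action, "args": repr(args)}
    if action not in ALLOWED_ACTIONS:
        entry["decision"] = "blocked"
        audit_log.append(entry)
        raise ActionBlocked(f"agent tried disallowed action: {action}")
    entry["decision"] = "allowed"
    audit_log.append(entry)
    return fn(*args, **kwargs)

# Simulated agent behaviour: one allowed call, one blocked call.
result = guarded_call("search", lambda q: f"results for {q}", "CVE feed")
try:
    guarded_call("delete_inbox", lambda: None)  # the expensive-mistake case
except ActionBlocked as exc:
    print(exc)

print(len(audit_log))  # both attempts were logged and reviewable afterwards
```

This addresses the two gaps the article names: the blocked call never executes, and the audit log exists even though the agent itself produced no usable record.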
Modeling of physical unclonable functions (PUF): A systematic literature review
Mar 17, 2026

This academic paper is a systematic literature review (a comprehensive analysis of existing research) about physical unclonable functions, or PUFs, which are hardware-based security features that create unique, unchangeable identifiers for devices based on their physical properties. Published in July 2026, the review examines how PUFs are modeled and studied across different research papers. The paper does not describe a security problem or vulnerability, but rather surveys current knowledge about how these security devices work.

Elsevier Security Journals
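One model the PUF literature commonly uses (not necessarily this review's specific focus) is the additive linear-delay model of an arbiter PUF: each device's manufacturing variation fixes a vector of stage-delay differences, and the response to a challenge is the sign of a challenge-dependent sum. A toy simulation with invented parameters:

```python
import random

random.seed(42)
N_STAGES = 16

def make_device():
    """Manufacturing variation -> a unique delay-difference per stage."""
    return [random.gauss(0.0, 1.0) for _ in range(N_STAGES)]

def response(device, challenge):
    """Linear additive-delay model of an arbiter PUF:
    r = sign(sum_i w_i * phi_i), with phi_i = prod_{j>=i} (1 - 2*c_j)."""
    phi, prod = [], 1
    for c in reversed(challenge):
        prod *= (1 - 2 * c)
        phi.append(prod)
    phi.reverse()
    delay = sum(w * p for w, p in zip(device, phi))
    return 1 if delay > 0 else 0

dev_a, dev_b = make_device(), make_device()
challenge = [random.randint(0, 1) for _ in range(N_STAGES)]
print(response(dev_a, challenge), response(dev_b, challenge))
```

The same challenge yields the same bit on one device (the identifier is stable) but generally different bits across devices (the identifier is unique) — the two properties that make PUFs useful for device authentication.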
A photo of Iran’s bombed schoolgirl graveyard went around the world. Was it real, or AI?
Mar 17, 2026

Multiple fake images and unreliable responses from AI systems like Gemini and Grok have spread widely during coverage of the Iran conflict, making it difficult to verify whether widely-shared photos, such as one purporting to show a mass grave for schoolgirls, are real or AI-generated. The article highlights how AI-generated misinformation (often called "AI slop," low-quality AI-produced content) is flooding news coverage of the war.

The Guardian Technology
Agent Commander: Promptware-Powered Command and Control
Mar 16, 2026

Promptware-powered command and control (C2, a system attackers use to remotely control compromised devices) refers to using prompt injection (tricking an AI by hiding instructions in its input) attacks against AI tools like ChatGPT to create a malicious control channel. Researchers have demonstrated that by combining features like browsing and memory capabilities in AI systems, attackers can build complex, malware-like prompt injection payloads that function similarly to traditional malware for remote control purposes.

Embrace The Red
AI firm Anthropic seeks weapons expert to stop users from 'misuse'
Mar 16, 2026

Anthropic, a US AI company, is hiring a weapons expert to prevent its AI tools from being misused to create chemical, biological, or radioactive weapons. The article notes that other AI firms like OpenAI are doing the same, but some experts worry this approach is risky because it requires exposing AI systems to sensitive weapons information, even if the systems are instructed not to use it.

BBC Technology
Equipping workers with insights about compensation
Mar 16, 2026

Workers are using ChatGPT to find wage information, with US users sending nearly 3 million messages per day asking about compensation, especially in fields like creative work, management, and healthcare, where pay data is scarce or varies widely. The article describes how AI can help close the wage information gap by synthesizing pay data across multiple sources, which matters because better wage information helps workers make informed decisions about job applications, negotiations, and career moves. OpenAI introduced WorkerBench, a new benchmark, to evaluate how accurately ChatGPT provides labor market wage information compared with official government data.

OpenAI Blog
Simon Willison's Weblog
Child abuse material ‘systemic’ on Elon Musk’s X amid Grok scandal, Australian online safety regulator warned
Mar 16, 2026

Australia's online safety regulator warned Elon Musk's X platform that child abuse material was unusually widespread on the service after Grok, a chatbot (an AI designed to have conversations), was used to create sexualized images of women and children. The regulator's letter, sent in January following the incident, pointed out that such harmful content was more accessible on X than on other major social media platforms.

The Guardian Technology
The Verge (AI)
Teens sue Elon Musk’s xAI over Grok’s AI-generated CSAM
Mar 16, 2026

Three Tennessee teens are suing Elon Musk's xAI company, claiming that Grok, an AI chatbot, generated sexualized images and videos of them as minors. The lawsuit alleges that xAI leaders knew the chatbot's "spicy mode" (a less-restricted version of the AI) would produce CSAM (child sexual abuse material, illegal content depicting minors in sexual situations) when they launched it last year.

The Verge (AI)
Quoting A member of Anthropic’s alignment-science team
Mar 16, 2026

An Anthropic alignment researcher explains that their team conducted a blackmail exercise to demonstrate misalignment risk (when an AI system's goals don't match what humans intend) in a way that would convince policymakers. The goal was to create compelling, concrete evidence that would make the potential dangers of misaligned AI feel real to people who hadn't previously considered the issue.

Simon Willison's Weblog
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Mar 16, 2026

This is an academic survey paper published in ACM Computing Surveys that examines alignment of diffusion models (AI systems trained to generate images or other content by gradually removing noise from random data). The paper covers fundamental concepts, current challenges in making these models behave as intended, and directions for future research in this area.

ACM Digital Library (TOPS, DTRAP, CSUR)
Machine Learning for Cybersecurity: A Comprehensive Literature Review
Mar 16, 2026

This is a literature review article published in an academic journal that surveys how machine learning (algorithms that learn patterns from data to make predictions) is being applied to cybersecurity problems. The article covers research across the field but does not describe a specific security vulnerability or incident requiring a fix.

ACM Digital Library (TOPS, DTRAP, CSUR)
Selective Forgetting in Machine Learning and Beyond: A Survey
Mar 16, 2026

This survey reviews research on selective forgetting in machine learning: the ability to remove or reduce specific information from a trained AI model without completely retraining it from scratch. It covers methods and applications of this technique across AI systems and domains as an academic overview of current knowledge, rather than describing a specific problem or vulnerability.

ACM Digital Library (TOPS, DTRAP, CSUR)
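One well-known family in this area (assumed here as a representative example, not necessarily the survey's emphasis) is shard-based "exact unlearning" in the SISA style: train submodels on disjoint data shards, so deleting a record only requires retraining its one shard rather than the whole system. A toy sketch with shard means standing in for trained models:

```python
N_SHARDS = 4

def shard_of(record_id):
    return record_id % N_SHARDS

def train_shard(records):
    """Stand-in 'model': the mean of the shard's values."""
    vals = [v for _, v in records]
    return sum(vals) / len(vals) if vals else 0.0

data = [(i, float(i)) for i in range(20)]  # (record_id, value)
shards = [[r for r in data if shard_of(r[0]) == s] for s in range(N_SHARDS)]
models = [train_shard(s) for s in shards]

def predict():
    return sum(models) / N_SHARDS  # aggregate over the shard models

def forget(record_id):
    """Remove one record and retrain only its shard, not the whole system."""
    s = shard_of(record_id)
    shards[s] = [r for r in shards[s] if r[0] != record_id]
    models[s] = train_shard(shards[s])

before = predict()
forget(16)  # record 16 lives in shard 0; only that shard is retrained
after = predict()
print(before, after)  # prints 9.5 9.0
```

The forgotten record provably contributes nothing afterwards, at the cost of retraining one shard; approximate unlearning methods the survey also covers trade that guarantee for lower cost.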
A Systematic Review on Human Roles, Solutions, and Methodological Approaches to Address Bias in AI
Mar 16, 2026

This academic review examines how bias (systematic unfairness in AI decision-making) occurs in AI systems and explores the human roles, solutions, and research methods used to identify and reduce it. The paper surveys existing approaches to addressing bias rather than proposing a single new solution.

ACM Digital Library (TOPS, DTRAP, CSUR)