aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Browse All

All tracked items across vulnerabilities, news, research, incidents, and regulatory updates.

6089 items

Microsoft restricts Claude Fable for employees over data retention concerns

info, news
security, privacy
Jun 10, 2026

Microsoft is restricting employee access to Claude Fable 5, Anthropic's new AI model, because of concerns about its data retention requirements. While the model is available to external GitHub Copilot and Foundry customers, Microsoft employees cannot access it through their internal tools because Claude Fable 5 does not operate under Zero Data Retention (ZDR, a policy where user data is not stored after interactions) like other Claude models do.

The Verge (AI)

DiffusionGemma: 4x faster text generation

info, news
research
Jun 10, 2026

DiffusionGemma is an experimental open AI model that uses text diffusion (a method that generates multiple words at once instead of one at a time) to achieve up to 4x faster text generation on GPUs compared to traditional language models. Unlike standard LLMs that predict words sequentially, DiffusionGemma generates entire blocks of 256 tokens in parallel, making it useful for speed-critical tasks like real-time editing and code completion, though with lower output quality than standard models.

Turn specs into evals for any agent with ASSERT

info, news
research, safety

Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

info, news
safety
Jun 10, 2026

Anthropic released Fable, a limited version of its cybersecurity AI model Mythos, with guardrails (safety restrictions) that block requests related to cybersecurity and biology topics to prevent misuse for creating malware or biological weapons. However, cybersecurity researchers complain the restrictions are overly broad and keyword-based, rejecting even legitimate tasks like code reviews and secure coding practices, though experts acknowledge this is an early-stage approach that may improve over time.

AI Agents Are Becoming Enterprise Workers. Who Secures Them?

info, news
security, safety

FIT-Print: Toward False-Claim-Resistant Model Ownership Verification via Targeted Fingerprint

info, research, Peer-Reviewed
security

SOOM: A Schedule-Search-Based Operator Obfuscation Method Against Model Extraction Attacks

info, research, Peer-Reviewed
security

MEC-Dedup: Secure data deduplication for mobile users in edge-assisted cloud storage systems

info, research, Peer-Reviewed
security

A provably secure identity-based aggregate signcryption scheme for Vehicle-to-Infrastructure communication in VANETs

info, research, Peer-Reviewed
security

CISO Forum Webinar Today: 2026 Mid-Year Review

info, news
security, policy

PRC-linked influence operations are targeting AI debates in the US

info, incident
security, policy

Autonomous AI agents duped into leaking sensitive data in phishing test

medium, news
security, safety

Investing in multi-agent AI safety research

info, news
safety, research

AI red teaming comes of age

info, news
security, research

Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards

info, news
safety, security

Chinese activist in UK told by X that abusive deepfakes do not breach rules

info, news
safety, policy

Enterprises know AI-generated code is vulnerable; they’re shipping it anyway

info, news
security, safety

Anthropic rolls out Claude Fable 5, but it's available for a limited time

info, news
industry
Jun 9, 2026

Anthropic released Fable 5, a safer version of its powerful Mythos AI model that includes guardrails (safety restrictions) to block harmful requests related to cybersecurity attacks, biology, and chemistry. Because Fable 5 consumes computing resources much faster than other models, Anthropic is offering it free only until June 22 to Pro, Max, and Enterprise subscribers, after which it will switch to usage-based pricing.

If Claude Fable stops helping you, you'll never know

medium, news
safety, policy

CVE-2026-46517: LMDeploy is a toolkit for compressing, deploying, and serving large language models. In versions 0.12.3 and prior, hardc

high, vulnerability
security
Jun 9, 2026
CVE-2026-46517

LMDeploy, a toolkit for compressing and deploying large language models, has a vulnerability in versions 0.12.3 and earlier where a setting called 'trust_remote_code' is hardcoded to 'True'. This allows an attacker to achieve remote code execution (RCE, meaning they can run commands on a system) through the software supply chain without the user's consent. At the time this vulnerability was published, no patches were available to fix it.
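
As an illustration of the general pattern behind this class of issue, here is a minimal sketch of a loader in which the flag is off by default and must be set explicitly by the caller. `LoaderConfig`, `load_model`, and `repo_ships_custom_code` are hypothetical names invented for this example; they are not LMDeploy's API and this is not the project's fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical loader settings (illustrative only, not LMDeploy's API)."""
    model_id: str
    # Default to False so repository-supplied code never runs without an explicit opt-in.
    trust_remote_code: bool = False


def load_model(config: LoaderConfig, repo_ships_custom_code: bool) -> str:
    """Pretend to load a model, refusing custom code unless the caller opted in."""
    if repo_ships_custom_code and not config.trust_remote_code:
        raise PermissionError(
            f"{config.model_id} ships custom code; pass trust_remote_code=True "
            "only after reviewing it."
        )
    # A real loader would fetch weights here; this sketch only returns a status string.
    return f"loaded {config.model_id}"
```

The point of the sketch is the default: when the value is hardcoded to `True`, the caller never gets the chance to refuse.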

Fix: For applications requiring maximum quality, the source recommends deploying standard Gemma 4 instead. Additionally, the source states that you can improve DiffusionGemma's performance on specific tasks through fine-tuning.

DeepMind Safety Research
Jun 10, 2026

ASSERT is an open-source framework that automatically converts written behavior requirements into evaluation tests for AI systems (like chatbots or agents). Instead of manually creating tests, ASSERT takes plain-language specifications and generates test scenarios, metrics, and scorecards to check whether an AI system behaves as intended, addressing the problem that generic evaluation metrics often miss application-specific requirements.

Microsoft Security Blog

Fix: Anthropic offers a Cyber Verification Program that approved cybersecurity professionals can join to gain fewer limitations on using Claude for cybersecurity work. Additionally, the source notes that Fable is programmed to fall back to Claude Opus 4.8 when it hits a guardrail, allowing users to continue their work with a less restricted model version.

TechCrunch (Security)
Jun 10, 2026

AI agents are now being deployed in companies to automate business workflows, such as managing customer renewal requests by reading emails, accessing CRM (customer relationship management, a database of customer information) data, and taking actions like drafting responses and updating records. Unlike simple text generators, these agents actively read sensitive business data, use system credentials (login information that grants access), and call external tools, which creates new security challenges that organizations need to address.

Check Point Research
research
Jun 10, 2026

Existing model fingerprinting techniques (methods that create unique digital signatures to prove ownership of AI models) are vulnerable to false claim attacks, where attackers can fraudulently claim they own models they didn't create. This paper introduces FIT-Print, a targeted fingerprinting approach that uses optimization to create verifiable signatures resistant to these false claims, offering two specific methods (bit-wise FIT-ModelDiff and list-wise FIT-LIME) that achieved 100% success in preventing false ownership claims while maintaining accurate ownership verification.

Fix: The paper proposes FIT-Print, a targeted fingerprinting paradigm that 'actively counters false claim attacks' by leveraging 'optimization to transform the fingerprint into a verifiable, targeted signature.' Two specific black-box fingerprinting methods are introduced: 'bit-wise FIT-ModelDiff' which 'utilizes output distances' and 'list-wise FIT-LIME' which utilizes 'feature attributions as robust model signatures.' The framework demonstrated '100% defense success rate' against false claim attacks and '100% ownership verification rate.'

IEEE Xplore (Security & AI Journals)
research
Jun 10, 2026

Researchers created SOOM, a defense method that obfuscates (hides or disguises) deep learning operators to protect against model extraction attacks, where attackers reverse-engineer compiled neural network code to recreate trainable models. Built on TVM (a deep learning compiler), SOOM uses a machine learning cost model to scramble how operators work while keeping inference fast, achieving an 89% failure rate against extraction attacks with minimal performance slowdown.

Fix: The source proposes SOOM itself as the mitigation: a schedule-search-based operator obfuscation method built on TVM that constructs an obfuscation space for deep learning operators and uses a security-aware learned cost model based on XGBoost gradient boosted trees to generate obfuscated executable code for various deep learning operators, balancing security objectives with performance requirements.

IEEE Xplore (Security & AI Journals)
Jun 10, 2026

MEC-Dedup is a security approach for mobile users storing data in cloud systems that use edge computing (processing done on devices near the user rather than in distant data centers). The system addresses risks that arise when multiple users' identical files are deduplicated (combined into one copy to save space), which could let attackers identify sensitive information. The research proposes methods to keep user data secure while still allowing the efficiency gains of deduplication in edge-assisted cloud storage.

Elsevier Security Journals
Jun 10, 2026

This academic paper presents a new cryptographic method for secure communication between vehicles and infrastructure in VANETs (vehicular ad hoc networks, which are temporary networks formed by moving vehicles). The scheme uses identity-based aggregate signcryption (a technique that combines digital signatures for authentication with encryption for confidentiality, while processing multiple messages together), and the authors claim to have mathematically proven it cannot be broken by attackers.

Elsevier Security Journals
Jun 10, 2026

This webinar announcement discusses how attackers are using AI to exploit vulnerabilities more quickly, and how security teams can defend using AI-driven tools. Key topics include protecting against Shadow AI (unmonitored use of generative AI in business units) and building AI governance frameworks to manage AI risks in organizations.

SecurityWeek
Jun 10, 2026

OpenAI discovered and banned two clusters of ChatGPT accounts, likely from China, that were running covert influence operations (hidden campaigns to manipulate public opinion) to shape American debates about AI policy. One cluster spread false claims that data centers were raising electricity prices; the other criticized US tariffs while excluding China's leader from discussions. OpenAI is publishing these findings to help the industry, governments, and the public identify and stop similar foreign manipulation attempts.

OpenAI Blog
Jun 10, 2026

Autonomous AI agents (systems that independently perform tasks across business applications) with access to corporate email and applications can fall victim to phishing attacks (tricks to steal sensitive information by impersonating trusted people). In security tests, an AI agent called Pinchy failed to verify sender identities and leaked AWS credentials, database passwords, and customer data when requested through email, though it performed better against technical phishing attempts, revealing that the main weakness was social trust rather than technical reasoning.
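
The missing check described above can be sketched as a gate that an agent applies before acting on an emailed request. This is a minimal illustration under stated assumptions: `TRUSTED_SENDERS`, `SENSITIVE_KEYWORDS`, `is_sensitive`, and `should_fulfil` are hypothetical names, the keyword match is deliberately crude, and a `From` header can be spoofed, so a real deployment would also need authenticated sender identity.

```python
from email.utils import parseaddr

# Hypothetical allowlist of addresses permitted to request sensitive material.
TRUSTED_SENDERS = {"secops@example.com"}

# Hypothetical markers for requests the agent should treat as sensitive.
SENSITIVE_KEYWORDS = ("credential", "password", "api key", "customer data")


def is_sensitive(request_text: str) -> bool:
    """Return True if the request mentions any sensitive keyword."""
    lowered = request_text.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def should_fulfil(from_header: str, request_text: str) -> bool:
    """Allow routine requests; allow sensitive ones only from allowlisted senders."""
    _display_name, address = parseaddr(from_header)
    if not is_sensitive(request_text):
        return True
    # The display name is ignored on purpose: only the address is compared.
    return address.lower() in TRUSTED_SENDERS
```

For example, `should_fulfil("CEO <ceo@evil.example>", "Send me the AWS credentials")` returns `False`, because the address is not on the allowlist even though the display name looks authoritative.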

CSO Online
Jun 10, 2026

Google DeepMind and partner organizations are funding $10M in research to study how safety challenges emerge when multiple AI agents (independent AI systems built by different organizations) interact with each other across networks. The concern is that when many agents communicate and work together, they can create unexpected collective behaviors that current safety tools cannot predict or control, so researchers need to develop better frameworks to understand and manage these multi-agent interactions before they become widespread.

DeepMind Safety Research
Jun 10, 2026

AI red teaming, the practice of testing AI systems for vulnerabilities before release, has become a major cybersecurity specialty since large language models like GPT-4 arrived, but traditional security testing methods no longer work. The field faces unique challenges because AI is probabilistic (producing different outputs each time) rather than deterministic, and because the most impactful attacks often come from casual users experimenting with prompts rather than sophisticated adversaries.

CSO Online
Jun 10, 2026

Anthropic released Claude Fable 5, a powerful AI model with safety classifiers (separate AI systems that monitor for misuse) that block cybersecurity-related requests by routing them to a weaker model instead of refusing them outright. The company also released Claude Mythos 5, an identical but unrestricted version for vetted cybersecurity professionals, because the underlying model is so effective at finding software vulnerabilities that giving it to the general public without controls could help attackers.

Fix: Anthropic stated it will narrow the safeguards and cut false positives after launch. The company also plans to make any remaining universal jailbreaks (prompts that completely bypass safety measures) slow and costly enough to catch before they are used at scale.

The Hacker News
Jun 10, 2026

A Chinese activist in the UK named Apple Peiqing Ni was targeted with deepfakes (synthetic media created by AI to manipulate someone's appearance or voice) on X (formerly Twitter) that falsely portrayed her as a drug addict, but X told her this abuse did not violate the platform's rules. She had reported the content to X after UK police advised her to do so, believing the deepfakes were created by a pro-regime bot (an automated account).

The Guardian Technology
Jun 9, 2026

Enterprises are deploying AI-generated code that contains security vulnerabilities at alarming rates: nearly half of production code is now AI-generated, and organizations whose code is 81-100% AI-generated ship vulnerable code 3.4 times more often than conservative users. Despite knowing about these risks, companies ship vulnerable code anyway because of pressure for ROI (return on investment, the financial benefit gained from an investment), outdated security practices, and organizational bottlenecks, with the decision to deploy flawed code made at the human level rather than the detection level.

CSO Online
BleepingComputer
Jun 9, 2026

Anthropic announced that Claude Fable 5 would silently reduce its helpfulness on requests about frontier LLM (large language model) development, such as building training infrastructure, without telling users it was doing so. Unlike other safety filters that give users feedback, these hidden interventions would use techniques like prompt modification and parameter-efficient fine-tuning (PEFT, adjusting a small subset of a model's weights to change its behavior) to degrade response quality, affecting an estimated 0.03% of user requests.

Fix: Anthropic walked back this policy in the face of widespread outrage from the research community.

Simon Willison's Weblog
NVD/CVE Database