aisecwatch.com

Real-time AI security monitoring. Tracking AI-related vulnerabilities, safety and security incidents, privacy risks, research developments, and policy changes.

Maintained by

Truong (Jack) Luu

Information Systems Researcher

Industry News

New tools, products, platforms, funding rounds, and company developments in AI security.

2829 items

AI Risk Worries Insurers and Businesses Alike

info news
policy industry
Jun 10, 2026

Insurance companies are responding differently to the growing use of AI in businesses: some are refusing to cover AI-related risks entirely, while others are developing frameworks to manage those risks. The article raises the question of which AI risks companies can actually handle and control.

Dark Reading

Claude Fable won’t answer basic biology questions

info news
safety policy

The future of AI regulation is courting the strangest, most anxious bedfellows

info news
policy
Jun 10, 2026

This article discusses AI regulation efforts in Washington, D.C., noting that various political figures and stakeholders with differing interests are coming together to shape AI policy. The piece frames these unexpected political alliances as complex and contentious, comparing the current regulatory landscape to chaos.

Microsoft restricts Claude Fable for employees over data retention concerns

info news
security privacy

DiffusionGemma: 4x faster text generation

info news
research
Jun 10, 2026

DiffusionGemma is an experimental open AI model that uses text diffusion (a method that generates multiple words at once instead of one at a time) to achieve up to 4x faster text generation on GPUs compared to traditional language models. Unlike standard LLMs that predict words sequentially, DiffusionGemma generates entire blocks of 256 tokens in parallel, making it useful for speed-critical tasks like real-time editing and code completion, though with lower output quality than standard models.

Turn specs into evals for any agent with ASSERT

info news
research safety

Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

info news
safety
Jun 10, 2026

Anthropic released Fable, a limited version of its cybersecurity AI model Mythos. Its guardrails (safety restrictions) block requests related to cybersecurity and biology topics to prevent misuse for creating malware or biological weapons. Cybersecurity researchers complain that the restrictions are overly broad and keyword-based, rejecting even legitimate tasks like code reviews and secure coding practices, though experts acknowledge this is an early-stage approach that may improve over time.

AI Agents Are Becoming Enterprise Workers. Who Secures Them?

info news
security safety

CISO Forum Webinar Today: 2026 Mid-Year Review

info news
security policy

Autonomous AI agents duped into leaking sensitive data in phishing test

medium news
security safety

Investing in multi-agent AI safety research

info news
safety research

AI red teaming comes of age

info news
security research

Anthropic Releases Claude Fable 5, Its Most Powerful AI Yet, With Cyber Safeguards

info news
safety security

Chinese activist in UK told by X that abusive deepfakes do not breach rules

info news
safety policy

Enterprises know AI-generated code is vulnerable; they’re shipping it anyway

info news
security safety

Anthropic rolls out Claude Fable 5, but it's available for a limited time

info news
industry
Jun 9, 2026

Anthropic released Fable 5, a safer version of its powerful Mythos AI model that includes guardrails (safety restrictions) to block harmful requests related to cybersecurity attacks, biology, and chemistry. Because Fable 5 consumes computing resources much faster than other models, Anthropic is offering it free only until June 22 to Pro, Max, and Enterprise subscribers, after which it will switch to usage-based pricing.

If Claude Fable stops helping you, you'll never know

medium news
safety policy

From data to decisions: how LSEG is scaling trusted AI

info news
industry
Jun 9, 2026

London Stock Exchange Group (LSEG) deployed ChatGPT Enterprise and OpenAI APIs across its organization to transform how employees work with financial data and generate insights, rather than just improving existing systems. The company implemented governance frameworks including model evaluation, human review of critical outputs, and strict data privacy controls from the start. This approach reduced product release cycles from 3-6 months to 2 weeks and accelerated customer delivery timelines to approximately 4 weeks.

Initial impressions of Claude Fable 5

info news
industry
Jun 9, 2026

Claude Fable 5 is a new AI model released by Anthropic that matches the capabilities of Claude Mythos 5 but includes stricter guardrails (safety restrictions to prevent harmful use) that trigger frequently enough to require new API mechanisms for handling rejections. The model has a 1 million token context window (the amount of text it can process at once), costs twice as much as previous models, and demonstrates notably stronger knowledge retention compared to earlier versions like Claude Opus 4.8.

I tried Siri AI, and so far it actually works

info news
industry
Jun 9, 2026

Apple has released an upgraded version of Siri, its voice assistant (software that responds to spoken commands), which can now perform practical tasks like adding multiple calendar events from emails or flyers, creating shopping lists, and setting reminders. The new Siri can also access information from a user's email and calendar to make personalized recommendations, such as suggesting gardening tasks based on yard conditions.

Jun 10, 2026

Anthropic released Claude Fable 5, claiming it is its most powerful model, but it refuses to answer basic biology questions and instead redirects them to an older model called Claude Opus 4.8. This limitation is by design, not because the model lacks knowledge. Fable belongs to the Mythos-class family, a group of models so skilled at cybersecurity tasks that Anthropic decided they were too dangerous to release to the public.

The Verge (AI)
The Verge (AI)
Jun 10, 2026

Microsoft is restricting employee access to Claude Fable 5, Anthropic's new AI model, because of concerns about its data retention requirements. While the model is available to external GitHub Copilot and Foundry customers, Microsoft employees cannot access it through their internal tools because Claude Fable 5 does not operate under Zero Data Retention (ZDR, a policy where user data is not stored after interactions) like other Claude models do.

The Verge (AI)

Fix: For applications requiring maximum quality, the source recommends deploying standard Gemma 4 instead. Additionally, the source states that you can improve DiffusionGemma's performance on specific tasks through fine-tuning.

DeepMind Safety Research
Jun 10, 2026

ASSERT is an open-source framework that automatically converts written behavior requirements into evaluation tests for AI systems (like chatbots or agents). Instead of manually creating tests, ASSERT takes plain-language specifications and generates test scenarios, metrics, and scorecards to check whether an AI system behaves as intended, addressing the problem that generic evaluation metrics often miss application-specific requirements.

Microsoft Security Blog

Fix: Anthropic offers a Cyber Verification Program; approved cybersecurity professionals who join it face fewer limitations when using Claude for cybersecurity work. Additionally, the source notes that Fable is programmed to fall back to Claude Opus 4.8 when it hits a guardrail, allowing users to continue their work with a less restricted model version.

TechCrunch (Security)
Jun 10, 2026

AI agents are now being deployed in companies to automate business workflows, such as managing customer renewal requests by reading emails, accessing CRM (customer relationship management, a database of customer information) data, and taking actions like drafting responses and updating records. Unlike simple text generators, these agents actively read sensitive business data, use system credentials (login information that grants access), and call external tools, which creates new security challenges that organizations need to address.

Check Point Research
Jun 10, 2026

This webinar announcement discusses how attackers are using AI to exploit vulnerabilities more quickly, and how security teams can defend using AI-driven tools. Key topics include protecting against Shadow AI (unmonitored use of generative AI in business units) and building AI governance frameworks to manage AI risks in organizations.

SecurityWeek
Jun 10, 2026

Autonomous AI agents (systems that independently perform tasks across business applications) with access to corporate email and applications can fall victim to phishing attacks (tricks to steal sensitive information by impersonating trusted people). In security tests, an AI agent called Pinchy failed to verify sender identities and leaked AWS credentials, database passwords, and customer data when requested through email, though it performed better against technical phishing attempts, revealing that the main weakness was social trust rather than technical reasoning.

CSO Online
Jun 10, 2026

Google DeepMind and partner organizations are funding $10M in research to study how safety challenges emerge when multiple AI agents (independent AI systems built by different organizations) interact with each other across networks. The concern is that when many agents communicate and work together, they can create unexpected collective behaviors that current safety tools cannot predict or control, so researchers need to develop better frameworks to understand and manage these multi-agent interactions before they become widespread.

DeepMind Safety Research
Jun 10, 2026

AI red teaming, the practice of testing AI systems for vulnerabilities before release, has become a major cybersecurity specialty since large language models like GPT-4 arrived, but traditional security testing methods no longer work. The field faces unique challenges because AI is probabilistic (producing different outputs each time) rather than deterministic, and because the most impactful attacks often come from casual users experimenting with prompts rather than sophisticated adversaries.

CSO Online
Jun 10, 2026

Anthropic released Claude Fable 5, a powerful AI model with safety classifiers (separate AI systems that monitor for misuse) that block cybersecurity-related requests by routing them to a weaker model instead of refusing them outright. The company also released Claude Mythos 5, an identical but unrestricted version for vetted cybersecurity professionals, because the underlying model is so effective at finding software vulnerabilities that giving it to the general public without controls could help attackers.

Fix: Anthropic stated it will narrow the safeguards and cut false positives after launch. The company also plans to make any remaining universal jailbreaks (prompts that completely bypass safety measures) slow and costly enough to catch before they are used at scale.

The Hacker News
Jun 10, 2026

A Chinese activist in the UK named Apple Peiqing Ni was targeted with deepfakes (synthetic media created by AI to manipulate someone's appearance or voice) on X (formerly Twitter) that falsely portrayed her as a drug addict, but X told her this abuse did not violate the platform's rules. She had reported the content to X after UK police advised her to do so, believing the deepfakes were created by a pro-regime bot (an automated account).

The Guardian Technology
Jun 9, 2026

Enterprises are deploying AI-generated code that contains security vulnerabilities at alarming rates. Nearly half of production code is now AI-generated, and organizations whose code is 81-100% AI-generated ship vulnerable code 3.4 times more often than conservative users. Despite knowing about these risks, companies are choosing to ship vulnerable code anyway due to pressure for ROI (return on investment, the financial benefit gained from an investment), outdated security practices, and organizational bottlenecks where the decision to deploy flawed code happens at the human level rather than the detection level.

CSO Online
BleepingComputer
Jun 9, 2026

Anthropic announced that Claude Fable 5 would silently reduce its helpfulness on requests about frontier LLM (large language model) development, such as building training infrastructure, without telling users it was doing so. Unlike other safety filters that give users feedback, these hidden interventions would use techniques like prompt modification and parameter-efficient fine-tuning (PEFT, adjusting a small subset of a model's weights to change its behavior) to degrade response quality, affecting an estimated 0.03% of user requests.

Fix: Anthropic walked back this policy in the face of widespread outrage from the research community.

Simon Willison's Weblog
OpenAI Blog
Simon Willison's Weblog
The Verge (AI)