
AI Sec Watch

Your daily watch on AI and LLM security — vulnerabilities, privacy incidents, safety research, and industry developments.

Maintained and curated by Truong (Jack) Luu, Information Systems Researcher

Updated every 12 hours

Today's TLDR: Wednesday, February 11, 2026
47 items tracked

Critical LLM Security Vulnerabilities Discovered: Claude Opus 4.6 discovered 500+ previously unknown high-severity vulnerabilities in major open-source libraries. Separately, researchers revealed fundamental safety weaknesses in Mixture-of-Experts models: manipulating specific routers can bypass safety mechanisms (up to 86.3% jailbreak success), and the concentration of safety behaviors in a few experts enables "lobotomy" attacks that silence safety-critical components.

Multiple SSRF and Authentication Bypass Flaws in Production Systems: SSRF vulnerabilities in LangChain's RecursiveUrlLoader and in ChatOpenAI token counting (CVE-2026-26013, fixed in 1.2.11) allow attackers to reach internal infrastructure and cloud metadata services. Meanwhile, OpenMetadata leaks highly privileged JWT tokens through API calls, enabling read-only users to escalate privileges and make destructive changes.

Emerging Threats in AI Agent Security and RAG Systems: New research exposes "retrieval pivot attacks" in hybrid RAG systems, where vector-retrieved content can pivot through knowledge graphs to leak cross-tenant data (RPR up to 0.95). Meanwhile, the AARM specification proposes runtime security controls for autonomous agents to prevent prompt injection, confused-deputy attacks, and intent drift as AI systems evolve from assistants into autonomous actors.

AI Detection and Fingerprinting Systems Face Evasion Challenges: StealthRL achieves a 99.9% attack success rate against AI-text detectors using reinforcement-learning paraphrasing attacks. Researchers also developed "refusal vector" fingerprinting that identifies LLM provenance with 100% accuracy across model modifications, while compositional reasoning attacks scattered across long contexts (64k tokens) successfully evade safety alignment in stronger reasoning models.

India's Deepfake Deadline and Rising Industry Investment in AI Security: India is mandating that social media platforms remove illegal AI-generated content and label all synthetic content by February 20th, a rule affecting 1 billion users. Meanwhile, AI security startups Outtake ($40M Series B) and Zast.AI ($6M) raised significant funding to address AI-enabled fraud, impersonation attacks, and automated vulnerability detection as threats scale beyond manual human intervention.

Latest

Instagram and X have an impossible deepfake detection deadline

policy, safety
2/11/2026

India has mandated that social media platforms remove illegal AI-generated content much faster and ensure all synthetic content is clearly labeled, with the rules taking effect on February 20th. This gives tech companies only days to implement detection and labeling systems for deepfakes, putting immediate pressure on platforms like Instagram and X to comply in a critical market of 1 billion internet users.

The Verge

GHSA-gf3v-fwqg-4vh7: @langchain/community affected by SSRF Bypass in RecursiveUrlLoader via insufficient URL origin validation

security
2/11/2026

The RecursiveUrlLoader class in @langchain/community had an SSRF vulnerability due to insufficient URL validation. It used String.startsWith() for URL comparison, allowing attackers to bypass the preventOutside option with domain prefix tricks (e.g., example.com.attacker.com), and had no validation against private/reserved IP addresses, enabling access to cloud metadata services and internal infrastructure.

Fix: Two changes were made: 1) The startsWith check was replaced with strict origin comparison using the URL API (new URL(link).origin === new URL(baseUrl).origin) to prevent subdomain-based bypasses. 2) A new URL validation module (@langchain/core/utils/ssrf) was introduced that blocks requests to cloud metadata endpoints (169.254.169.254, metadata.google.internal, etc.), private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, etc.), IPv6 equivalents (::1, fc00::/7, fe80::/10), and non-HTTP/HTTPS schemes. As a workaround for users who cannot upgrade immediately: avoid using RecursiveUrlLoader on untrusted or user-influenced content, or run the crawler in a network environment without access to cloud metadata or internal services.
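
For illustration, here is a minimal Python sketch of the same checks the patch describes (strict origin comparison plus blocking of metadata hosts, private ranges, and non-HTTP schemes). The actual fix lives in @langchain/community's TypeScript code and the @langchain/core/utils/ssrf module; the function and constant names below are hypothetical, and default-port normalization is omitted for brevity.

```python
# Hedged sketch of the advisory's checks, in Python for illustration only.
import ipaddress
import socket
from urllib.parse import urlparse

METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}  # from the advisory

def is_safe_crawl_target(link: str, base_url: str) -> bool:
    link_parts, base_parts = urlparse(link), urlparse(base_url)

    # Strict origin comparison (scheme + host + port), not startsWith():
    # "https://example.com.attacker.com" no longer passes for base "https://example.com".
    if (link_parts.scheme, link_parts.hostname, link_parts.port) != (
        base_parts.scheme, base_parts.hostname, base_parts.port
    ):
        return False

    # Only plain HTTP/HTTPS schemes.
    if link_parts.scheme not in ("http", "https"):
        return False

    host = link_parts.hostname or ""
    if host in METADATA_HOSTS:
        return False

    # Resolve the host and reject private, loopback, link-local, and reserved ranges
    # (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, ::1, fc00::/7, fe80::/10, ...).
    try:
        for info in socket.getaddrinfo(host, None):
            addr = ipaddress.ip_address(info[4][0])
            if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
                return False
    except (socket.gaierror, ValueError):
        return False  # unresolvable or malformed: treat as unsafe

    return True
```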

GitHub Advisory Database

GHSA-2g6r-c272-w58r: LangChain affected by SSRF via image_url token counting in ChatOpenAI.get_num_tokens_from_messages

security
2/11/2026

LangChain's ChatOpenAI.get_num_tokens_from_messages() method contains an SSRF vulnerability where it fetches arbitrary image_url values without validation when computing token counts for vision-enabled models. Attackers can exploit this to trigger HTTP requests from the application server to arbitrary internal or external URLs, though impact is limited as it's a blind SSRF with a 5-second timeout.

Fix: The vulnerability has been patched in langchain-openai==1.1.9 (requires langchain-core==1.2.11). The patch adds: (1) SSRF validation using langchain_core._security._ssrf_protection.validate_safe_url() to block private IP ranges, cloud metadata endpoints, and invalid URL schemes; (2) explicit size limits (50 MB maximum); (3) explicit timeout (5 seconds); and (4) ability to disable image fetching via allow_fetching_images=False parameter. If unable to upgrade immediately, sanitize input by validating and filtering image_url values before passing messages to token counting, or implement egress filtering to prevent outbound requests to private IPs.
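
For users relying on the pre-filtering workaround, a minimal sketch of stripping untrusted image_url blocks before token counting might look like the following. It assumes OpenAI-style dict messages (before conversion into LangChain message objects); the helper names are hypothetical, and the filter is deliberately conservative (it also drops data: URLs).

```python
# Hedged sketch: drop image_url content blocks that point at private or non-HTTP targets
# before the messages reach get_num_tokens_from_messages().
import ipaddress
import socket
from urllib.parse import urlparse

def _url_is_external_http(url: str) -> bool:
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        for info in socket.getaddrinfo(parts.hostname, None):
            addr = ipaddress.ip_address(info[4][0])
            if addr.is_private or addr.is_loopback or addr.is_link_local:
                return False
    except (socket.gaierror, ValueError):
        return False  # unresolvable or malformed: treat as unsafe
    return True

def strip_unsafe_image_urls(messages: list[dict]) -> list[dict]:
    """Remove image_url content blocks whose URLs fail the SSRF check."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            content = [
                block for block in content
                if not isinstance(block, dict)
                or block.get("type") != "image_url"
                or _url_is_external_http(block.get("image_url", {}).get("url", ""))
            ]
        cleaned.append({**msg, "content": content})
    return cleaned

# Usage (hypothetical): sanitize before counting tokens on a vision-enabled model, e.g.
#   safe_messages = strip_unsafe_image_urls(untrusted_messages)
```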

GitHub Advisory Database

GHSA-pqqf-7hxm-rj5r: Leaky JWTs in OpenMetadata exposing highly-privileged bot users

securityprivacy
2/11/2026

OpenMetadata leaks JWT tokens used by highly-privileged ingestion-bot accounts through API calls to `/api/v1/ingestionPipelines` for certain services (Glue, Redshift, Postgres). Any read-only user can extract these JWTs from the UI's network requests and use them to make destructive API calls, enabling privilege escalation and potential data leakage. The vulnerability was demonstrated in the Collate Sandbox by extracting an ingestion bot JWT and using it to modify database descriptions.

Fix: Redact the jwtToken field from API payloads and implement role-based filtering so that JWT tokens are only returned to users with explicit admin or service-account permissions. Admins should rotate ingestion-bot tokens in affected environments.
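
As a quick self-check after patching, an operator could probe the /api/v1/ingestionPipelines endpoint named in the advisory with a read-only token and flag any jwtToken fields in the response. A rough sketch, where the host and token values are placeholders:

```python
# Hedged sketch: look for leaked bot JWTs in the ingestion-pipelines listing.
import requests

BASE_URL = "https://openmetadata.example.internal"  # placeholder
READ_ONLY_TOKEN = "..."                              # token of a low-privilege user

def find_leaked_bot_tokens() -> list[str]:
    resp = requests.get(
        f"{BASE_URL}/api/v1/ingestionPipelines",
        headers={"Authorization": f"Bearer {READ_ONLY_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    leaks: list[str] = []

    def walk(node, path=""):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "jwtToken" and value:
                    leaks.append(f"{path}/{key}")
                walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{path}[{i}]")

    walk(resp.json())
    return leaks  # non-empty means bot JWTs are still exposed: patch and rotate tokens
```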

GitHub Advisory Database

Zast.AI Raises $6 Million for AI-Powered Code Security

security, industry
2/11/2026

Zast.AI, a startup focused on AI-powered code security, has raised $6 million in funding. The company uses AI agents to identify and validate software vulnerabilities before reporting them.

SecurityWeek

CVE-2026-26013: SSRF in LangChain's ChatOpenAI.get_num_tokens_from_messages prior to 1.2.11

security
2/10/2026

CVE-2026-26013 is a Server-Side Request Forgery (SSRF) vulnerability in LangChain, a framework for building agents and LLM-powered applications. Prior to version 1.2.11, the ChatOpenAI.get_num_tokens_from_messages() method fetches arbitrary image_url values without validation when computing token counts for vision-enabled models, allowing attackers to trigger SSRF attacks through malicious image URLs in user input.

Fix: This vulnerability is fixed in version 1.2.11.

NVD/CVE Database

CAPID: Context-Aware PII Detection for Question-Answering Systems

privacy, research
2/10/2026

This paper proposes CAPID, a context-aware PII detection system for question-answering platforms that addresses the limitation of current approaches which redact all PII regardless of contextual relevance. The approach fine-tunes a locally owned small language model (SLM) to detect PII spans, classify their types, and determine contextual relevance before data is passed to LLMs, avoiding privacy concerns with closed-source models. A synthetic data generation pipeline using LLMs is introduced to create training data that captures context-dependent PII relevance across multiple domains.
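
To make the decision flow concrete, here is a toy sketch of the detect, classify, and relevance-check steps that precede redaction. The paper fine-tunes a small language model for these steps; this sketch substitutes trivial regexes and keyword heuristics purely to show the interface, and every pattern and keyword below is invented for the example.

```python
# Toy illustration of context-aware PII redaction: redact only spans whose type
# is not relevant to the user's question. Not the paper's model, just the shape of it.
import re
from dataclasses import dataclass

@dataclass
class PIISpan:
    start: int
    end: int
    pii_type: str

# Stand-ins for the fine-tuned SLM's span detection and type classification.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# Stand-in for contextual-relevance classification.
RELEVANCE_KEYWORDS = {"EMAIL": ("email", "contact"), "PHONE": ("phone", "call", "number")}

def redact_irrelevant_pii(question: str, context: str) -> str:
    q = question.lower()
    spans = [
        PIISpan(m.start(), m.end(), pii_type)
        for pii_type, pattern in PATTERNS.items()
        for m in pattern.finditer(context)
    ]
    redacted = context
    # Replace from the end so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if not any(k in q for k in RELEVANCE_KEYWORDS.get(span.pii_type, ())):
            redacted = redacted[:span.start] + f"[{span.pii_type}]" + redacted[span.end:]
    return redacted

# redact_irrelevant_pii("What is Alice's email?", "Alice: alice@example.com, 555-123-4567")
# keeps the email (relevant to the question) and masks the phone number.
```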

Arxiv (cs.CR + cs.AI)

Trustworthy Agentic AI Requires Deterministic Architectural Boundaries

security, research, safety
2/10/2026

This paper argues that current agentic AI architectures are fundamentally incompatible with high-stakes scientific workflows because autoregressive language models cannot deterministically separate commands from data through training alone. The authors contend that probabilistic alignment and guardrails are insufficient for authorization security, and that deterministic architectural enforcement is necessary to prevent the "Lethal Trifecta" of untrusted inputs, privileged data access, and external action capability from becoming an exploit-discovery problem.

Fix: The paper introduces the Trinity Defense Architecture, which enforces security through three mechanisms: action governance via a finite action calculus with reference-monitor enforcement, information-flow control via mandatory access labels preventing cross-scope leakage, and privilege separation isolating perception from execution.
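
As a rough illustration of the reference-monitor pattern (not the paper's Trinity formalism), a deterministic gate over a finite action set with simple information-flow labels might look like this; all names, the label ordering, and the flow rule are invented for the example.

```python
# Toy sketch: the execution layer consults a deterministic monitor before acting;
# the LLM can only propose requests drawn from a finite action set.
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    PUBLIC = 0
    TENANT = 1
    PRIVILEGED = 2

# Finite action calculus: only these verbs exist for the agent.
ALLOWED_ACTIONS = {"read_dataset", "run_analysis", "write_report"}

@dataclass(frozen=True)
class ActionRequest:
    verb: str
    source_scope: Scope   # label of the data that influenced the request
    target_scope: Scope   # label of the resource the action would touch

def reference_monitor(req: ActionRequest) -> bool:
    """Allow or deny deterministically, independent of how the model phrased its output."""
    if req.verb not in ALLOWED_ACTIONS:
        return False  # outside the finite calculus
    if req.source_scope.value > req.target_scope.value:
        return False  # block flows from a higher-sensitivity scope into a lower one
    return True

# Privilege separation: the perception component (the LLM) builds ActionRequest objects,
# but only the execution component, after reference_monitor() approves, performs them.
```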

Arxiv (cs.CR + cs.AI)