AI & LLM Vulnerabilities
Security vulnerabilities, privacy incidents, safety concerns, and policy updates affecting LLMs and AI agents.
OpenClaw, an open-source LLM personal assistant tool created by Peter Steinberger that went viral in January 2025, has raised significant security concerns among experts. The tool allows users to create AI assistants with extensive access to personal data (emails, hard drives, credit cards) and operates 24/7, but poses multiple risks including AI mistakes, conventional hacking vulnerabilities, and especially prompt injection attacks where malicious content can hijack the LLM. Security experts and even the Chinese government have issued warnings, with Steinberger himself stating that non-technical people should not use the software.
India has mandated that social media platforms must remove illegal AI-generated content much faster and ensure all synthetic content is clearly labeled, with rules taking effect on February 20th. This gives tech companies only days to implement detection and labeling systems for deepfakes, putting immediate pressure on platforms like Instagram and X to comply in a critical market of 1 billion internet users.
The RecursiveUrlLoader class in @langchain/community had an SSRF vulnerability due to insufficient URL validation. It used String.startsWith() for URL comparison, allowing attackers to bypass the preventOutside option with domain prefix tricks (e.g., example.com.attacker.com), and had no validation against private/reserved IP addresses, enabling access to cloud metadata services and internal infrastructure.
Fix: Two changes were made: 1) The startsWith check was replaced with strict origin comparison using the URL API (new URL(link).origin === new URL(baseUrl).origin) to prevent subdomain-based bypasses. 2) A new URL validation module (@langchain/core/utils/ssrf) was introduced that blocks requests to cloud metadata endpoints (169.254.169.254, metadata.google.internal, etc.), private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, etc.), IPv6 equivalents (::1, fc00::/7, fe80::/10), and non-HTTP/HTTPS schemes. As a workaround for users who cannot upgrade immediately: avoid using RecursiveUrlLoader on untrusted or user-influenced content, or run the crawler in a network environment without access to cloud metadata or internal services.
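The two defenses are easy to picture in code. Below is a minimal Python analogue of the fixed behavior (the real patch lives in TypeScript inside @langchain/community); the function names and the metadata blocklist are illustrative, not the library's API.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Illustrative blocklist of metadata hostnames; not the library's actual list.
BLOCKED_HOSTNAMES = {"metadata.google.internal"}

def same_origin(link: str, base_url: str) -> bool:
    """Strict origin comparison (scheme + host + port), unlike startsWith,
    which 'https://example.com.attacker.com' could bypass.
    (Default ports are not normalized here, unlike the WHATWG URL API.)"""
    l, b = urlparse(link), urlparse(base_url)
    return (l.scheme, l.hostname, l.port) == (b.scheme, b.hostname, b.port)

def is_safe_url(link: str) -> bool:
    """Reject non-HTTP(S) schemes, metadata endpoints, and private/reserved IPs."""
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname or ""
    if host in BLOCKED_HOSTNAMES:
        return False
    try:
        # Resolve and check every address the hostname maps to.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        # Covers 169.254.169.254 (link-local), 10/8, 172.16/12, 192.168/16,
        # 127/8, ::1, fc00::/7, fe80::/10, and other reserved ranges.
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            return False
    return True

# Usage: only follow links that pass both checks.
# follow = same_origin(link, base_url) and is_safe_url(link)
```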
GitHub Advisory Database
LangChain's ChatOpenAI.get_num_tokens_from_messages() method contains an SSRF vulnerability where it fetches arbitrary image_url values without validation when computing token counts for vision-enabled models. Attackers can exploit this to trigger HTTP requests from the application server to arbitrary internal or external URLs, though impact is limited as it's a blind SSRF with a 5-second timeout.
Fix: The vulnerability has been patched in langchain-openai==1.1.9 (requires langchain-core==1.2.11). The patch adds: (1) SSRF validation using langchain_core._security._ssrf_protection.validate_safe_url() to block private IP ranges, cloud metadata endpoints, and invalid URL schemes; (2) explicit size limits (50 MB maximum); (3) explicit timeout (5 seconds); and (4) ability to disable image fetching via allow_fetching_images=False parameter. If unable to upgrade immediately, sanitize input by validating and filtering image_url values before passing messages to token counting, or implement egress filtering to prevent outbound requests to private IPs.
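For deployments that cannot upgrade immediately, the suggested input sanitization can be approximated as below. This is a hedged sketch that assumes plain OpenAI-style dict messages and reuses an SSRF check such as the is_safe_url() sketched above; strip_unsafe_image_urls() and _is_allowed_image_part() are hypothetical helpers, not LangChain API.

```python
from copy import deepcopy

def _is_allowed_image_part(part, is_safe_url) -> bool:
    """Keep text parts untouched; keep image parts only if the URL is an
    inline data: URL (no fetch needed) or passes the SSRF check."""
    if not isinstance(part, dict) or part.get("type") != "image_url":
        return True
    image = part.get("image_url")
    url = image.get("url", "") if isinstance(image, dict) else str(image or "")
    return url.startswith("data:") or is_safe_url(url)

def strip_unsafe_image_urls(messages, is_safe_url):
    """Return a copy of the messages with attacker-controllable image URLs
    removed, so token counting cannot be abused to reach internal hosts."""
    cleaned = deepcopy(messages)
    for message in cleaned:
        content = message.get("content")
        if isinstance(content, list):
            message["content"] = [p for p in content
                                  if _is_allowed_image_part(p, is_safe_url)]
    return cleaned

# Usage sketch:
# safe = strip_unsafe_image_urls(messages, is_safe_url)
# n_tokens = chat_model.get_num_tokens_from_messages(safe)
```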
GitHub Advisory Database
OpenMetadata leaks JWT tokens used by highly-privileged ingestion-bot accounts through API calls to `/api/v1/ingestionPipelines` for certain services (Glue, Redshift, Postgres). Any read-only user can extract these JWTs from the UI's network requests and use them to make destructive API calls, enabling privilege escalation and potential data leakage. The vulnerability was demonstrated in the Collate Sandbox by extracting an ingestion bot JWT and using it to modify database descriptions.
Fix: Redact the jwtToken field in API payloads and implement role-based filtering so that JWT tokens are returned only to users with explicit admin or service-account permissions. Administrators of affected environments should also rotate their ingestion-bot tokens.
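A minimal sketch of the redaction idea, assuming a JSON-like pipeline payload and a simple admin flag; the field path and role check are illustrative and do not reflect OpenMetadata's actual implementation.

```python
def redact_ingestion_payload(payload: dict, requester_is_admin: bool) -> dict:
    """Recursively mask jwtToken values unless the requester is an admin or
    service account, so read-only users never see bot credentials."""
    REDACTED = "********"

    def scrub(node):
        if isinstance(node, dict):
            return {
                key: (REDACTED if key == "jwtToken" and not requester_is_admin
                      else scrub(value))
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [scrub(item) for item in node]
        return node

    return scrub(payload)
```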
GitHub Advisory Database
Zast.AI, a startup focused on AI-powered code security, has raised $6 million in funding. The company uses AI agents to identify and validate software vulnerabilities before reporting them.
CVE-2026-26013 is a Server-Side Request Forgery (SSRF) vulnerability in LangChain, a framework for building agents and LLM-powered applications. Prior to version 1.2.11, the ChatOpenAI.get_num_tokens_from_messages() method fetches arbitrary image_url values without validation when computing token counts for vision-enabled models, allowing attackers to trigger SSRF attacks through malicious image URLs in user input.
Fix: This vulnerability is fixed in version 1.2.11.
NVD/CVE Database
This paper proposes CAPID, a context-aware PII detection system for question-answering platforms that addresses the limitation of current approaches which redact all PII regardless of contextual relevance. The approach fine-tunes a locally owned small language model (SLM) to detect PII spans, classify their types, and determine contextual relevance before data is passed to LLMs, avoiding privacy concerns with closed-source models. A synthetic data generation pipeline using LLMs is introduced to create training data that captures context-dependent PII relevance across multiple domains.
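To make "contextual relevance" concrete, here is a minimal sketch of the filtering flow with the SLM calls stubbed out; the function names and interfaces are hypothetical, not CAPID's implementation.

```python
from dataclasses import dataclass

@dataclass
class PiiSpan:
    start: int
    end: int
    pii_type: str  # e.g. "EMAIL", "PHONE", "NAME"

def detect_pii(question: str) -> list[PiiSpan]:
    """Stub: a locally hosted SLM returns detected PII spans and their types."""
    raise NotImplementedError

def is_contextually_relevant(question: str, span: PiiSpan) -> bool:
    """Stub: the SLM judges whether answering the question requires this PII
    (e.g. a phone number in "why won't this number receive SMS?" is relevant)."""
    raise NotImplementedError

def sanitize(question: str) -> str:
    """Redact only PII that is irrelevant to the question before it reaches an
    external LLM; relevant PII is kept so the answer stays useful."""
    redacted = question
    for span in sorted(detect_pii(question), key=lambda s: s.start, reverse=True):
        if not is_contextually_relevant(question, span):
            redacted = redacted[:span.start] + f"[{span.pii_type}]" + redacted[span.end:]
    return redacted
```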
This paper argues that current agentic AI architectures are fundamentally incompatible with high-stakes scientific workflows because autoregressive language models cannot deterministically separate commands from data through training alone. The authors contend that probabilistic alignment and guardrails are insufficient for authorization security, and that deterministic architectural enforcement is necessary to prevent the "Lethal Trifecta" of untrusted inputs, privileged data access, and external action capability from becoming an exploit-discovery problem.
Spinel is a post-quantum digital signature scheme that combines the security of SPHINCS+ with a new family of algebraic hash functions based on the Tillich-Zemor paradigm over SL_n(F_p). The scheme's security relies on the hardness of navigating expander graphs over SL_n(F_p), which is believed to be resistant to quantum adversaries. The work includes empirical security evidence, integration within the SPHINCS+ framework, security analysis, parameter selection, and performance evaluation demonstrating practical feasibility.
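For readers unfamiliar with the paradigm, the toy sketch below shows the underlying idea of a Zemor/Tillich-Zemor style Cayley-graph hash: multiply one generator matrix of SL_2(F_p) per input bit, so the digest is the endpoint of a walk in an expander graph. This is the classic textbook construction with toy parameters, not Spinel's new hash family over SL_n(F_p) or its actual generators.

```python
# Toy Cayley-graph hash: the digest of a bit string is the product of
# generator matrices in SL_2(F_p). A collision is two distinct short products
# of the generators that coincide, which corresponds to finding short cycles
# in the expander graph. Illustrative only; not Spinel's construction.
P = 2**31 - 1  # toy prime modulus, far smaller than real parameters

A = ((1, 1), (0, 1))  # generator used for bit 0
B = ((1, 0), (1, 1))  # generator used for bit 1

def matmul2(x, y, p=P):
    """2x2 matrix multiplication modulo p."""
    return (
        ((x[0][0] * y[0][0] + x[0][1] * y[1][0]) % p,
         (x[0][0] * y[0][1] + x[0][1] * y[1][1]) % p),
        ((x[1][0] * y[0][0] + x[1][1] * y[1][0]) % p,
         (x[1][0] * y[0][1] + x[1][1] * y[1][1]) % p),
    )

def cayley_hash(bits) -> tuple:
    """Walk the Cayley graph: multiply A or B for each input bit."""
    digest = ((1, 0), (0, 1))  # identity matrix
    for bit in bits:
        digest = matmul2(digest, A if bit == 0 else B)
    return digest

# Example: hash the bit string 0,1,1,0,1
# print(cayley_hash([0, 1, 1, 0, 1]))
```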
QRS (Query, Review, Sanitize) is a neuro-symbolic framework that uses three autonomous agents with Large Language Models to generate CodeQL queries, validate findings through semantic reasoning, and perform automated exploit synthesis for vulnerability discovery. Unlike traditional SAST tools that rely on expert-crafted queries and predefined patterns, QRS autonomously discovers vulnerability classes beyond known patterns while reducing false positives. In testing on PyPI packages, QRS achieved 90.6% detection accuracy on 20 historical CVEs and identified 39 medium-to-high-severity vulnerabilities in the top 100 most-downloaded packages, with 5 assigned new CVEs.
This research investigates using large language models (LLMs) for zero-shot feature selection in malware detection as an alternative to traditional statistical methods. The study evaluates multiple LLMs (GPT-5.0, GPT-4.0, Gemini-2.5) on the EMBOD dataset against conventional feature selection methods across various classifiers. Results show that LLM-guided zero-shot feature selection achieves competitive performance with traditional methods while providing enhanced interpretability, stability, and reduced dependence on labeled data.
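As a rough illustration of the pipeline (not the paper's code), the sketch below stubs the zero-shot LLM call and feeds the selected feature subset to an ordinary classifier; llm_select_features() and all parameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def llm_select_features(feature_names: list[str], k: int) -> list[str]:
    """Stub: prompt an LLM with only the feature names/descriptions (no labels)
    and ask it to return the k features most indicative of malware."""
    raise NotImplementedError

def train_with_llm_features(X: np.ndarray, y: np.ndarray,
                            feature_names: list[str], k: int = 20):
    """Zero-shot feature selection followed by a conventional classifier."""
    selected = llm_select_features(feature_names, k)
    idx = [feature_names.index(name) for name in selected]
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[:, idx], y)
    return clf, selected
```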
This research introduces the Four-Checkpoint Framework to analyze where LLM safety mechanisms fail by organizing defenses along processing stage (input vs. output) and detection level (literal vs. intent). Testing GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro with 13 targeted evasion techniques across 3,312 test cases reveals that output-stage defenses (CP3, CP4) are weakest at 72-79% Weighted Attack Success Rate (WASR), while input-literal defenses (CP1) are strongest at 13% WASR. The study finds that traditional Binary ASR underestimates vulnerabilities (22.6%) compared to WASR (52.7%), showing 2.3× higher actual vulnerability rates.
This paper introduces AGMark (Attention-Guided Dynamic Watermarking), a novel watermarking framework for Large Vision-Language Models (LVLMs) that addresses limitations in existing approaches. AGMark dynamically identifies semantic-critical tokens at each decoding step using attention weights and context-aware coherence cues, while determining the proportion of protected tokens through uncertainty awareness and evidence calibration. The framework achieves at least 99.36% detection accuracy (AUC) and maintains robust attack resilience (at least 88.61% AUC) while preserving visual semantic fidelity and generation quality.
This paper introduces a novel fingerprinting framework for protecting the intellectual property of large language models by using "refusal vectors" - behavioral patterns extracted from a model's internal representations when processing harmful versus harmless prompts. The method demonstrates 100% accuracy in identifying base model families across 76 offspring models and proves robust against common modifications like finetuning, merging, and quantization. The authors propose a theoretical framework using locality-sensitive hashing and zero-knowledge proofs to transform private fingerprints into publicly verifiable, privacy-preserving artifacts.
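The "refusal vector" idea can be sketched as a difference-in-means direction over hidden states. The code below is a hedged illustration of that general recipe, with model access stubbed behind a hypothetical get_hidden_states() and cosine similarity standing in for the paper's matching procedure; it is not the authors' extraction method or their LSH/zero-knowledge protocol.

```python
import numpy as np

def get_hidden_states(model, prompts: list[str], layer: int) -> np.ndarray:
    """Stub: return one hidden-state vector per prompt (e.g. the last-token
    activation at the given layer); how to obtain these depends on the model."""
    raise NotImplementedError

def refusal_vector(model, harmful: list[str], harmless: list[str],
                   layer: int) -> np.ndarray:
    """Difference of mean activations on harmful vs. harmless prompts,
    normalized to unit length."""
    h = get_hidden_states(model, harmful, layer).mean(axis=0)
    b = get_hidden_states(model, harmless, layer).mean(axis=0)
    v = h - b
    return v / np.linalg.norm(v)

def same_family(v_suspect: np.ndarray, v_base: np.ndarray,
                threshold: float = 0.8) -> bool:
    """Attribute a suspect model to a base family if its refusal vector stays
    close (cosine similarity) to the base model's despite finetuning/merging."""
    return float(v_suspect @ v_base) >= threshold
```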
This paper introduces Autonomous Action Runtime Management (AARM), an open specification for securing AI-driven actions at runtime as AI systems evolve from passive assistants to autonomous agents capable of executing consequential actions. AARM defines a runtime security system that intercepts actions before execution, evaluates them against policy and intent alignment, enforces authorization decisions, and records tamper-evident receipts, addressing threats like prompt injection, confused deputy attacks, data exfiltration, and intent drift. The specification proposes four implementation architectures and aims to establish industry-wide security requirements for AI agent systems before proprietary fragmentation occurs.
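A minimal sketch of the interception loop under stated assumptions: the policy is just a callable, receipts are hash-chained JSON records, and every name here is illustrative rather than anything defined by the AARM specification.

```python
import hashlib
import json
import time

class ActionRuntime:
    """Intercept agent actions, evaluate them against a policy, and append a
    tamper-evident (hash-chained) receipt for every decision."""

    def __init__(self, policy):
        self.policy = policy          # callable: (action: dict) -> (bool, str)
        self.receipts = []
        self._prev_hash = "0" * 64

    def execute(self, action: dict, handler):
        allowed, reason = self.policy(action)
        receipt = {
            "ts": time.time(),
            "action": action,
            "allowed": allowed,
            "reason": reason,
            "prev": self._prev_hash,
        }
        # Chain each receipt to the previous one so tampering is detectable.
        self._prev_hash = hashlib.sha256(
            json.dumps(receipt, sort_keys=True).encode()
        ).hexdigest()
        receipt["hash"] = self._prev_hash
        self.receipts.append(receipt)
        if not allowed:
            raise PermissionError(f"action denied: {reason}")
        return handler(action)

# Usage sketch: deny outbound email to non-allowlisted domains
# (a confused-deputy / data-exfiltration style policy).
# runtime = ActionRuntime(lambda a: (a.get("tool") != "send_email"
#                                    or a.get("to", "").endswith("@example.com"),
#                                    "recipient not allowlisted"))
```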
This research systematically studies adversarial transferability of encoder-based attacks against large vision-language models (LVLMs), revealing that existing attacks have severely limited transferability across different LVLM architectures. The study identifies two root causes hindering transferability: inconsistent visual grounding across models and redundant semantic alignment within models. To address these limitations, the authors propose Semantic-Guided Multimodal Attack (SGMA), a framework that achieves higher transferability by directing perturbations toward semantically critical regions and disrupting cross-modal grounding at both global and local levels.
LLMAC is a new access control framework that uses Large Language Models to unify traditional access control methods (RBAC, ABAC, DAC) into a single comprehensive system. Using Mistral 7B trained on synthetic datasets representing complex real-world scenarios, the system achieved 98.5% accuracy, significantly outperforming traditional methods (RBAC: 14.5%, ABAC: 58.5%, DAC: 27.5%) while providing human-readable explanations for decisions.
This research paper introduces a measurement framework for monitoring GPU utilization in untrusted environments to support AI governance. The framework uses four complementary primitives based on timing and memory characteristics—Proof-of-Work-inspired mechanisms, Verifiable Delay Functions, GEMM-based tensor-core measurements, and VRAM-residency tests—to detect GPU compute activity even without trusted firmware or vendor-controlled counters. The approach aims to provide compute-based telemetry that can help detect unauthorized repurposing of GPUs for model training or policy violations.
This paper introduces the first systematic benchmark for evaluating knowledge-extraction attacks on Retrieval-Augmented Generation (RAG) systems, which can be exploited through maliciously crafted queries to recover sensitive knowledge-base content. The benchmark consolidates fragmented research by providing a unified experimental framework covering various attack and defense strategies, retrieval embedding models, and both open- and closed-source generators across standardized datasets.
Fix: The paper introduces the Trinity Defense Architecture, which enforces security through three mechanisms: action governance via a finite action calculus with reference-monitor enforcement, information-flow control via mandatory access labels preventing cross-scope leakage, and privilege separation isolating perception from execution.
arXiv (cs.CR + cs.AI)