A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors
Tags: security research
Source: arXiv (cs.CR + cs.AI)
February 10, 2026

Summary
This paper introduces a fingerprinting framework for protecting the intellectual property of large language models using "refusal vectors": behavioral patterns extracted from a model's internal representations when it processes harmful versus harmless prompts. The method identifies base model families with 100% accuracy across 76 offspring models and remains robust under common modifications such as fine-tuning, merging, and quantization. The authors also propose a theoretical framework that combines locality-sensitive hashing with zero-knowledge proofs to turn private fingerprints into publicly verifiable, privacy-preserving artifacts.
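The summary does not spell out the paper's exact extraction and matching procedure, but a minimal sketch of the general idea, assuming the fingerprint is a difference-of-means direction over hidden-state activations and that attribution is done by cosine similarity against known base-family vectors, might look like the following. Everything here (the function names, the synthetic activations, the 0.8 threshold, the 128-bit signature length) is illustrative and not the authors' implementation.

```python
import numpy as np

# Fingerprint extraction: difference-of-means "refusal" direction.
def refusal_vector(acts_harmful, acts_harmless):
    """Mean hidden-state activation on harmful prompts minus the mean on
    harmless prompts; the normalized direction serves as the fingerprint."""
    v = acts_harmful.mean(axis=0) - acts_harmless.mean(axis=0)
    return v / np.linalg.norm(v)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def attribute(candidate, base_fingerprints, threshold=0.8):
    """Match a candidate model's refusal vector against known base-family
    fingerprints and return the best match if it clears the threshold."""
    scores = {name: cosine(candidate, fp) for name, fp in base_fingerprints.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None, scores[best])

# Locality-sensitive hashing (SimHash for cosine similarity): the sign pattern
# of projections onto random hyperplanes; similar vectors share most bits.
def lsh_signature(vec, planes):
    return (planes @ vec > 0).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4096  # hidden-state dimension (illustrative)
    # Synthetic activations standing in for a base model's hidden states and
    # for a fine-tuned offspring that perturbs them slightly.
    base_harmful = rng.normal(size=(32, d)) + 2.0   # distinct mean direction
    base_harmless = rng.normal(size=(32, d))
    offspring_harmful = base_harmful + 0.1 * rng.normal(size=(32, d))
    offspring_harmless = base_harmless + 0.1 * rng.normal(size=(32, d))

    fingerprints = {"base-family-A": refusal_vector(base_harmful, base_harmless)}
    candidate = refusal_vector(offspring_harmful, offspring_harmless)
    print(attribute(candidate, fingerprints))       # ('base-family-A', ~0.99)

    planes = rng.normal(size=(128, d))              # 128-bit signature
    sig_base = lsh_signature(fingerprints["base-family-A"], planes)
    sig_cand = lsh_signature(candidate, planes)
    print(int(np.sum(sig_base != sig_cand)), "differing bits out of 128")
```

A SimHash-style signature like the one above is one standard LSH scheme for cosine similarity: vectors that point in nearly the same direction produce signatures with small Hamming distance. In the paper's proposed verification framework, a zero-knowledge proof would presumably attest that two such signatures are close without revealing the underlying activations, though the summary does not detail the construction.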
Original source: https://arxiv.org/abs/2602.09434v1
First tracked: February 11, 2026 at 06:00 PM