A Behavioral Fingerprint for Large Language Models: Provenance Tracking via Refusal Vectors
Tags: security research
Source: arXiv (cs.CR + cs.AI)
February 10, 2026

Summary
This paper introduces a fingerprinting framework for protecting the intellectual property of large language models using "refusal vectors": behavioral patterns extracted from a model's internal representations when it processes harmful versus harmless prompts. The method identifies base model families with 100% accuracy across 76 offspring models and remains robust under common modifications such as fine-tuning, merging, and quantization. The authors also propose a theoretical framework that combines locality-sensitive hashing with zero-knowledge proofs to turn private fingerprints into publicly verifiable, privacy-preserving artifacts.
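The summary does not spell out the paper's exact extraction and matching procedure, but a minimal sketch of the general idea, assuming the fingerprint is a difference-of-means direction over hidden-state activations and that attribution is done by cosine similarity against known base-family vectors, might look like the following. Everything here (the function names, the synthetic activations, the 0.8 threshold, the 128-bit signature length) is illustrative and not the authors' implementation.

```python
import numpy as np

# Fingerprint extraction: difference-of-means "refusal" direction.
def refusal_vector(acts_harmful, acts_harmless):
    """Mean hidden-state activation on harmful prompts minus the mean on
    harmless prompts; the normalized direction serves as the fingerprint."""
    v = acts_harmful.mean(axis=0) - acts_harmless.mean(axis=0)
    return v / np.linalg.norm(v)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def attribute(candidate, base_fingerprints, threshold=0.8):
    """Match a candidate model's refusal vector against known base-family
    fingerprints and return the best match if it clears the threshold."""
    scores = {name: cosine(candidate, fp) for name, fp in base_fingerprints.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None, scores[best])

# Locality-sensitive hashing (SimHash for cosine similarity): the sign pattern
# of projections onto random hyperplanes; similar vectors share most bits.
def lsh_signature(vec, planes):
    return (planes @ vec > 0).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4096  # hidden-state dimension (illustrative)
    # Synthetic activations standing in for a base model's hidden states and
    # for a fine-tuned offspring that perturbs them slightly.
    base_harmful = rng.normal(size=(32, d)) + 2.0   # distinct mean direction
    base_harmless = rng.normal(size=(32, d))
    offspring_harmful = base_harmful + 0.1 * rng.normal(size=(32, d))
    offspring_harmless = base_harmless + 0.1 * rng.normal(size=(32, d))

    fingerprints = {"base-family-A": refusal_vector(base_harmful, base_harmless)}
    candidate = refusal_vector(offspring_harmful, offspring_harmless)
    print(attribute(candidate, fingerprints))       # ('base-family-A', ~0.99)

    planes = rng.normal(size=(128, d))              # 128-bit signature
    sig_base = lsh_signature(fingerprints["base-family-A"], planes)
    sig_cand = lsh_signature(candidate, planes)
    print(int(np.sum(sig_base != sig_cand)), "differing bits out of 128")
```

A SimHash-style signature like the one above is one standard LSH scheme for cosine similarity: vectors that point in nearly the same direction produce signatures with small Hamming distance. In the paper's proposed verification framework, a zero-knowledge proof would presumably attest that two such signatures are close without revealing the underlying activations, though the summary does not detail the construction.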
Original source: https://arxiv.org/abs/2602.09434v1
First tracked: February 11, 2026 at 06:00 PM