PK-Free, Blind and Collusion-Resistant Synthetic Tabular Fingerprinting With Diffusion Models
Summary
This paper addresses the risk that synthetic tabular data (AI-generated datasets) is copied or leaked without authorization. It proposes PBC-TabFip, a fingerprinting framework that embeds a hidden identifier into each distributed copy of the synthetic data, so that an unauthorized copy can be detected and traced to the user who leaked it. The framework combines diffusion models (generative models that produce data by gradually refining random noise) with Tardos codes (probabilistic fingerprinting codes for tracing which user leaked protected content). It is designed to protect synthetic tables even when primary keys (unique identifiers for database rows) are missing or altered, and to resist collusion attacks, in which several users combine their copies to remove the fingerprint.
Solution / Mitigation
The paper proposes 'PBC-TabFip' as the solution: a framework that 'readily incorporates with symmetric Tardos codes of arbitrary alphabet sizes' to enable fingerprinting of synthetic tabular data generated by diffusion models. It also proposes specific schemes, including 'binary TabFip and TabFip+, quaternary TabFip* and TabFip+*', that use 'Bit Matching (BM) and Valid Bit Matching (VBM) mechanisms' to identify malicious users. According to the authors, 'TabFip with Tardos codes identifies at least one of the colluders with 100% probability and without detecting innocent against two types of collusion attack.'
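
To make the role of Tardos codes concrete, the following is a minimal sketch of a generic binary Tardos code with the symmetric accusation score. It is not the paper's method: it does not embed anything into diffusion-generated tables and does not implement the BM or VBM mechanisms, which this summary does not describe. The function names, the majority-vote attack, the cutoff of 0.01, the code length of 4000, and the threshold of five standard deviations are all assumptions chosen for illustration.

```python
import math
import random


def generate_code(n_users, length, cutoff, rng):
    """Draw one bias per position and one binary codeword per user."""
    # Biases follow the arcsine distribution restricted to [cutoff, 1 - cutoff].
    t = math.asin(math.sqrt(cutoff))
    biases = [math.sin(rng.uniform(t, math.pi / 2 - t)) ** 2 for _ in range(length)]
    # Bit i of every codeword is 1 with probability biases[i].
    codewords = [[1 if rng.random() < p else 0 for p in biases] for _ in range(n_users)]
    return biases, codewords


def majority_collusion(copies):
    """Assumed attack model: colluders output the majority bit at each position."""
    return [1 if 2 * sum(bits) > len(bits) else 0 for bits in zip(*copies)]


def symmetric_score(codeword, pirated, biases):
    """Symmetric accusation score: reward matches, penalize mismatches."""
    total = 0.0
    for x, y, p in zip(codeword, pirated, biases):
        p_y = p if y == 1 else 1.0 - p  # probability of the observed symbol y
        if x == y:
            total += math.sqrt((1.0 - p_y) / p_y)
        else:
            total -= math.sqrt(p_y / (1.0 - p_y))
    return total


def accuse(codewords, pirated, biases, threshold):
    """Return indices of users whose score exceeds the threshold."""
    return [j for j, word in enumerate(codewords)
            if symmetric_score(word, pirated, biases) > threshold]


rng = random.Random(0)  # fixed seed so the run is repeatable
LENGTH = 4000           # illustrative code length, not taken from the paper
biases, codewords = generate_code(n_users=20, length=LENGTH, cutoff=0.01, rng=rng)

colluders = [2, 7, 11]  # hypothetical colluding users
pirated = majority_collusion([codewords[j] for j in colluders])

# An innocent user's score has mean 0 and variance LENGTH, so a threshold of
# five standard deviations (an assumed choice) makes false accusations unlikely.
accused = accuse(codewords, pirated, biases, threshold=5 * math.sqrt(LENGTH))
print("accused users:", accused)
```

The design point the sketch illustrates is that an innocent user's codeword is statistically independent of the pirated copy, so that user's score stays near zero, while each colluder's score grows in proportion to the code length. A longer code therefore widens the gap between the two groups, at the cost of embedding more fingerprint bits.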
Original source: http://ieeexplore.ieee.org/document/11421019
First tracked: May 14, 2026 at 08:01 PM
Classified by LLM (prompt v3) · confidence: 85%