AI Sec Watch: A Security Intelligence Platform for AI Systems

Luu, T.J.

PK-Free, Blind and Collusion-Resistant Synthetic Tabular Fingerprinting With Diffusion Models

info, research, Peer-Reviewed

security, research

Source: IEEE Xplore (Security & AI Journals), March 4, 2026

Summary

This research paper addresses the risk that synthetic tabular data (AI-generated artificial datasets) is copied or leaked without authorization. It proposes PBC-TabFip, a fingerprinting framework that embeds hidden identifiers into synthetic data so that unauthorized copies can be detected and traced to the user who leaked them. The framework combines diffusion models (AI systems that generate data by gradually refining random noise) with Tardos codes (a mathematical scheme for tracking which user leaked protected content). It is designed to protect synthetic tables even when primary keys (unique identifiers for database rows) are missing or altered, and to resist collusion attacks (in which multiple users combine their copies to remove the fingerprint).

Solution / Mitigation

The source proposes 'PBC-TabFip' as the solution: a framework that 'readily incorporates with symmetric Tardos codes of arbitrary alphabet sizes' to enable fingerprinting of synthetic tabular data generated by diffusion models. The paper also proposes four specific schemes, 'binary TabFip and TabFip+, quaternary TabFip* and TabFip+*', which use 'Bit Matching (BM) and Valid Bit Matching (VBM) mechanisms' to identify malicious users. According to the authors, 'TabFip with Tardos codes identifies at least one of the colluders with 100% probability and without detecting innocent against two types of collusion attack' (that is, without accusing innocent users).
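To make the collusion-tracing idea concrete, the following is a minimal sketch of a generic binary Tardos code with the standard symmetric accusation score. It is not the paper's PBC-TabFip implementation. It does not embed anything into tabular data, uses no diffusion model, and does not reproduce the BM or VBM mechanisms. The function names, the majority-vote collusion strategy, the code length, the cutoff, and the accusation threshold are all assumptions chosen for illustration.

```python
import math
import random


def generate_tardos_code(n_users, length, cutoff, rng):
    """Return (biases, codewords) for a binary Tardos code."""
    # Draw each bias p_i from the arcsine density restricted to
    # [cutoff, 1 - cutoff] by sampling an angle uniformly.
    lo = math.asin(math.sqrt(cutoff))
    hi = math.pi / 2 - lo
    biases = [math.sin(rng.uniform(lo, hi)) ** 2 for _ in range(length)]
    # Each user's bit at position i is 1 with probability p_i.
    codewords = [[1 if rng.random() < p else 0 for p in biases]
                 for _ in range(n_users)]
    return biases, codewords


def majority_collusion(copies):
    """Merge the colluders' codewords by per-position majority vote.

    Where all colluders hold the same bit, the output keeps that bit,
    so the usual marking assumption is respected.
    """
    return [1 if 2 * sum(bits) > len(bits) else 0 for bits in zip(*copies)]


def symmetric_score(codeword, pirate, biases):
    """Symmetric Tardos accusation score of one user against a pirate copy."""
    total = 0.0
    for x, y, p in zip(codeword, pirate, biases):
        # Probability that an honest user holds the pirate's symbol y here.
        p_y = p if y == 1 else 1.0 - p
        if x == y:
            # Matching a rare symbol is strong evidence of involvement.
            total += math.sqrt((1.0 - p_y) / p_y)
        else:
            total -= math.sqrt(p_y / (1.0 - p_y))
    return total


# Illustrative parameters (assumptions, not taken from the paper).
rng = random.Random(0)
n_users, length = 20, 3000
biases, codewords = generate_tardos_code(n_users, length, cutoff=0.02, rng=rng)

colluders = [2, 7, 11]  # hypothetical leaking users
pirate = majority_collusion([codewords[j] for j in colluders])

scores = [symmetric_score(cw, pirate, biases) for cw in codewords]
threshold = 350.0  # hand-picked for this toy setting, not a tuned bound
accused = [j for j, s in enumerate(scores) if s > threshold]
print("accused users:", accused)
```

An innocent user's score has mean zero because their bits are independent of the pirate copy, while each colluder's score grows roughly linearly with the code length. A threshold placed between the two populations therefore accuses colluders without accusing innocent users. In a real deployment the code length and threshold would be derived from the target error rates and the maximum coalition size rather than chosen by hand.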