CAPID: Context-Aware PII Detection for Question-Answering Systems

privacy, research

Source: arXiv (cs.CR + cs.AI), February 10, 2026

Summary

This paper proposes CAPID, a context-aware PII detection system for question-answering platforms. It addresses a limitation of current approaches, which redact all PII regardless of its relevance to the question being asked. CAPID fine-tunes a locally owned small language model (SLM) to detect PII spans, classify their types, and determine their contextual relevance before data is passed to LLMs, avoiding the privacy concerns of relying on closed-source models. To train the SLM, the paper introduces a synthetic data generation pipeline that uses LLMs to create training data capturing context-dependent PII relevance across multiple domains.
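As a rough illustration of the filtering step described above, the sketch below shows how spans labelled with a type and a relevance flag could be used to redact only the PII that the question does not need. It is not the paper's implementation: the `PiiSpan` structure, the `redact_irrelevant` and `make_span` helpers, the `NAME` and `CONDITION` labels, and the placeholder format are all hypothetical, and the spans are hard-coded where a fine-tuned SLM would normally produce them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiSpan:
    """Hypothetical record for one detected PII span (not the paper's schema)."""
    start: int       # inclusive character offset into the text
    end: int         # exclusive character offset into the text
    pii_type: str    # assumed label such as "NAME" or "CONDITION"
    relevant: bool   # True if the span is needed to answer the question


def redact_irrelevant(text: str, spans: list[PiiSpan]) -> str:
    """Replace every span marked not relevant with a [TYPE] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if not span.relevant:
            out = out[:span.start] + f"[{span.pii_type}]" + out[span.end:]
    return out


def make_span(text: str, value: str, pii_type: str, relevant: bool) -> PiiSpan:
    """Build a span by locating `value` in `text`; stands in for model output."""
    start = text.index(value)
    return PiiSpan(start, start + len(value), pii_type, relevant)


question = "My name is Jane Doe and I have asthma. Is running safe for me?"
spans = [
    # The name does not affect the answer, so it is marked irrelevant.
    make_span(question, "Jane Doe", "NAME", relevant=False),
    # The health condition is needed to answer, so it is kept.
    make_span(question, "asthma", "CONDITION", relevant=True),
]
filtered = redact_irrelevant(question, spans)
print(filtered)
# -> My name is [NAME] and I have asthma. Is running safe for me?
```

In this example the name is replaced while the health condition survives, because the question cannot be answered without it. A redact-everything baseline would remove both and leave the downstream LLM without the information it needs.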

Original source: https://arxiv.org/abs/2602.10074v1

First tracked: February 11, 2026 at 06:00 PM