Large language models are exposed to risks of extraction, distillation, and unauthorized fine-tuning. Existing defenses rely on watermarking or monitoring, but these act only after leakage has occurred. We present AlignDP, a hybrid privacy lock that blocks knowledge transfer at the data interface. The key idea is to separate rare fields from non-rare fields. Rare fields are shielded by PAC indistinguishability, yielding effectively zero-ε local DP. Non-rare fields are privatized with RAPPOR, yielding unbiased frequency estimates under local DP. A global aggregator enforces composition and budget accounting. This two-tier design hides rare events while adding controlled noise to frequent ones. We prove limits on extending PAC guarantees to global aggregation, give error bounds for RAPPOR estimates, and analyze the utility trade-off. A toy simulation confirms feasibility: rare categories remain hidden, while frequent categories are recovered with small error.
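The non-rare tier described above can be illustrated with a minimal RAPPOR-style randomized-response sketch. This is not the paper's implementation: the function names, the one-shot (non-permanent) reporting, and the flip probabilities `p` and `q` are illustrative assumptions. Each user one-hot encodes a category and flips each bit independently; the aggregator then debiases the observed counts, since E[count_i] = p·n_i + q·(n − n_i).

```python
import random

def rappor_report(true_index, k, p=0.75, q=0.25):
    """One-shot RAPPOR-style report (illustrative, not the paper's scheme).

    One-hot encode the user's category over k bins, then perturb:
    a true bit is reported as 1 with probability p, a false bit
    is reported as 1 with probability q.
    """
    return [1 if random.random() < (p if i == true_index else q) else 0
            for i in range(k)]

def estimate_frequencies(reports, p=0.75, q=0.25):
    """Unbiased per-category count estimates from perturbed reports.

    Inverts E[count_i] = p*n_i + q*(n - n_i), giving
    n_i_hat = (count_i - q*n) / (p - q).
    """
    n, k = len(reports), len(reports[0])
    counts = [sum(r[i] for r in reports) for i in range(k)]
    return [(c - q * n) / (p - q) for c in counts]

if __name__ == "__main__":
    random.seed(0)
    # Simulated population: a frequent, a medium, and a rarer category.
    truth = [0] * 12000 + [1] * 6000 + [2] * 2000
    reports = [rappor_report(t, k=3) for t in truth]
    print(estimate_frequencies(reports))  # each close to 12000/6000/2000
```

The two-tier behavior in the abstract would route rare fields away from this path entirely; here only the frequency-estimation side is sketched.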