As AI-generated content increasingly underpins real-world applications, its accompanying security risks, including privacy leakage and copyright infringement, have become growing concerns. In this context, Federated Learning (FL) offers a promising foundation for enhancing trustworthiness by enabling privacy-preserving collaborative learning over proprietary data. However, its practical adoption is critically threatened by backdoor-based model manipulation, where a small number of malicious clients can compromise the system and induce harmful content generation and decision-making. Although various methods have been proposed to detect such manipulation, we reveal that they are either disrupted by non-i.i.d. data distributions and random client participation, or misled by out-of-distribution (OOD) prediction bias, both of which are challenges unique to FL scenarios. To address these issues, we introduce a novel proactive detection method, dubbed Coward, inspired by our discovery of the multi-backdoor collision effect, in which distinct backdoors planted in succession significantly suppress earlier ones. Accordingly, we modify the federated global model by injecting a carefully designed backdoor-collided watermark, implemented via regulated dual-mapping learning on OOD data. This design not only enables a detection paradigm inverted relative to existing proactive methods, thereby naturally counteracting the adverse impact of OOD prediction bias, but also introduces a minimally disruptive training intervention that inherently limits the strength of that bias, leading to significantly fewer misjudgments. Extensive experiments on benchmark datasets show that Coward achieves state-of-the-art detection performance, effectively alleviates OOD prediction bias, and remains robust against potential adaptive manipulations.
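To make the proactive, watermark-based detection idea above concrete, the following is a minimal, self-contained PyTorch sketch of a server-side check: the server plants an OOD watermark in the global model, then measures how well each returned client update retains that watermark. It is only an illustration of the general workflow under stated assumptions; the model, the helper names (plant_watermark, watermark_retention, flag_suspicious), the threshold, and the flagging rule are all hypothetical and do not reproduce Coward's actual regulated dual-mapping learning or decision procedure.

```python
# Illustrative sketch only; names, model, and decision rule are assumptions, not the paper's method.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Stand-in for the federated global model."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(32 * 32 * 3, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x.flatten(1))))

def plant_watermark(model, ood_x, ood_y, epochs=5, lr=0.01):
    """Lightly fine-tune the model so that OOD probes map to chosen labels
    (a hypothetical stand-in for the regulated dual-mapping learning)."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(ood_x), ood_y)
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def watermark_retention(model, ood_x, ood_y):
    """Fraction of OOD watermark probes still mapped to their planted labels."""
    preds = model(ood_x).argmax(dim=1)
    return (preds == ood_y).float().mean().item()

def flag_suspicious(global_model, client_updates, ood_x, ood_y, threshold=0.5):
    """Flag clients whose update suppresses the planted watermark.
    The direction (low retention -> suspicious) follows the backdoor-collision
    intuition sketched above and is an assumption of this example."""
    flagged = []
    for cid, state_dict in client_updates.items():
        probe = copy.deepcopy(global_model)
        probe.load_state_dict(state_dict)
        if watermark_retention(probe, ood_x, ood_y) < threshold:
            flagged.append(cid)
    return flagged

if __name__ == "__main__":
    torch.manual_seed(0)
    ood_x = torch.randn(64, 3, 32, 32)        # synthetic out-of-distribution probes
    ood_y = torch.randint(0, 10, (64,))       # planted watermark labels
    global_model = plant_watermark(TinyNet(), ood_x, ood_y)
    # Simulate two returned updates: one benign copy, one that overwrites (collides with) the watermark.
    benign = copy.deepcopy(global_model)
    malicious = plant_watermark(global_model, ood_x, (ood_y + 1) % 10)
    updates = {"client_0": benign.state_dict(), "client_1": malicious.state_dict()}
    print(flag_suspicious(global_model, updates, ood_x, ood_y))  # expected: ['client_1']
```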