Defending Unauthorized Model Merging via Dual-Stage Weight Protection

The rapid proliferation of pretrained models and open repositories has made model merging a convenient yet risky practice. It allows free-riders to combine fine-tuned models into a new multi-capability model without authorization. Such unauthorized model merging not only violates intellectual property rights but also undermines model ownership and accountability. To address this issue, we present MergeGuard, a proactive dual-stage weight protection framework that disrupts merging compatibility while maintaining task fidelity. In the first stage, we redistribute task-relevant information across layers via L2-regularized optimization, so that important gradients are dispersed evenly rather than concentrated in a few layers. In the second stage, we inject structured perturbations that misalign task subspaces, breaking curvature compatibility in the loss landscape. Together, these stages reshape the model's parameter geometry so that any merged model suffers destructive interference and collapses, while the protected model remains fully functional. Extensive experiments on both vision (ViT-L-14) and language (Llama2, Gemma2, Mistral) models show that MergeGuard reduces the accuracy of merged models by up to 90% while costing the protected model less than 1.5% in performance.
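The abstract describes MergeGuard only at a high level, so the sketch below is not its algorithm. It is a toy NumPy illustration, under stated assumptions, of the threat model and of the underlying geometric idea: a change to one model's weights can leave that model's function intact while breaking weight-space merging. Merging is assumed to be task arithmetic, θ_merged = θ_base + λ Σ_i (θ_i − θ_base). The "protection" is a swapped-in technique, not the paper's L2-regularized redistribution or structured perturbations. It permutes the hidden units of a two-layer ReLU MLP and rescales each unit by a positive factor. Because ReLU is positively homogeneous (relu(s·z) = s·relu(z) for s > 0), this leaves the network's function unchanged. All names (`mlp`, `finetune`, `task_arithmetic`, `protect`, `rel_err`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, k, n = 16, 64, 8, 256  # input dim, hidden width, output dim, batch size


def mlp(params, x):
    """Two-layer ReLU MLP: y = W2 @ relu(W1 @ x), with x of shape (d, n)."""
    W1, W2 = params
    return W2 @ np.maximum(W1 @ x, 0.0)


# Shared "pretrained" weights and two fine-tuned variants with small task deltas.
base = (rng.normal(size=(h, d)), rng.normal(size=(k, h)))


def finetune(params, scale=0.01):
    return tuple(w + scale * rng.normal(size=w.shape) for w in params)


model_a, model_b = finetune(base), finetune(base)


def task_arithmetic(base, models, lam=0.5):
    """Assumed merging method: theta_base + lam * sum_i (theta_i - theta_base)."""
    return tuple(b + lam * sum(m[i] - b for m in models) for i, b in enumerate(base))


def protect(params, rng):
    """Hypothetical stand-in for weight protection (NOT MergeGuard): permute hidden
    units and rescale unit j by s_j > 0 in layer 1 and by 1/s_j in layer 2.
    The network function is unchanged, but the weights no longer align with base."""
    W1, W2 = params
    perm = rng.permutation(W1.shape[0])
    s = np.exp(rng.uniform(np.log(0.1), np.log(10.0), size=W1.shape[0]))
    return (s[:, None] * W1[perm], W2[:, perm] / s[None, :])


def rel_err(y, y_ref):
    return np.linalg.norm(y - y_ref) / np.linalg.norm(y_ref)


x = rng.normal(size=(d, n))
protected_a = protect(model_a, rng)
y_a = mlp(model_a, x)

# The protected model computes the same function as model A (up to rounding).
print("protected A vs A:", rel_err(mlp(protected_a, x), y_a))

# Merging the original A with B stays close to A's behaviour ...
err_plain = rel_err(mlp(task_arithmetic(base, [model_a, model_b]), x), y_a)
# ... while merging the protected A with B drifts far from it.
err_guarded = rel_err(mlp(task_arithmetic(base, [protected_a, model_b]), x), y_a)
print(f"merge(A, B) vs A: {err_plain:.3f}  merge(protected A, B) vs A: {err_guarded:.3f}")
```

The per-unit rescaling loosely echoes moving weight magnitude between layers, and the permutation loosely echoes misaligning subspaces. A merger who aligns hidden units and normalizes their scales before merging could undo this particular toy transformation, so it should be read only as an illustration of the geometry involved.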
