The rapid growth of Large Language Models (LLMs) has highlighted the pressing need for reliable mechanisms to verify content ownership and ensure traceability. Watermarking offers a promising path forward, but it remains limited by privacy concerns in sensitive scenarios, as traditional approaches often require direct access to a model's parameters or its training data. In this work, we propose PRIVMARK, a private LLM watermarking framework based on secure multi-party computation (MPC), to address these concerns. Concretely, we investigate PostMark (EMNLP'2024), one of the state-of-the-art LLM watermarking methods, and formulate its basic operations. We then construct efficient protocols for these operations using MPC primitives in a black-box manner. In this way, PRIVMARK enables multiple parties to collaboratively watermark an LLM's output without exposing the model's weights to any single computing party. We implement PRIVMARK using SecretFlow-SPU (USENIX ATC'2023) and evaluate its performance using the ABY3 (CCS'2018) backend. The experimental results show that PRIVMARK produces results semantically identical to those of the plaintext baseline without MPC and is resistant to paraphrasing and removal attacks, with reasonable efficiency.