ZipLLM: Efficient LLM Storage via Model-Aware Synergistic Data Deduplication and Compression

Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques, such as deduplication and compression, are either LLM-oblivious or incompatible with each other, which limits their data reduction effectiveness. Our large-scale characterization study across all publicly available Hugging Face LLM repositories reveals several key insights: (1) fine-tuned models within the same family exhibit highly structured, sparse parameter differences that are well suited to delta compression; (2) bitwise similarity enables LLM family clustering; and (3) tensor-level deduplication is better aligned with model storage workloads, achieving high data reduction with low metadata overhead. Building on these insights, we design BitX, an effective, fast, lossless delta compression algorithm that compresses the XORed difference between a fine-tuned LLM and its base LLM. We build ZipLLM, a model storage reduction pipeline that unifies tensor-level deduplication and lossless BitX compression. By synergizing deduplication and compression around LLM family clustering, ZipLLM reduces model storage consumption by 54%, over 20% higher than state-of-the-art deduplication and compression approaches.
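To make the XOR-delta idea concrete, the following is a minimal sketch, not the BitX implementation itself. It assumes float32 weights, models fine-tuning as a small synthetic perturbation of the base weights, and uses `zlib` purely as a stand-in general-purpose compressor, since the abstract does not specify the back-end codec. The helper names (`floats_to_bytes`, `xor_bytes`, `compress_delta`, `decompress_delta`) are hypothetical.

```python
import random
import struct
import zlib


def floats_to_bytes(values):
    """Pack a list of Python floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same byte length")
    return bytes(x ^ y for x, y in zip(a, b))


def compress_delta(base, tuned):
    """XOR the fine-tuned tensor against the base, then compress the result.

    zlib is an assumption here, standing in for whatever codec BitX uses.
    """
    return zlib.compress(xor_bytes(base, tuned), 9)


def decompress_delta(base, blob):
    """Recover the fine-tuned tensor exactly: XOR is its own inverse."""
    return xor_bytes(base, zlib.decompress(blob))


# Synthetic data (assumption): the "fine-tuned" weights are the base weights
# plus a tiny perturbation, so sign, exponent, and high mantissa bits mostly match.
rng = random.Random(0)
base_vals = [rng.gauss(0.0, 0.02) for _ in range(4096)]
tuned_vals = [v + rng.gauss(0.0, 1e-6) for v in base_vals]

base = floats_to_bytes(base_vals)
tuned = floats_to_bytes(tuned_vals)

blob = compress_delta(base, tuned)      # delta-compressed fine-tuned tensor
direct = zlib.compress(tuned, 9)        # same tensor compressed on its own
restored = decompress_delta(base, blob)

print("lossless round trip:", restored == tuned)
print("raw bytes:", len(tuned))
print("compressed alone:", len(direct))
print("compressed as XOR delta:", len(blob))
```

Because the matching high-order bits XOR to zero, the delta contains long runs of zero bytes that a general-purpose compressor handles far better than the near-random mantissa bits of the raw tensor. Decompression needs the base tensor, which is why clustering models into families is a prerequisite for this kind of scheme.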
