Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models

Large language models (LLMs) have achieved remarkable performance on a wide range of tasks, but their massive size hinders real-world deployment. Existing pruning methods tailored for LLMs (e.g., Wanda) rely heavily on manually designed pruning algorithms, thereby incurring \textit{huge labor costs} and \textit{requiring expert knowledge}. Furthermore, we are the first to identify the serious \textit{outlier value issue}, caused by uniform sparsity, that lies behind the dramatic performance degradation under high pruning ratios. This raises an additional concern: how to design an adaptive pruning sparsity that suits LLMs. Can LLMs prune by themselves? In this work, we answer in the affirmative by proposing a novel pruning method called \textbf{AutoPrune}, the first to overcome expert knowledge limits by leveraging LLMs to automatically design optimal pruning algorithms for themselves, without any expert knowledge. Specifically, to mitigate the black-box nature of LLMs, we propose a Graph-driven Chain-of-Thought (GCoT) to optimize prompts, which significantly enhances the reasoning process in learning the pruning algorithm and enables us to generate pruning algorithms with superior performance and interpretability in the next generation. Finally, grounded in our insights into the outlier value issue, we introduce Skew-aware Dynamic Sparsity Allocation (SDSA) to overcome it, mitigating performance degradation under high pruning ratios. We conduct extensive experiments on mainstream LLM benchmarks, demonstrating the superiority of AutoPrune, which consistently outperforms state-of-the-art competitors. The code is available at: https://anonymous.4open.science/r/AutoPrune.
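To make the baseline setting concrete, the following is a minimal sketch of a Wanda-style, manually designed pruning metric applied at a uniform sparsity ratio, the configuration the abstract contrasts against. It is not AutoPrune, GCoT, or SDSA, none of which are specified here. The score used is the Wanda importance |W_ij| * ||X_j||_2, compared within each output row; the function names `wanda_scores` and `prune_uniform`, the toy weights, and the toy calibration activations are all hypothetical and chosen only for illustration.

```python
import math

def wanda_scores(weights, activations):
    """Wanda-style importance: |W_ij| times the L2 norm of input feature j.

    weights: list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (num_samples, in_features)
    """
    n_in = len(weights[0])
    # L2 norm of each input feature, taken across the calibration samples
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weights]

def prune_uniform(weights, activations, sparsity):
    """Zero the lowest-scoring weights in every row at the same sparsity ratio."""
    scores = wanda_scores(weights, activations)
    pruned = []
    for row, row_scores in zip(weights, scores):
        k = int(len(row) * sparsity)  # number of weights removed in this row
        # indices of the k lowest-scoring weights in this row
        drop = set(sorted(range(len(row)), key=lambda j: row_scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Hypothetical toy layer: 2 output rows, 4 input features.
weights = [[0.5, -0.1, 2.0, 0.3],
           [-1.0, 0.2, 0.05, 0.4]]
# Hypothetical calibration activations: 2 samples, 4 input features.
activations = [[1.0, 4.0, 0.1, 1.0],
               [1.0, 3.0, 0.1, 1.0]]

pruned = prune_uniform(weights, activations, sparsity=0.5)
print(pruned)
```

In this toy case the largest-magnitude weight (2.0) is removed because its input feature has a small activation norm, while the small weight -0.1 survives because its input feature is large. Every row receives the same 50% ratio regardless of how its scores are distributed, which is the uniform-sparsity assumption the abstract argues against.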
