Correlation Dimension of Auto-Regressive Large Language Models

Large language models (LLMs) have achieved remarkable progress in natural language generation, yet they continue to display puzzling behaviors -- such as repetition and incoherence -- even when exhibiting low perplexity. This highlights a key limitation of conventional evaluation metrics, which emphasize local prediction accuracy while overlooking long-range structural complexity. We introduce correlation dimension, a fractal-geometric measure of self-similarity, to quantify the epistemological complexity of text as perceived by a language model. This measure captures the hierarchical recurrence structure of language, bridging local and global properties in a unified framework. Through extensive experiments, we show that correlation dimension (1) reveals three distinct phases during pretraining, (2) reflects context-dependent complexity, (3) indicates a model's tendency toward hallucination, and (4) reliably detects multiple forms of degeneration in generated text. The method is computationally efficient, robust to model quantization (down to 4-bit precision), broadly applicable across autoregressive architectures (e.g., Transformer and Mamba), and provides fresh insight into the generative dynamics of LLMs.
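The abstract does not say how the correlation dimension is computed from a model's outputs. As a rough illustration of the general idea only, the sketch below applies the classical Grassberger-Procaccia estimator to a generic point cloud: it measures the fraction C(eps) of point pairs closer than eps, and takes the slope of log C(eps) against log eps as the dimension. The function names, the Euclidean distance, and the toy data are assumptions made for this illustration, not the paper's procedure.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of rows in `points`."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    rows, cols = np.triu_indices(len(points), k=1)  # each pair counted once
    return dists[rows, cols]

def correlation_dimension(points, eps_values):
    """Least-squares slope of log C(eps) versus log eps (hypothetical helper).

    C(eps) is the correlation integral: the fraction of point pairs whose
    distance is below eps. Radii with C(eps) == 0 are skipped because their
    logarithm is undefined.
    """
    eps_values = np.asarray(eps_values, dtype=float)
    dists = pairwise_distances(np.asarray(points, dtype=float))
    c = np.array([np.mean(dists < eps) for eps in eps_values])
    keep = c > 0
    slope, _intercept = np.polyfit(np.log(eps_values[keep]), np.log(c[keep]), 1)
    return float(slope)

# Toy data (assumed for illustration): a line segment embedded in 3-D
# and a filled unit square in 2-D.
rng = np.random.default_rng(0)
eps = np.logspace(-2, -1, 8)
line = np.outer(rng.random(1000), np.ones(3) / np.sqrt(3))
square = rng.random((1000, 2))

print("line  :", correlation_dimension(line, eps))
print("square:", correlation_dimension(square, eps))
```

On this toy data the estimate for the line segment should land close to 1 and the estimate for the square close to 2, slightly low in both cases because of boundary effects. Applying the same idea to text would require choosing what the points are and how distance between them is measured, which this listing does not specify.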

