Large language models (LLMs) are constrained by substantial computational cost. We introduce a "computational economics" framework that treats an LLM as an internal economy of resource-constrained agents (attention heads and neuron blocks) that must allocate scarce computation to maximize task utility. First, we show empirically that when computation is scarce, standard LLMs reallocate attention toward high-value tokens while preserving accuracy. Building on this observation, we propose an incentive-driven training paradigm that augments the task loss with a differentiable computation-cost term, encouraging sparse and efficient activations. On GLUE (MNLI, STS-B, CoLA) and WikiText-103, the method yields a family of models that trace a Pareto frontier and consistently dominate post-hoc pruning; at comparable accuracy we obtain roughly a 40% reduction in FLOPs and lower latency, together with more interpretable attention patterns. These results indicate that economic principles offer a principled route to designing efficient, adaptive, and more transparent LLMs under strict resource constraints.
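As a rough illustration of the incentive-driven objective described above, the sketch below shows one way to augment a task loss with a differentiable computation-cost term derived from soft gates over attention heads and neuron blocks. The gate representation, the mean-activation cost proxy, and the `cost_weight` hyperparameter are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class CostAugmentedLoss(nn.Module):
    """Task loss plus a differentiable computation-cost penalty.

    Hypothetical sketch: assumes the model exposes soft gates in [0, 1]
    over attention heads / neuron blocks; the mean gate activation serves
    as a differentiable proxy for expected compute.
    """

    def __init__(self, task_loss: nn.Module, cost_weight: float = 0.01):
        super().__init__()
        self.task_loss = task_loss      # e.g. cross-entropy for MNLI/CoLA
        self.cost_weight = cost_weight  # lambda trading accuracy vs. compute

    def forward(self, logits, targets, gate_activations):
        # Standard supervised objective.
        task = self.task_loss(logits, targets)
        # Expected fraction of active heads/blocks, averaged over layers.
        # Soft gates keep this term differentiable end to end.
        compute_cost = torch.stack(
            [g.mean() for g in gate_activations]
        ).mean()
        return task + self.cost_weight * compute_cost


# Usage sketch: gates would be produced by the model's forward pass.
criterion = CostAugmentedLoss(nn.CrossEntropyLoss(), cost_weight=0.01)
logits = torch.randn(8, 3)                      # batch of 8, 3 classes
targets = torch.randint(0, 3, (8,))
gates = [torch.rand(12, requires_grad=True)]    # e.g. 12 head gates in one layer
loss = criterion(logits, targets, gates)
loss.backward()
```

Sweeping `cost_weight` over a range of values is one natural way to trace the accuracy-compute Pareto frontier reported in the abstract.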