Modular exponentiation is crucial to number theory and cryptography, yet remains largely unexplored from a mechanistic interpretability standpoint. We train a 4-layer encoder-decoder Transformer model to perform this operation and investigate the emergence of numerical reasoning during training. Utilizing principled sampling strategies, PCA-based embedding analysis, and activation patching, we examine how number-theoretic properties are encoded within the model. We find that reciprocal operand training leads to strong performance gains, with sudden generalization across related moduli. These synchronized accuracy surges reflect grokking-like dynamics, suggesting the model internalizes shared arithmetic structure. We also identify a subgraph, consisting entirely of final-layer attention heads, that is sufficient to achieve full performance on the task of regular exponentiation. These results suggest that transformer models learn modular arithmetic through specialized computational circuits, paving the way for more interpretable and efficient neural approaches to modular exponentiation.
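For readers unfamiliar with the task, the sketch below illustrates modular exponentiation itself, i.e., computing base^exponent mod modulus via square-and-multiply. The `make_example` serialization is purely hypothetical, showing one plausible way to frame the operation as an encoder-decoder sequence task; the abstract does not specify the paper's actual tokenization or data format.

```python
def modexp(base: int, exponent: int, modulus: int) -> int:
    """Square-and-multiply modular exponentiation, equivalent to pow(base, exponent, modulus)."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # multiply in the current square when the bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus      # repeated squaring
        exponent >>= 1
    return result

def make_example(base: int, exponent: int, modulus: int) -> tuple[str, str]:
    """Hypothetical (source, target) pair for a seq2seq model; not the paper's format."""
    src = f"{base} ^ {exponent} mod {modulus}"
    tgt = str(modexp(base, exponent, modulus))
    return src, tgt

if __name__ == "__main__":
    print(make_example(3, 7, 13))  # ('3 ^ 7 mod 13', '3'), since 3**7 = 2187 = 168*13 + 3
```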