Understanding how learning algorithms shape the computational strategies that emerge in neural networks is a fundamental challenge in machine intelligence. While network architectures receive extensive attention, the role of the learning paradigm itself in determining emergent dynamics remains largely unexplored. Here we demonstrate that reinforcement learning (RL) and supervised learning (SL) drive recurrent neural networks (RNNs) toward fundamentally different computational solutions when trained on identical decision-making tasks. Through systematic dynamical systems analysis, we reveal that RL spontaneously discovers hybrid attractor architectures, combining stable fixed-point attractors for decision maintenance with quasi-periodic attractors for flexible evidence integration. This contrasts sharply with SL, which converges almost exclusively to simpler fixed-point-only solutions. We further show that RL sculpts functionally balanced neural populations through a powerful form of implicit regularization, a structural signature that enhances robustness and is conspicuously absent from the more heterogeneous solutions found by SL-trained networks. The prevalence of these complex dynamics under RL can be controllably modulated by weight initialization and correlates strongly with performance gains, particularly as task complexity increases. Our results establish the learning algorithm as a primary determinant of emergent computation, revealing how reward-based optimization autonomously discovers sophisticated dynamical mechanisms that are less accessible to direct gradient-based optimization. These findings provide both mechanistic insights into neural computation and actionable principles for designing adaptive AI systems.
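The "systematic dynamical systems analysis" referenced above typically amounts to locating fixed points of the trained RNN's update map and classifying their stability from the local Jacobian. Below is a minimal sketch of one standard approach, in the spirit of Sussillo and Barak's fixed-point finder: minimize the "speed" q(h) = ½‖F(h, x) − h‖² under a frozen input. The weights, dimensions, and helper names (F, find_fixed_point, W_rec, W_in) are illustrative assumptions for a vanilla tanh RNN, not the paper's actual model or analysis code.

```python
# Fixed-point analysis sketch for a vanilla tanh RNN (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
N = 64                                   # hypothetical hidden size
W_rec = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
W_in = rng.normal(scale=0.1, size=(N, 2))
b = np.zeros(N)

def F(h, x):
    """One step of the RNN update map: h' = tanh(W_rec h + W_in x + b)."""
    return np.tanh(W_rec @ h + W_in @ x + b)

def find_fixed_point(h0, x, lr=0.05, steps=5000, tol=1e-8):
    """Gradient descent on the speed q(h) = 0.5 * ||F(h, x) - h||^2."""
    h = h0.copy()
    for _ in range(steps):
        fh = F(h, x)
        r = fh - h                                # residual F(h) - h
        if 0.5 * r @ r < tol:
            break
        J = (1.0 - fh ** 2)[:, None] * W_rec      # Jacobian of F: diag(1 - F^2) @ W_rec
        h -= lr * (J - np.eye(N)).T @ r           # gradient dq/dh = (J - I)^T r
    r = F(h, x) - h
    return h, 0.5 * r @ r

x0 = np.zeros(2)                                  # zero-input (memory) condition
h_star, q = find_fixed_point(rng.normal(size=N), x0)

# Classify stability from the Jacobian's eigenvalues at the candidate point:
# |lambda|_max < 1 implies a stable fixed point in discrete time; complex pairs
# near the unit circle indicate the rotational structure associated with
# quasi-periodic dynamics.
eigs = np.linalg.eigvals((1.0 - F(h_star, x0) ** 2)[:, None] * W_rec)
print(f"speed q = {q:.2e}, spectral radius = {np.abs(eigs).max():.3f}")
```

In practice this search is seeded from many states visited during task trials, so that both the decision-maintenance fixed points and any slower rotational structure in the evidence-integration regime are captured.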