Multilayer Perceptrons struggle to learn certain simple arithmetic tasks. Specialist neural modules for arithmetic can outperform classical architectures, with gains in extrapolation, interpretability, and convergence speed, but they are highly sensitive to the training range. In this paper, we show that Neural Multiplication Units (NMUs) are unable to reliably learn tasks as simple as multiplying two inputs when trained on different input ranges. We link these failures to inductive and input biases that encourage convergence to undesirable optima. As a solution, we propose the stochastic NMU (sNMU), which applies reversible stochasticity to encourage avoidance of such optima while still converging to the true solution. Empirically, we show that this stochasticity improves robustness and has the potential to improve the learned representations of upstream networks on numerical and image tasks.
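As a minimal sketch of the reversible-stochasticity idea, the snippet below assumes the sNMU scales each input by sampled noise before the NMU and then reverses the perturbation by dividing the output by the product of the noise factors. The NMU form prod_i(w_i * x_i + 1 - w_i) follows the standard NMU definition; the uniform noise range and the helper names (`nmu`, `snmu_forward`) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def nmu(x, w):
    """Neural Multiplication Unit: prod_i (w_i * x_i + 1 - w_i).
    With converged weights w_i = 1, the unit multiplies the selected inputs."""
    return np.prod(w * x + 1.0 - w)

def snmu_forward(x, w, rng, low=1.0, high=5.0):
    """Stochastic NMU sketch: scale each input by sampled noise, apply the NMU,
    then reverse the noise by dividing by its product. The noise range
    [low, high) is an assumption for illustration."""
    noise = rng.uniform(low, high, size=x.shape)
    return nmu(noise * x, w) / np.prod(noise)

rng = np.random.default_rng(0)
x = np.array([3.0, 4.0])
w = np.ones_like(x)              # ideal converged weights select both inputs
print(nmu(x, w))                 # 12.0
print(snmu_forward(x, w, rng))   # ~12.0: the noise cancels exactly when w = 1
```

Note that the noise reversal is exact only at the desired solution (w = 1), so the stochasticity perturbs the loss surface around undesirable optima without biasing the converged multiplication.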