Despite progress across a broad range of applications, Transformers have had limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture: copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depths. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
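To make the copy-gate idea concrete: a minimal sketch of a gated column update, where each Transformer column either accepts a candidate update or copies its previous state forward unchanged. The function and parameter names here (`copy_gate_update`, `w_g`, `b_g`) are hypothetical illustrations, not the paper's actual implementation, and the gate computation is simplified to a single linear projection.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def copy_gate_update(x, candidate, w_g, b_g):
    """Gated column update (illustrative sketch only).

    x         : previous column states, shape (seq, d)
    candidate : new values proposed by the layer, shape (seq, d)
    w_g, b_g  : hypothetical learned gate parameters

    The gate g is in (0, 1); g near 1 accepts the candidate,
    g near 0 copies the old state forward unchanged.
    """
    g = sigmoid(x @ w_g + b_g)
    return g * candidate + (1.0 - g) * x
```

With a strongly negative gate bias the column simply copies itself, which is what lets irrelevant columns stay inert while relevant information is routed through the grid.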