We introduce a differentiable random access memory module with $O(1)$ access cost independent of its size, scaling to billions of entries. The design stores entries on the points of a chosen lattice, whose symmetries allow nearest neighbours of arbitrary query points to be computed efficiently. Augmenting a standard neural network architecture with a single memory layer built on this module, we can scale the parameter count up to memory limits with negligible computational overhead, giving better accuracy at similar cost. On large language modelling tasks, these enhanced, larger-capacity models significantly outperform the unmodified transformer baseline, and we observe continued gains with memory size up to the largest sizes tested.
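To make the lattice idea concrete, here is a minimal sketch of nearest-lattice-point lookup followed by a constant-time table read. The abstract does not name the lattice used; the $D_n$ lattice is chosen here only because its nearest-point decoder is a textbook example of exploiting lattice symmetry (Conway and Sloane), and the names `nearest_Dn_point`, `memory_lookup`, and `table` are illustrative assumptions, not the paper's API.

```python
import numpy as np

def nearest_Dn_point(x: np.ndarray) -> np.ndarray:
    """Nearest point of the D_n lattice (integer vectors with even
    coordinate sum) to an arbitrary point x, in O(n) time.

    Illustrative stand-in: the paper does not specify its lattice.
    """
    # Step 1: round every coordinate to the nearest integer.
    f = np.rint(x)
    # Step 2: if the coordinate sum is odd, f is not in D_n; repair it by
    # re-rounding the coordinate with the largest rounding error to its
    # second-nearest integer.
    if int(f.sum()) % 2 != 0:
        k = int(np.argmax(np.abs(x - f)))
        f[k] += 1.0 if x[k] > f[k] else -1.0
    return f

def memory_lookup(x: np.ndarray, table: dict) -> np.ndarray:
    """Hypothetical O(1) read: hash the decoded lattice point into a
    table of stored entries, standing in for the memory layer's keys."""
    key = tuple(nearest_Dn_point(x).astype(int))
    return table.get(key)
```

Because decoding reduces a query to a single lattice point and the table read is a hash lookup, the cost per access does not grow with the number of stored entries, which is the property the abstract refers to as $O(1)$ performance.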