We propose a routing algorithm that takes a sequence of vectors and computes a new sequence with specified length and vector size. Each output vector maximizes "bang per bit," the difference between the net benefit of using data and the net cost of ignoring it, by better predicting the input vectors. We describe output vectors as geometric objects, as latent variables that assign credit, as query states in a model of associative memory, and as agents in a model of a Society of Mind. We implement the algorithm with optimizations that reduce parameter count, computation, and memory use by orders of magnitude, enabling us to route sequences of greater length than previously possible. We evaluate our implementation on natural language and visual classification tasks, obtaining competitive or state-of-the-art accuracy and end-to-end credit assignments that are interpretable.