We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.
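As a rough illustration of the aggregation step the abstract alludes to, the sketch below computes a weighted Einstein midpoint of points in the Klein model of hyperbolic space, the hyperbolic analogue of the convex combination used in Euclidean soft attention. This is a minimal NumPy sketch under our own naming and assumptions, not the paper's reference implementation; computing the attention weights themselves (e.g., from hyperbolic distances on the hyperboloid) is omitted.

```python
import numpy as np

def lorentz_factor(v, eps=1e-6):
    # Lorentz (gamma) factor of Klein-ball points: 1 / sqrt(1 - ||v||^2).
    sq_norm = np.sum(v * v, axis=-1, keepdims=True)
    return 1.0 / np.sqrt(np.clip(1.0 - sq_norm, eps, None))

def einstein_midpoint(alpha, v):
    # Weighted Einstein midpoint of Klein-model points.
    #   alpha: (n,) non-negative attention weights
    #   v:     (n, d) points strictly inside the unit ball (Klein coordinates)
    # Returns a (d,) point that again lies inside the unit ball.
    gamma = lorentz_factor(v)          # (n, 1)
    w = alpha[:, None] * gamma         # gamma-rescaled attention weights
    return (w * v).sum(axis=0) / w.sum(axis=0)

# Hypothetical usage: aggregate three 2-D Klein-ball "value" embeddings.
values = np.array([[0.10, 0.20], [0.30, -0.10], [-0.20, 0.05]])
weights = np.array([0.5, 0.3, 0.2])
print(einstein_midpoint(weights, values))
```

Note the design consequence: unlike a plain weighted average, each point's contribution is rescaled by its Lorentz factor, so points near the boundary of the ball (far from the origin in hyperbolic distance) dominate the aggregate.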