We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
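To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of attention applied across datapoints: the whole dataset is treated as one sequence, so each datapoint's representation can attend to every other datapoint. The class name, dimensions, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class AttentionBetweenDatapoints(nn.Module):
    """Sketch of self-attention over the dataset axis (illustrative only)."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_datapoints, embed_dim) -- embeddings of the entire dataset.
        # Treat the dataset as a single "sequence" of length num_datapoints,
        # so attention weights are computed between datapoints, not between
        # the features of a single input.
        x = h.unsqueeze(0)                    # (1, num_datapoints, embed_dim)
        out, _ = self.attn(x, x, x)           # each datapoint attends to all others
        return self.norm(x + out).squeeze(0)  # residual + norm, back to (N, D)

# Usage: embed all N datapoints, then let them exchange information.
dataset_embeddings = torch.randn(128, 64)     # 128 datapoints, 64-dim embeddings
layer = AttentionBetweenDatapoints()
updated = layer(dataset_embeddings)           # (128, 64), now dataset-aware
```

Because the attention weights are learned end-to-end, the model itself decides how much to rely on other datapoints for a given prediction, rather than using a fixed similarity rule as in conventional non-parametric methods.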