In this paper, we empirically analyze a simple, non-learnable, and nonparametric Nadaraya-Watson (NW) prediction head that can be used with any neural network architecture. In the NW head, the prediction is a weighted average of labels from a support set. The weights are computed from distances between the query feature and support features. This is in contrast to the dominant approach of using a learnable classification head (e.g., a fully-connected layer) on the features, which can be challenging to interpret and can yield poorly calibrated predictions. Our empirical results on an array of computer vision tasks demonstrate that the NW head can yield better calibration than its parametric counterpart, while having comparable accuracy and with minimal computational overhead. To further increase inference-time efficiency, we propose a simple approach that involves a clustering step run on the training set to create a relatively small distilled support set. In addition to using the weights as a means of interpreting model predictions, we further present an easy-to-compute "support influence function," which quantifies the influence of a support element on the prediction for a given query. As we demonstrate in our experiments, the influence function can allow the user to debug a trained model. We believe that the NW head is a flexible, interpretable, and highly useful building block that can be used in a range of applications.
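The NW head described above can be illustrated with a minimal sketch: the prediction is a kernel-weighted average of support labels, with weights given by a softmax over negative distances between the query feature and support features. This is an illustrative reconstruction, not the paper's implementation; the function name, distance choice (squared Euclidean), and `temperature` parameter are assumptions.

```python
import numpy as np

def nw_head(query, support_feats, support_labels, temperature=1.0):
    """Nadaraya-Watson prediction head (illustrative sketch).

    query          : (d,) feature vector for the query.
    support_feats  : (n, d) feature vectors of the support set.
    support_labels : (n, c) one-hot labels of the support set.
    Returns a (c,) probability vector: a weighted average of the
    support labels, weighted by similarity to the query.
    """
    # Squared Euclidean distance from the query to each support feature.
    dists = np.sum((support_feats - query) ** 2, axis=1)
    # Kernel weights via a softmax over negative distances;
    # the temperature (an assumed hyperparameter) controls smoothness.
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()
    # Prediction = weighted average of support labels.
    return weights @ support_labels
```

Because the weights sum to one and each support label is a valid probability vector, the output is itself a probability vector, and the weights can be inspected directly to see which support examples drove a given prediction.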