In this paper, we empirically analyze a simple, non-learnable, and nonparametric Nadaraya-Watson (NW) prediction head that can be used with any neural network architecture. In the NW head, the prediction is a weighted average of labels from a support set. The weights are computed from distances between the query feature and support features. This is in contrast to the dominant approach of using a learnable classification head (e.g., a fully-connected layer) on the features, which can be challenging to interpret and can yield poorly calibrated predictions. Our empirical results on an array of computer vision tasks demonstrate that the NW head can yield better calibration than its parametric counterpart, with comparable accuracy, particularly in data-limited settings. To further increase inference-time efficiency, we propose a simple approach that runs a clustering step on the training set to create a relatively small distilled support set. Furthermore, we explore two means of interpretability/explainability that fall naturally out of the NW head. The first is the label weights, and the second is our novel concept of the ``support influence function,'' an easy-to-compute metric that quantifies the influence of a support element on the prediction for a given query. As we demonstrate in our experiments, the influence function allows the user to debug a trained model. We believe that the NW head is a flexible, interpretable, and highly useful building block that can be used in a range of applications.
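To make the mechanism concrete, below is a minimal PyTorch sketch of an NW-style head under common assumptions: a softmax over negative squared Euclidean distances serves as the kernel, support labels are one-hot, and the `temperature` parameter and function names are illustrative. The leave-one-out form of the support influence shown here is one natural, easy-to-compute instantiation, not necessarily the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def nw_head(query_feats, support_feats, support_labels, temperature=1.0):
    """Nadaraya-Watson head: predictions are distance-weighted averages of support labels.

    query_feats:    (B, D) query features from the backbone
    support_feats:  (N, D) support-set features
    support_labels: (N, C) one-hot support labels
    Returns:        (B, C) class probabilities
    """
    dists = torch.cdist(query_feats, support_feats).pow(2)  # (B, N) squared distances
    weights = F.softmax(-dists / temperature, dim=1)        # (B, N) kernel weights
    return weights @ support_labels                         # (B, C) weighted label average

def loo_influence(query_feat, support_feats, support_labels, true_label, temperature=1.0):
    """Leave-one-out influence of each support element on one query (illustrative).

    Returns an (N,) tensor; a positive value means removing that support
    element would increase the query's loss (i.e., the element is helpful).
    """
    dists = torch.cdist(query_feat[None], support_feats).pow(2).squeeze(0)  # (N,)
    weights = F.softmax(-dists / temperature, dim=0)                        # (N,)
    pred = weights @ support_labels                                         # (C,)
    # Dropping element i only renormalizes the remaining softmax weights,
    # so the leave-one-out predictions have a closed form.
    loo_preds = (pred[None] - weights[:, None] * support_labels) / (1 - weights[:, None])
    loss = lambda p: -torch.log(p[..., true_label].clamp_min(1e-12))        # cross-entropy on true class
    return loss(loo_preds) - loss(pred)                                     # (N,)
```

Because the softmax weights simply renormalize when a support element is removed, the leave-one-out predictions require no recomputation of distances, which is what makes this influence score cheap to evaluate for every support element at once.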