K-nearest neighbors (KNN) is one of the earliest and most established algorithms in machine learning. For regression tasks, KNN averages the targets within a neighborhood, which poses a number of challenges: the neighborhood definition is crucial for predictive performance, as neighbors might be selected based on uninformative features, and averaging does not account for how the function changes locally. We propose a novel method called Differential Nearest Neighbors Regression (DNNR) that addresses both issues simultaneously: during training, DNNR estimates local gradients to scale the features; during inference, it performs an n-th order Taylor approximation using the estimated gradients. In a large-scale evaluation on over 250 datasets, we find that DNNR performs comparably to state-of-the-art gradient boosting methods and MLPs while maintaining the simplicity and transparency of KNN. This allows us to derive theoretical error bounds and inspect failures. In times that call for transparency of ML models, DNNR provides a good balance between performance and interpretability.
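To make the core idea concrete, the following is a minimal sketch of a first-order variant of the described approach, not the authors' reference implementation: a gradient is estimated at every training point by a local least-squares fit, and each prediction averages first-order Taylor expansions from the query's neighbors. The class name SimpleDNNR, the parameters k and k_grad, and the plain NumPy/scikit-learn usage are illustrative assumptions; the gradient-based feature scaling mentioned in the abstract is omitted for brevity.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


class SimpleDNNR:
    """Illustrative first-order nearest-neighbor regressor with Taylor correction."""

    def __init__(self, k=3, k_grad=32):
        self.k = k            # neighbors averaged at inference time
        self.k_grad = k_grad  # neighbors used to estimate each local gradient

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        self.nn_ = NearestNeighbors(n_neighbors=self.k_grad + 1).fit(self.X_)

        # Estimate a gradient gamma_i at every training point by least squares:
        # y_j - y_i ≈ gamma_i · (x_j - x_i) over the point's neighborhood.
        self.gamma_ = np.zeros_like(self.X_)
        _, idx = self.nn_.kneighbors(self.X_)
        for i, neigh in enumerate(idx):
            neigh = neigh[neigh != i][: self.k_grad]  # drop the point itself
            dX = self.X_[neigh] - self.X_[i]
            dy = self.y_[neigh] - self.y_[i]
            self.gamma_[i], *_ = np.linalg.lstsq(dX, dy, rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        _, idx = self.nn_.kneighbors(X, n_neighbors=self.k)
        preds = []
        for x, neigh in zip(X, idx):
            # First-order Taylor expansion from each neighbor, then average:
            # y_i + gamma_i · (x - x_i)
            taylor = self.y_[neigh] + np.einsum(
                "ij,ij->i", self.gamma_[neigh], x - self.X_[neigh]
            )
            preds.append(taylor.mean())
        return np.array(preds)
```

The only difference from plain KNN regression is the correction term gamma_i · (x - x_i), which accounts for how the target changes locally; setting the gradients to zero recovers the ordinary neighbor average.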