Distributionally Robust Weighted $k$-Nearest Neighbors

Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has focused on developing $k$-nearest neighbor ($k$-NN) based algorithms combined with metric learning that captures similarities between samples. When samples are limited, robustness is especially crucial to ensure the generalization capability of the classifier. In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors, which aims to find the optimal weighted $k$-NN classifiers that hedge against feature uncertainties. We develop an algorithm, \texttt{Dr.k-NN}, that efficiently solves this functional optimization problem and assigns minimax optimal weights to training samples when performing classification. These weights are class-dependent and are determined by the similarities of sample features under the least favorable scenarios. When the size of the uncertainty set is properly tuned, the robust classifier has a smaller Lipschitz norm than the vanilla $k$-NN, and thus improves the generalization capability. We also couple our framework with neural-network-based feature embedding. In various real-data experiments, we demonstrate that our algorithm performs competitively with the state of the art in the few-training-sample setting.
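
To make the notion of class-dependent sample weights concrete, the following is a minimal sketch of a generic weighted $k$-NN vote. It is not the \texttt{Dr.k-NN} algorithm: the minimax optimization that produces the weights is not reproduced here, and the function names, the weight matrix layout, and the toy data are all assumptions made for illustration. With one-hot weights on the labels, the sketch reduces to vanilla $k$-NN.

```python
import numpy as np


def one_hot_weights(y_train, n_classes):
    """Hypothetical helper: weights that reproduce vanilla k-NN voting."""
    w = np.zeros((len(y_train), n_classes))
    w[np.arange(len(y_train)), y_train] = 1.0
    return w


def weighted_knn_predict(X_train, weights, x_query, k=3):
    """Classify x_query by a weighted vote over its k nearest training points.

    weights[i, c] is the (assumed) weight that training sample i contributes
    to class c. A robust method would supply these weights; here they are
    simply passed in by the caller.
    """
    # Euclidean distance from the query to every training sample.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training samples.
    nearest = np.argsort(dists)[:k]
    # Sum the class-dependent weights of those neighbors and pick the max.
    votes = weights[nearest].sum(axis=0)
    return int(np.argmax(votes))


# Toy data (illustrative only): two points per class in 2-D.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
W = one_hot_weights(y_train, n_classes=2)

pred_near_0 = weighted_knn_predict(X_train, W, np.array([0.05, 0.1]), k=3)
pred_near_1 = weighted_knn_predict(X_train, W, np.array([1.0, 0.9]), k=3)
```

Replacing `W` with a non-one-hot matrix is where a robust weighting scheme would plug in; the voting step itself stays unchanged.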
