Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has focused on developing $k$-nearest neighbor ($k$-NN) based algorithms combined with metric learning that captures similarities between samples. When the samples are limited, robustness is especially crucial to ensure the generalization capability of the classifier. In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors, which aims to find the optimal weighted $k$-NN classifiers that hedge against feature perturbations. We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem and assigns minimax optimal weights to training samples when performing classification. These weights are class-dependent, and are determined by the similarities of sample features under the least favorable scenarios. We also couple our framework with neural-network-based feature embedding. Through various real-data experiments, we demonstrate the competitive performance of our algorithm compared to the state-of-the-art in the few-training-sample setting.
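One generic way to write such a minimax distributionally robust formulation (a sketch of the standard template; the exact ambiguity sets and loss used by Dr.k-NN may differ) is
\[
\min_{f \in \mathcal{F}_{k\text{-NN}}} \ \max_{P_1 \in \mathcal{P}_1,\, \ldots,\, P_K \in \mathcal{P}_K} \ \sum_{c=1}^{K} \mathbb{E}_{x \sim P_c}\big[\ell\big(f(x), c\big)\big],
\]
where $\mathcal{F}_{k\text{-NN}}$ denotes the family of weighted $k$-NN classifiers, each $\mathcal{P}_c$ is an ambiguity set of feature distributions centered at the empirical distribution of class-$c$ training samples, and $\ell$ is a classification loss; the inner maximizers play the role of the least favorable distributions mentioned above.

As a rough illustration of how per-sample, class-dependent weights enter the decision rule, the following Python sketch implements a weighted $k$-NN vote. It is not the authors' implementation: the `weights` array, the Euclidean metric, and the toy data are assumptions for illustration, whereas Dr.k-NN obtains its weights from the minimax problem and operates on learned feature embeddings.

```python
import numpy as np

def weighted_knn_predict(x, train_X, train_y, weights, k=5):
    """Classify x by a weighted vote among its k nearest training samples.

    weights[i] is a per-sample weight; here it is simply an input, whereas
    in Dr.k-NN such weights would come from the minimax optimization.
    """
    dists = np.linalg.norm(train_X - x, axis=1)   # distances to all training points
    nn = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
    classes = np.unique(train_y)
    # weighted vote: sum the weights of neighbors belonging to each class
    scores = {c: weights[nn][train_y[nn] == c].sum() for c in classes}
    return max(scores, key=scores.get)

# Toy usage with uniform weights (which reduces to plain k-NN voting):
rng = np.random.default_rng(0)
train_X = rng.normal(size=(20, 2))
train_y = rng.integers(0, 2, size=20)
weights = np.ones(len(train_y))
print(weighted_knn_predict(np.zeros(2), train_X, train_y, weights, k=3))
```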