Data-driven neighborhood definitions and graph constructions are often used in machine learning and signal processing applications. The $k$-nearest neighbor~(kNN) and $\epsilon$-neighborhood methods are among the most common approaches to neighborhood selection, owing to their computational simplicity. However, the choice of the parameters associated with these methods, such as $k$ and $\epsilon$, remains ad hoc. We make two main contributions in this paper. First, we present an alternative view of neighborhood selection, showing that neighborhood construction is equivalent to a sparse signal approximation problem. Second, we propose an algorithm, non-negative kernel regression~(NNK), for obtaining neighborhoods that lead to better sparse representations. NNK draws parallels to the orthogonal matching pursuit approach to signal representation and possesses desirable geometric and theoretical properties. Experiments demonstrate (i) the robustness of the NNK algorithm for neighborhood and graph construction, (ii) its ability to adapt the number of neighbors to the data properties, and (iii) its superior performance in local neighborhood and graph-based machine learning tasks.
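To make the sparse-approximation view concrete, the following is a minimal sketch of NNK-style neighborhood construction, not the authors' reference implementation. It assumes a Gaussian kernel, an initial kNN candidate set, and uses SciPy's non-negative least squares solver as a stand-in for the per-point non-negative quadratic program; the function names and parameters are illustrative choices.

```python
# Sketch of NNK-style neighborhood selection (illustrative, not the reference code).
import numpy as np
from scipy.optimize import nnls
from scipy.linalg import cholesky, solve_triangular

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel values between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nnk_weights(x, candidates, sigma=1.0, reg=1e-8, tol=1e-8):
    """Non-negative kernel regression weights of `x` on its kNN candidates.

    Solves  min_{theta >= 0}  1/2 theta^T K theta - k^T theta,
    where K is the kernel matrix of the candidates and k holds the kernel
    values between `x` and the candidates. Candidates whose weight is
    numerically zero are pruned, yielding a data-adaptive neighborhood.
    """
    K = gaussian_kernel(candidates, candidates, sigma) + reg * np.eye(len(candidates))
    k = gaussian_kernel(x[None, :], candidates, sigma).ravel()
    # Recast the quadratic program as NNLS: with K = L L^T (Cholesky),
    # 1/2 theta^T K theta - k^T theta = 1/2 ||L^T theta - L^{-1} k||^2 + const.
    L = cholesky(K, lower=True)
    b = solve_triangular(L, k, lower=True)
    theta, _ = nnls(L.T, b)
    keep = theta > tol
    return keep, theta[keep]

# Usage: start from, e.g., the 10 nearest neighbors of a query point; NNK
# typically retains only a subset of them with non-zero weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
x = X[0]
d = ((X[1:] - x) ** 2).sum(-1)
candidates = X[1:][np.argsort(d)[:10]]
keep, w = nnk_weights(x, candidates, sigma=1.0)
print(keep.sum(), w)
```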