k-Nearest Neighbors (kNN) is one of the most fundamental yet effective classification models. In this paper, we propose two families of models, one built on a sequence-to-sequence model and one on a memory network model, that mimic kNN. The models generate a sequence of labels, a sequence of out-of-sample feature vectors, and a final label for classification, so they can also function as oversamplers. We also propose 'out-of-core' versions of our models, which assume that only a small portion of the data can be loaded into memory. Computational experiments show that our models outperform kNN, a feed-forward neural network, and a memory network, because our models must produce additional output and not just the label. As an oversampler on imbalanced datasets, the sequence-to-sequence kNN model often outperforms the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN).
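For orientation, the classical baselines named above can be sketched in a few lines. This is only an illustration of plain kNN voting and of SMOTE-style interpolation between minority-class neighbors; it is not the proposed sequence-to-sequence or memory network models, and the function names (`k_nearest`, `knn_predict`, `smote_like_sample`) and the toy data are assumptions introduced here.

```python
import math
import random
from collections import Counter


def k_nearest(query, points, k):
    """Return indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


def knn_predict(query, points, labels, k=3):
    """Plain kNN: majority vote over the labels of the k nearest neighbors."""
    neighbor_labels = [labels[i] for i in k_nearest(query, points, k)]
    return Counter(neighbor_labels).most_common(1)[0][0]


def smote_like_sample(minority, k=2, rng=None):
    """SMOTE-style synthesis (simplified, hypothetical helper): pick a minority
    point, pick one of its k nearest minority neighbors, and return a random
    point on the segment between them."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(minority))
    base = minority[i]
    others = minority[:i] + minority[i + 1:]  # exclude the base point itself
    j = rng.choice(k_nearest(base, others, k))
    t = rng.random()  # interpolation weight in [0, 1)
    return [b + t * (o - b) for b, o in zip(base, others[j])]


# Toy data (assumed for illustration): two well-separated classes.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict([0.2, 0.1], X, y, k=3))
print(smote_like_sample(X[3:], k=2))
```

The synthetic point always lies on a segment between two existing minority points, which is the sense in which such a method produces out-of-sample feature vectors for oversampling.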