Nonstationary data stream classification with online active learning and siamese neural networks

Recent years have seen an ever-growing volume of information become available as data streams across many application areas. As a result, there is an emerging need for online learning methods that train predictive models on the fly. A series of open challenges, however, hinders their deployment in practice: learning as data arrive in real time one by one, learning from data with limited ground-truth information, learning from nonstationary data, and learning from severely imbalanced data, all while occupying a limited amount of memory for data storage. We propose the ActiSiamese algorithm, which addresses these challenges by combining online active learning, siamese networks, and a multi-queue memory. It introduces a new density-based active learning strategy that considers similarity in the latent (rather than the input) space. We conduct an extensive study that compares the role of different active learning budgets and strategies, the performance with and without memory, and the performance with and without ensembling, on both synthetic and real-world datasets, under different data nonstationarity characteristics and class imbalance levels. ActiSiamese outperforms baseline and state-of-the-art algorithms and is effective under severe imbalance, even when only a fraction of the arriving instances' labels is available. We publicly release our code to the community.
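As a rough illustration of two ingredients named in the abstract, a labeling budget and a multi-queue memory, the following Python sketch may help. It is not the authors' released code: the class names `MultiQueueMemory` and `BudgetedQuerier`, the helper `run_stream`, the 1% minority rate, and the uncertainty threshold are all assumptions made for this example. The siamese network and the density-based strategy in the latent space are omitted, and a random number stands in for a model-derived uncertainty score.

```python
import random
from collections import deque


class MultiQueueMemory:
    """Hypothetical memory: one fixed-length FIFO queue per class, so that
    rare-class examples are not evicted by majority-class arrivals."""

    def __init__(self, n_classes, queue_len):
        self.queues = [deque(maxlen=queue_len) for _ in range(n_classes)]

    def add(self, x, y):
        # The oldest item of class y is dropped automatically when full.
        self.queues[y].append(x)

    def size(self):
        return sum(len(q) for q in self.queues)


class BudgetedQuerier:
    """Hypothetical budget rule: request a label only if doing so keeps the
    fraction of queried instances at or below `budget`."""

    def __init__(self, budget):
        self.budget = budget
        self.seen = 0
        self.queried = 0

    def should_query(self, uncertainty, threshold=0.5):
        self.seen += 1
        within_budget = (self.queried + 1) / self.seen <= self.budget
        if within_budget and uncertainty >= threshold:
            self.queried += 1
            return True
        return False


def run_stream(n=1000, budget=0.1, queue_len=10, seed=0):
    """Simulate an imbalanced two-class stream arriving one instance at a time."""
    rng = random.Random(seed)
    memory = MultiQueueMemory(n_classes=2, queue_len=queue_len)
    querier = BudgetedQuerier(budget)
    for _ in range(n):
        y = 1 if rng.random() < 0.01 else 0        # assumed 1% minority class
        x = (rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0))
        uncertainty = rng.random()                 # stand-in for a model score
        if querier.should_query(uncertainty):      # label revealed only on query
            memory.add(x, y)
    return memory, querier


if __name__ == "__main__":
    memory, querier = run_stream()
    print("queried fraction:", querier.queried / querier.seen)
    print("stored per class:", [len(q) for q in memory.queues])
```

Keeping a separate queue per class is one simple way to bound storage while retaining minority-class examples; how the stored examples are used to train the model is left out of this sketch.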
