Supervised learning algorithms generally assume the availability of enough memory to store their data model during the training and test phases. However, in the Internet of Things, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with limited amounts of memory. In this paper, we adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams. In particular, we design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached. Moreover, we design trimming mechanisms to make Mondrian trees more robust to concept drift under memory constraints. We evaluate our algorithms on a variety of real and simulated datasets, and we conclude with recommendations on their use in different situations: the Extend Node strategy appears to be the best out-of-memory strategy in all configurations, whereas different trimming mechanisms should be adopted depending on whether a concept drift is expected. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects.