As a novel deep learning model, gcForest has been widely used in various applications. However, its multi-grained scanning produces many redundant feature vectors, which increases the time cost of the model. To screen out these redundant feature vectors, we introduce a hashing screening mechanism for multi-grained scanning and propose a model called HW-Forest, which adopts two strategies: hashing screening and window screening. In hashing screening, HW-Forest employs a perceptual hashing algorithm to calculate the similarity between feature vectors; this similarity is used to remove the redundant feature vectors produced by multi-grained scanning, which significantly decreases time cost and memory consumption. Furthermore, we adopt a self-adaptive instance screening strategy, called window screening, to improve the performance of our approach; it achieves higher accuracy without hyperparameter tuning on different datasets. Our experimental results show that HW-Forest achieves higher accuracy than other models while also reducing time cost.