Bias Loss for Mobile Neural Networks

Compact convolutional neural networks (CNNs) have improved markedly in recent years, yet they still fall short of the predictive power of CNNs with a large number of parameters. The diverse and even abundant features captured by the layers are an important characteristic of these successful large CNNs. However, differences in this characteristic between large CNNs and their compact counterparts have rarely been investigated. In compact CNNs, the limited number of parameters makes abundant features unlikely, so feature diversity becomes an essential characteristic. Diverse features in the activation maps derived from a data point during model inference may indicate the presence of a set of unique descriptors necessary to distinguish between objects of different classes. In contrast, data points with low feature diversity may not provide enough unique descriptors to make a valid prediction; we refer to predictions on such data points as random predictions. Random predictions can negatively impact the optimization process and harm the final performance. This paper proposes addressing the problem raised by random predictions by reshaping the standard cross-entropy to make it biased toward data points with a limited number of unique descriptive features. Our novel Bias Loss focuses the training on a set of valuable data points and prevents the vast number of samples with poor learning features from misleading the optimization process. Furthermore, to show the importance of diversity, we present a family of SkipNet models whose architectures are designed to boost the number of unique descriptors in the last layers. Our SkipNet-M achieves 1% higher classification accuracy than MobileNetV3 Large.
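
The abstract does not give the loss formula, so the following is only a minimal sketch of the general idea of a cross-entropy reshaped by feature diversity. It assumes, purely for illustration, that diversity is measured as the variance of a data point's activation values and that the per-sample weight is an exponential function of the batch-normalised variance. The function names and the parameters `alpha` and `beta` are hypothetical and are not taken from the paper.

```python
import math


def feature_variance(activations):
    """Population variance of a flat list of activation values (assumed diversity proxy)."""
    mean = sum(activations) / len(activations)
    return sum((a - mean) ** 2 for a in activations) / len(activations)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def diversity_weights(batch_activations, alpha=0.3, beta=0.3):
    """Per-sample weights that grow with feature variance.

    alpha and beta are hypothetical shape parameters chosen for this sketch.
    """
    variances = [feature_variance(a) for a in batch_activations]
    lo, hi = min(variances), max(variances)
    span = (hi - lo) or 1.0  # avoid division by zero when all variances are equal
    # Scale each variance to [0, 1] within the batch, then apply the exponential weight.
    return [math.exp(alpha * (v - lo) / span) - beta for v in variances]


def diversity_weighted_cross_entropy(batch_logits, batch_labels, batch_activations,
                                     alpha=0.3, beta=0.3):
    """Mean cross-entropy in which each sample's term is scaled by its diversity weight."""
    weights = diversity_weights(batch_activations, alpha, beta)
    total = 0.0
    for logits, label, weight in zip(batch_logits, batch_labels, weights):
        ce = -math.log(softmax(logits)[label])  # standard cross-entropy for one sample
        total += weight * ce
    return total / len(batch_labels)


# Usage: the second sample has more varied activations, so its loss term counts more.
logits = [[2.0, 0.5], [0.2, 1.5]]
labels = [0, 1]
activations = [[0.1, 0.1, 0.1, 0.1], [0.0, 1.0, 2.0, 3.0]]
print(diversity_weighted_cross_entropy(logits, labels, activations))
```

In this sketch, samples whose activations vary little within the batch contribute less to the averaged loss, which is one possible reading of keeping low-diversity samples from misleading the optimization; the weighting actually used by Bias Loss may differ.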
