Although sparse training has been successfully applied to a variety of resource-limited deep learning tasks to save memory, accelerate training, and reduce inference time, the reliability of the resulting sparse models remains unexplored. Previous research has shown that deep neural networks tend to be over-confident, and we find that sparse training exacerbates this problem. Calibrating sparse models is therefore crucial for reliable prediction and decision-making. In this paper, we propose a new sparse training method that produces sparse models with improved confidence calibration. In contrast to previous work that uses a single mask to control the sparse topology, our method employs two masks: a deterministic mask and a random mask. The former efficiently searches for and activates important weights by exploiting the magnitudes of weights and gradients, while the latter enables broader exploration and finds more appropriate weight values through random updates. Theoretically, we prove that our method can be viewed as a hierarchical variational approximation of a probabilistic deep Gaussian process. Extensive experiments across multiple datasets, model architectures, and sparsity levels show that our method reduces ECE values by up to 47.8\% while maintaining or even improving accuracy, at only a slight increase in computation and storage cost.
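For reference, the ECE metric reported above is the standard Expected Calibration Error: predictions are partitioned into $M$ confidence bins $B_1,\dots,B_M$, and the per-bin gap between accuracy and mean confidence is averaged with weights proportional to bin size. The specific binning used in the experiments is not stated in this abstract, so the formula below is only the generic definition:
\[
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)\bigr|,
\]
where $n$ is the total number of test samples, $\mathrm{acc}(B_m)$ is the accuracy within bin $B_m$, and $\mathrm{conf}(B_m)$ is the average predicted confidence in that bin.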
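To make the two-mask idea concrete, the sketch below shows one possible masked update step under our assumptions: a deterministic mask built from the magnitudes of weights and gradients, and a random mask that activates additional weights for exploration. All names (\texttt{deterministic\_mask}, \texttt{random\_mask}), the scoring rule $|w|+|g|$, and the toy densities are illustrative only and do not correspond to the authors' released implementation.
\begin{verbatim}
# Minimal PyTorch-style sketch of a two-mask sparse update (illustrative only).
import torch


def deterministic_mask(weight, grad, density):
    # Keep the weights whose |w| + |g| score is largest, up to the target density.
    score = weight.abs() + grad.abs()
    k = max(1, int(density * score.numel()))
    threshold = torch.topk(score.flatten(), k).values.min()
    return (score >= threshold).float()


def random_mask(weight, density):
    # Activate a random subset of weights to encourage exploration.
    return (torch.rand_like(weight) < density).float()


# Toy usage: combine both masks before applying a masked SGD step.
w = torch.randn(64, 64, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()

m_det = deterministic_mask(w.detach(), w.grad, density=0.1)
m_rnd = random_mask(w.detach(), density=0.1)
with torch.no_grad():
    # Update only entries activated by at least one of the two masks.
    w -= 0.01 * w.grad * torch.clamp(m_det + m_rnd, max=1.0)
\end{verbatim}
In an actual sparse-training loop one would additionally keep weights outside both masks at zero and periodically recompute the masks as training progresses; those details are omitted from this sketch.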