Although deep neural networks (DNNs) have achieved tremendous success in audio classification tasks, their uncertainty calibration is still under-explored. A well-calibrated model should be accurate when it is certain about its prediction and indicate high uncertainty when it is likely to be inaccurate. In this work, we investigate uncertainty calibration for deep audio classifiers. In particular, we empirically study the performance of popular calibration methods on audio classification datasets: (i) Monte Carlo Dropout, (ii) ensembles, (iii) focal loss, and (iv) spectral-normalized Gaussian process (SNGP). To this end, we evaluate (i)-(iv) on the tasks of environmental sound and music genre classification. Results indicate that uncalibrated deep audio classifiers may be over-confident, and that SNGP performs best and is very efficient on the two datasets considered in this work.
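As a minimal illustration of two of the studied methods, the sketch below shows how Monte Carlo Dropout and focal loss are commonly realized in PyTorch. This is a generic sketch, not the implementation evaluated in the paper: the function names are illustrative, and it assumes a standard classifier `model` that already contains `nn.Dropout` layers. MC Dropout keeps dropout stochastic at test time and averages the softmax outputs of several forward passes; focal loss scales cross-entropy by (1 - p_t)^gamma, down-weighting confident (easy) examples and thereby tempering over-confidence.

```python
import torch
import torch.nn as nn


def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20) -> torch.Tensor:
    """Monte Carlo Dropout (illustrative sketch): average class probabilities
    over several stochastic forward passes with dropout kept active."""
    model.eval()
    for m in model.modules():          # re-enable only the dropout layers;
        if isinstance(m, nn.Dropout):  # batch norm etc. stay in eval mode
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0)  # predictive distribution; probs.std(dim=0) indicates uncertainty


def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss (illustrative sketch): cross-entropy weighted by (1 - p_t)**gamma."""
    log_p = torch.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    return (-((1.0 - pt) ** gamma) * log_pt).mean()
```

Ensembles and SNGP are omitted from the sketch, as they require changes to the training pipeline or model architecture rather than a few lines of inference or loss code.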