Out-of-distribution (OOD) detection is concerned with identifying data points that do not belong to the same distribution as the model's training data. For the safe deployment of predictive models in real-world environments, it is critical to avoid making confident predictions on OOD inputs, as these can lead to potentially dangerous consequences. However, OOD detection remains largely under-explored in the audio (and speech) domain, even though audio is a central modality for many tasks, such as speaker diarization, automatic speech recognition, and sound event detection. To address this, we propose to leverage the model's feature space with deep k-nearest neighbors to detect OOD samples. We show that this simple and flexible method effectively detects OOD inputs across a broad range of audio (and speech) datasets. Specifically, it improves the false positive rate (FPR@TPR95) by 17% and the AUROC score by 7% over prior techniques.
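To make the idea of k-nearest-neighbor detection in feature space concrete, the following is a minimal sketch, not the implementation evaluated in this work. It assumes that embeddings have already been extracted from some trained audio model, and it uses synthetic Gaussian vectors as stand-ins for them. The function names (`l2_normalize`, `knn_ood_scores`, `threshold_at_tpr`), the choice of k = 5, the L2 normalization step, and the use of the distance to the k-th neighbor as the score are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (assumed preprocessing step)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def knn_ood_scores(train_feats, query_feats, k=5):
    """Distance from each query embedding to its k-th nearest training embedding.

    Larger scores mean the query lies farther from the training data,
    i.e. it is more likely to be out-of-distribution.
    """
    train = l2_normalize(np.asarray(train_feats, dtype=np.float64))
    query = l2_normalize(np.asarray(query_feats, dtype=np.float64))
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = (
        (query ** 2).sum(axis=1)[:, None]
        + (train ** 2).sum(axis=1)[None, :]
        - 2.0 * query @ train.T
    )
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    # k-th smallest distance per query (index k - 1 after partial sorting).
    kth_sq = np.partition(d2, k - 1, axis=1)[:, k - 1]
    return np.sqrt(kth_sq)


def threshold_at_tpr(id_scores, tpr=0.95):
    """Score threshold that keeps roughly `tpr` of in-distribution samples."""
    return np.quantile(id_scores, tpr)


# Synthetic stand-ins for model embeddings (hypothetical data, 8 dimensions).
rng = np.random.default_rng(0)
center = np.zeros(8)
center[0] = 5.0
train_feats = rng.normal(size=(500, 8)) + center   # in-distribution training set
id_feats = rng.normal(size=(200, 8)) + center      # held-out in-distribution set
ood_feats = rng.normal(size=(200, 8)) - center     # shifted, out-of-distribution set

id_scores = knn_ood_scores(train_feats, id_feats, k=5)
ood_scores = knn_ood_scores(train_feats, ood_feats, k=5)

threshold = threshold_at_tpr(id_scores, tpr=0.95)
is_ood = ood_scores > threshold  # flag queries beyond the threshold as OOD
print("fraction of OOD samples flagged:", is_ood.mean())
```

In this sketch the threshold is set on held-out in-distribution scores so that about 95% of them fall below it, which mirrors the operating point behind the FPR@TPR95 metric. The detector needs no retraining of the underlying model, only access to its embeddings.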