Extreme driver emotions pose a traffic safety risk, which creates an urgent need for reliable emotion recognition systems. Traditional deep learning approaches to speech emotion recognition suffer from overfitting and poorly calibrated confidence estimates. We propose a framework that integrates Conformal Prediction (CP) and Risk Control, using Mel-spectrogram features processed by a pre-trained convolutional neural network. Our key innovation is a nonconformity score that heuristically measures how closely a classifier's predictions align with a given input. We compute this score on calibration samples and derive a statistically rigorous threshold from a user-specified risk level $\alpha$, which yields prediction sets with a provable coverage guarantee ($\geq 1-\alpha$). The Risk Control framework enables task-specific adaptation through customizable loss functions, dynamically adjusting prediction set sizes while maintaining the coverage guarantee. Cross-dataset experiments on IEMOCAP and TESS demonstrate (1) that the coverage guarantee is strictly satisfied, and (2) a significant negative correlation between Average Prediction Set Size (APSS) and $\alpha$: prediction sets shrink, and the reported model uncertainty decreases, as the tolerated risk level increases. We further propose APSS as a novel metric for evaluating classification uncertainty. This approach enhances the reliability of speech emotion recognition, with direct applications in intelligent transportation systems and real-time emotion monitoring.
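The calibration step described above can be illustrated with a minimal sketch of split conformal prediction. The abstract does not specify the exact nonconformity score, so this sketch assumes a common choice, one minus the softmax probability assigned to the true class. The function names (`calibrate_threshold`, `prediction_sets`, `coverage_and_apss`) and the synthetic softmax outputs are hypothetical stand-ins for the authors' CNN and data, not their implementation.

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Return the conformal threshold q_hat for risk level alpha.

    Assumed nonconformity score: 1 - softmax probability of the true class.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples for this alpha: every label must be kept.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, q_hat):
    """Boolean mask (n_samples, n_classes): True where a label is in the set."""
    return (1.0 - test_probs) <= q_hat

def coverage_and_apss(sets, labels):
    """Empirical coverage and Average Prediction Set Size (APSS)."""
    coverage = sets[np.arange(len(labels)), labels].mean()
    apss = sets.sum(axis=1).mean()
    return coverage, apss

# Synthetic stand-in for classifier outputs (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_classes = 4

def fake_softmax(n):
    labels = rng.integers(0, n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 2.0  # make the true class more likely
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True), labels

cal_probs, cal_labels = fake_softmax(1000)
test_probs, test_labels = fake_softmax(1000)

for alpha in (0.05, 0.1, 0.2):
    q_hat = calibrate_threshold(cal_probs, cal_labels, alpha)
    cov, apss = coverage_and_apss(prediction_sets(test_probs, q_hat), test_labels)
    print(f"alpha={alpha:.2f}  coverage={cov:.3f}  APSS={apss:.2f}")
```

Because the threshold is a non-increasing function of $\alpha$, the prediction sets are nested: raising $\alpha$ can only remove labels, which is consistent with the negative correlation between APSS and $\alpha$ reported above. The coverage guarantee itself is marginal (it holds on average over calibration and test draws), so the empirical coverage on any single test split fluctuates around $1-\alpha$.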