Textual escalation detection has been widely applied in e-commerce companies' customer service systems to pre-alert and prevent potential conflicts. Similarly, in public areas such as airports and train stations, where many impersonal conversations take place, acoustic-based escalation detection systems can help enhance passengers' safety and maintain public order. To this end, we introduce a system based on acoustic-lexical features to detect escalation from speech. Voice Activity Detection (VAD) and label smoothing are adopted to further enhance performance in our experiments. Given the small amount of training and development data, we also employ transfer learning on several well-known emotion detection datasets, including RAVDESS and CREMA-D, to learn advanced emotional representations that are then applied to the conversational escalation detection task. On the development set, our proposed system achieves 81.5% unweighted average recall (UAR), which significantly outperforms the baseline of 72.2% UAR.
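For readers unfamiliar with the metric, UAR is the mean of the per-class recalls, so each class contributes equally regardless of how many samples it has. The following is a minimal sketch in plain Python. The function name and the toy labels are illustrative assumptions and are not taken from the system or data described above.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # total samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy labels, made up for illustration: 1 = escalation, 0 = no escalation.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the accuracy is 0.90 while the UAR is 0.75, because the missed sample from the minority class halves that class's recall. This is why UAR is a common choice when escalated and non-escalated segments are imbalanced.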