This paper presents our latest investigations on dialog act (DA) classification on automatically generated transcriptions. We propose a novel approach that combines convolutional neural networks (CNNs) and conditional random fields (CRFs) for context modeling in DA classification. We explore the impact of transcriptions generated from different automatic speech recognition systems such as hybrid TDNN/HMM and End-to-End systems on the final performance. Experimental results on two benchmark datasets (MRDA and SwDA) show that the combination CNN and CRF improves consistently the accuracy. Furthermore, they show that although the word error rates are comparable, End-to-End ASR system seems to be more suitable for DA classification.
翻译:本文件介绍了我们最近对自动生成的转录本的对话行为(DA)分类调查情况,我们提出了一种新颖的办法,将进化神经网络(CNNs)和有条件随机字段(CRFs)结合到DA分类的背景模型中,我们探讨了不同自动语音识别系统,如混合TDNN/HMM和端对端系统产生的转录对最后性能的影响,两个基准数据集(MRDA和SWDA)的实验结果表明CNN和CRF的组合不断提高准确性。此外,它们表明,尽管字差率可以比较,但终端对端对端ASR系统似乎更适合DA分类。