Dialogue acts (DAs) represent the conversational actions that tutors and students perform during tutoring dialogues. Automatically identifying DAs in tutoring dialogues is important for the design of dialogue-based intelligent tutoring systems. Many prior studies use machine learning models to classify DAs in tutoring dialogues and invest considerable effort in optimizing classification accuracy with limited amounts of training data (i.e., a low-resource data scenario). Beyond classification accuracy, however, the robustness of the classifier also matters, because it reflects how well the classifier learns patterns under different class distributions. We note that many prior studies on classifying educational DAs use cross-entropy (CE) loss to optimize DA classifiers on low-resource data with an imbalanced DA distribution. The DA classifiers in these studies tend to prioritize accuracy on the majority class at the expense of the minority class, and so might not be robust to data with imbalanced ratios of DA classes. To improve the robustness of classifiers on imbalanced class distributions, we propose to optimize the DA classifier by maximizing the area under the ROC curve (AUC) score (i.e., AUC maximization). Through extensive experiments, our study provides evidence that (i) maximizing AUC during training yields a significant performance improvement over the CE approach under low-resource data, and (ii) AUC maximization approaches can improve the robustness of the DA classifier under different class imbalance ratios.
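To make the contrast between CE and AUC concrete, the following is a minimal sketch in plain Python. It is not the method used in this study: the abstract does not specify which AUC maximization objective is used, so the pairwise squared-hinge surrogate here is an assumption chosen for illustration, and the function names (`auc_score`, `pairwise_auc_surrogate`, `cross_entropy`) and the `margin` parameter are hypothetical.

```python
import math


def _split(scores, labels):
    """Separate scores of positive (label 1) and negative (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def auc_score(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos, neg = _split(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pairwise_auc_surrogate(scores, labels, margin=1.0):
    """Illustrative squared-hinge surrogate for 1 - AUC (an assumption here).

    Penalises every positive-negative pair whose score gap is below `margin`.
    """
    pos, neg = _split(scores, labels)
    total = sum(max(0.0, margin - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def cross_entropy(probs, labels):
    """Mean binary cross-entropy over predicted probabilities."""
    eps = 1e-12  # guards against log(0)
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
                for p, y in zip(probs, labels)) / len(labels)


# Imbalanced toy data: 1 minority-class example, 9 majority-class examples.
labels = [1] + [0] * 9
# A classifier that ignores the input and always predicts the majority prior.
constant_probs = [0.1] * 10

print("CE :", cross_entropy(constant_probs, labels))
print("AUC:", auc_score(constant_probs, labels))  # 0.5, i.e. chance-level ranking
```

On this toy data, the constant predictor reaches a fairly low CE while its AUC stays at chance level, which is one way to see why a CE-trained classifier can favor the majority class. A pairwise objective such as the surrogate above depends only on how positives are ranked against negatives, so the number of majority-class examples does not dominate it in the same way.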