Does Informativeness Matter? Active Learning for Educational Dialogue Act Classification

Dialogue Acts (DAs) can be used to explain what expert tutors do and what students know during the tutoring process. Most empirical studies use random sampling to obtain sentence samples for manual DA annotation, and the annotated samples are then used to train DA classifiers. However, these studies have paid little attention to sample informativeness, which reflects how much information the selected samples carry and indicates the extent to which a classifier can learn patterns from them. Notably, the informativeness level may vary among samples, and the classifier might need only a small number of low-informativeness samples to learn their patterns. Because random sampling may overlook sample informativeness, it can spend human labelling effort on samples that contribute little to training the classifiers. As an alternative, researchers suggest employing the statistical sampling methods of Active Learning (AL) to identify informative samples for training the classifiers. However, the use of AL methods in educational DA classification tasks is under-explored. In this paper, we first examine the informativeness of annotated sentence samples. We then investigate how AL methods can select informative samples to support DA classifiers during the AL sampling process. The results reveal that most annotated sentences in the training dataset have low informativeness and that the patterns of these sentences can be easily captured by the DA classifier. We also demonstrate how AL methods can reduce the cost of manual annotation in the AL sampling process.

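To make the idea of selecting informative samples concrete, here is a minimal sketch of one common AL strategy, entropy-based uncertainty sampling. The abstract does not say which AL methods the paper uses, so this strategy is an assumption chosen for illustration. The names `entropy`, `select_most_informative`, and the probability values in `pool_probs` are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(pool_probs: List[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities that a DA classifier might assign
# to four unlabelled sentences (illustrative values, not from the paper).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction: low informativeness
    [0.34, 0.33, 0.33],  # near-uniform prediction: high informativeness
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
]

# Indices of the two sentences to send for manual annotation next.
to_annotate = select_most_informative(pool_probs, k=2)
print(to_annotate)
```

In a full AL loop, the selected sentences would be annotated, added to the training set, and the classifier retrained before the next round of selection. Random sampling would instead pick indices without looking at the classifier's predictions.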
Related content

Classifier

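Classification is a very important method in data mining. The idea is to learn a classification function, or build a classification model (what is usually called a classifier), from existing data. This function or model maps each data record in a database to one of a set of given classes, so it can be used for prediction. In short, "classifier" is a general term for methods that classify samples in data mining, and it covers algorithms such as decision trees, logistic regression, naive Bayes, and neural networks.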