This study introduces and investigates the capabilities of three text mining approaches, namely Latent Semantic Analysis, Latent Dirichlet Allocation, and Clustering Word Vectors, for automating code extraction from a relatively small discussion board dataset. We compare the output of each algorithm against a dataset previously coded manually by two human raters. The results show that, even with a relatively small dataset, automated approaches can assist course instructors by extracting some of the discussion codes, which can then be used in Epistemic Network Analysis.