When dealing with clinical text classification on a small dataset, recent studies have confirmed that a well-tuned multilayer perceptron outperforms other classifiers, including generative and deep learning models. Feature selection applied to the learned representation can further improve the performance of such a neural network classifier. However, most feature selection methods only estimate the degree of linear dependency between variables and select the best features using univariate statistical tests. Furthermore, these methods ignore the sparsity of the feature space in which the representation is learned.

Goal: We therefore aim to assess an alternative approach that tackles this sparsity by compressing the feature space of the clinical representation, so that even a limited set of French clinical notes can be handled effectively.

Methods: This study proposes an autoencoder learning algorithm that exploits sparsity reduction in the clinical note representation. The motivation was to determine how sparse, high-dimensional data can be compressed by reducing the dimension of the clinical note representation's feature space. The classification performance of the classifiers was then evaluated in the trained, compressed feature space.

Results: The proposed approach yielded overall performance gains of up to 3% on each evaluation metric. The final classifier achieved 92% accuracy, 91% recall, 91% precision, and a 91% F1-score in detecting the patient's condition. Furthermore, the compression mechanism and the autoencoder's prediction process were demonstrated by applying the information bottleneck theoretical framework.
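For reference, the information bottleneck framework (Tishby, Pereira and Bialek) looks for a compressed representation $T$ of an input $X$ that stays as informative as possible about a target $Y$, by minimizing the Lagrangian below over stochastic encoders $p(t \mid x)$, under the Markov chain $Y \leftrightarrow X \leftrightarrow T$. Reading $X$ as the clinical note representation, $T$ as the autoencoder's compressed code, and $Y$ as the patient's condition is an assumption about how the framework applies here, not a formula taken from the study.

```latex
\min_{p(t \mid x)} \; \mathcal{L}\big[p(t \mid x)\big] = I(X;T) - \beta\, I(T;Y), \qquad \beta > 0
```

The first term rewards compression (discarding information about $X$), the second rewards keeping information about $Y$, and $\beta$ sets the trade-off between them.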
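The Methods step described above (compress a sparse, high-dimensional note representation with an autoencoder, then classify in the compressed space) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the synthetic data generator `make_sparse_notes`, the helpers `train_autoencoder` and `nearest_centroid_accuracy`, the tanh/linear architecture, the 16-dimensional code, and all hyperparameters are hypothetical, and a nearest-centroid rule stands in for the tuned multilayer perceptron used in the study.

```python
import numpy as np


def make_sparse_notes(n_docs=300, n_terms=400, n_classes=8, seed=0):
    """Hypothetical synthetic stand-in for a sparse note-by-term matrix.

    Each note belongs to one latent class whose vocabulary covers ~5% of the
    terms, so the matrix is sparse but has low-dimensional structure.
    """
    rng = np.random.default_rng(seed)
    class_terms = rng.random((n_classes, n_terms)) < 0.05  # boolean term masks
    labels = rng.integers(0, n_classes, size=n_docs)
    X = class_terms[labels] * rng.random((n_docs, n_terms))  # nonnegative weights
    return X, labels


def train_autoencoder(X, code_dim=16, lr=0.05, epochs=300, seed=0):
    """Hypothetical one-hidden-layer autoencoder (tanh encoder, linear decoder)
    trained by full-batch gradient descent on the mean per-note squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, size=(code_dim, d))
    b2 = np.zeros(d)
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # compressed code, shape (n, code_dim)
        E = H @ W2 + b2 - X                  # reconstruction error, shape (n, d)
        history.append(float((E ** 2).sum() / n))
        d_out = 2.0 * E / n                  # gradient of the loss w.r.t. the reconstruction
        dW2, db2 = H.T @ d_out, d_out.sum(axis=0)
        dZ = (d_out @ W2.T) * (1.0 - H ** 2) # backpropagate through tanh
        dW1, db1 = X.T @ dZ, dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(Z):
        """Map notes into the compressed feature space."""
        return np.tanh(Z @ W1 + b1)

    return encode, history


def nearest_centroid_accuracy(codes, labels):
    """Hypothetical stand-in classifier: assign each note to the closest class
    mean in the compressed space (training accuracy, for illustration only)."""
    classes = np.unique(labels)
    centroids = np.stack([codes[labels == c].mean(axis=0) for c in classes])
    dists = ((codes[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == labels).mean())


X, labels = make_sparse_notes()
encode, history = train_autoencoder(X)
codes = encode(X)
print(X.shape, "->", codes.shape)  # (300, 400) -> (300, 16)
print(f"reconstruction loss: {history[0]:.2f} -> {history[-1]:.2f}")
print("nearest-centroid accuracy:", nearest_centroid_accuracy(codes, labels))
```

Full-batch gradient descent keeps the sketch dependency-free; a real pipeline would more likely use a deep learning framework, a sparse input format, and a held-out split for evaluating the downstream classifier.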