Application of interpretable machine learning techniques on medical datasets facilitates early and fast diagnoses and provides deeper insight into the data. Furthermore, the transparency of these models increases trust among application domain experts. Medical datasets face common issues such as heterogeneous measurements, imbalanced classes with limited sample size, and missing data, which hinder the straightforward application of machine learning techniques. In this paper, we present a family of prototype-based (PB) interpretable models that are capable of handling these issues. The models introduced in this contribution show comparable or superior performance to alternative techniques applicable in such situations. Unlike ensemble-based models, however, the PB models do not compromise on ease of interpretation. Moreover, we propose a strategy for harnessing the power of ensembles while maintaining the intrinsic interpretability of the PB models, by averaging the model parameter manifolds. All models were evaluated on a synthetic (publicly available) dataset, in addition to detailed analyses of two real-world medical datasets (one publicly available). Results indicate that the models and strategies we introduce address the challenges of real-world medical data while remaining computationally inexpensive and transparent, and performing similarly to or better than their alternatives.