Machine learning in high-stakes domains, such as healthcare, faces two critical challenges: (1) generalizing to diverse data distributions given limited training data while (2) maintaining interpretability. To address these challenges, we propose an instance-weighted tree-sum method that effectively pools data across diverse groups to output a concise, rule-based model. Given distinct groups of instances in a dataset (e.g., medical patients grouped by age or treatment site), our method first estimates group membership probabilities for each instance. Then, it uses these estimates as instance weights in FIGS (Tan et al. 2022), to grow a set of decision trees whose values sum to the final prediction. We call this new method Group Probability-Weighted Tree Sums (G-FIGS). G-FIGS achieves state-of-the-art prediction performance on important clinical datasets; e.g., holding the level of sensitivity fixed at 92%, G-FIGS increases specificity for identifying cervical spine injury by up to 10% over CART and up to 3% over FIGS alone, with larger gains at higher sensitivity levels. By keeping the total number of rules below 16 in FIGS, the final models remain interpretable, and we find that their rules match medical domain expertise. All code, data, and models are released on GitHub.
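The abstract describes a two-stage procedure: (1) estimate each instance's group-membership probability, and (2) use those probabilities as instance weights when fitting a FIGS tree-sum model for each group, so that every group's model can borrow strength from the pooled data. The sketch below illustrates this pipeline under stated assumptions; it is not the authors' released implementation. It assumes that `FIGSClassifier` from the `imodels` package accepts a `sample_weight` argument in `fit` (as scikit-learn-style estimators commonly do), and the helper names are hypothetical.

```python
# Hypothetical sketch of the G-FIGS two-stage pipeline (not the authors' released code).
# Assumes imodels.FIGSClassifier.fit accepts sample_weight, like sklearn estimators.
import numpy as np
from sklearn.linear_model import LogisticRegression
from imodels import FIGSClassifier


def fit_g_figs(X, y, groups, max_rules=16):
    """Fit one probability-weighted FIGS model per group.

    X: (n, d) feature matrix; y: (n,) binary labels;
    groups: (n,) group label per instance (e.g., age bracket or treatment site).
    """
    # Stage 1: estimate P(group = g | x) for every instance.
    membership_clf = LogisticRegression(max_iter=1000).fit(X, groups)
    group_probs = membership_clf.predict_proba(X)  # shape (n, n_groups)

    # Stage 2: for each group, fit FIGS on *all* instances,
    # weighting each instance by its probability of belonging to that group.
    models = {}
    for j, g in enumerate(membership_clf.classes_):
        figs = FIGSClassifier(max_rules=max_rules)  # cap total rules for interpretability
        figs.fit(X, y, sample_weight=group_probs[:, j])
        models[g] = figs
    return models


def predict_g_figs(models, X, groups):
    # At test time, each instance is routed to the model fit for its own group.
    preds = np.empty(len(X))
    for g, model in models.items():
        mask = groups == g
        if mask.any():
            preds[mask] = model.predict(X[mask])
    return preds
```

In this sketch the rule budget (`max_rules=16`) mirrors the cap mentioned in the abstract; the choice of logistic regression for the membership model is an illustrative assumption, and any probabilistic classifier could fill that role.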