An introduction to causal reasoning in health analytics

A data science task can be viewed as making sense of data, testing a hypothesis about it, or both. Conclusions drawn from data can guide us toward informed decisions. Combined with machine learning, big data has enabled countless prediction tasks, such as identifying patients at high risk of a certain disease so that preventive measures can be taken. However, healthcare practitioners are not content with mere predictions; they are also interested in the cause-effect relationships between input features and clinical outcomes. Understanding such relationships helps doctors treat patients and reduce risk effectively. Causality is typically established through randomized controlled trials. Such trials are often not feasible, in which case scientists and researchers turn to observational studies and attempt to draw inferences from them. However, observational studies may be affected by selection and/or confounding biases that can lead to wrong causal conclusions. In this chapter, we highlight some of the drawbacks that may arise when traditional machine learning and statistical approaches are used to analyze observational data, particularly in healthcare data analytics. We discuss causal inference and ways to discover cause-effect relationships from observational studies in the healthcare domain. Moreover, we demonstrate applications of causal inference to some common machine learning issues, such as missing data and model transportability. Finally, we discuss the possibility of integrating reinforcement learning with causality as a way to counter confounding bias.
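To make the point about confounding bias concrete, the following is a minimal sketch in Python, not taken from the chapter itself. It simulates a hypothetical observational dataset in which disease severity influences both who receives a treatment and who recovers. All names (`simulate`, `naive_effect`, `adjusted_effect`) and all probabilities are illustrative assumptions. The adjustment shown is simple stratification on the confounder, which is only valid under the assumption that severity is the sole confounder and is fully observed.

```python
import random


def simulate(n, seed=0):
    """Generate (severe, treated, recovered) records for n hypothetical patients.

    Assumed data-generating process (illustrative numbers only):
      - half of the patients are severe;
      - severe patients are treated with probability 0.8, mild ones with 0.2;
      - baseline recovery is 0.3 (severe) or 0.8 (mild);
      - treatment raises the recovery probability by 0.1 in both groups.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        treated = rng.random() < (0.8 if severe else 0.2)
        p_recover = (0.3 if severe else 0.8) + (0.1 if treated else 0.0)
        recovered = rng.random() < p_recover
        rows.append((int(severe), int(treated), int(recovered)))
    return rows


def mean_outcome(rows, treated, severe=None):
    """Average recovery among patients with the given treatment (and severity)."""
    ys = [y for z, t, y in rows
          if t == treated and (severe is None or z == severe)]
    return sum(ys) / len(ys)


def naive_effect(rows):
    """Unadjusted comparison: E[Y | T=1] - E[Y | T=0]."""
    return mean_outcome(rows, 1) - mean_outcome(rows, 0)


def adjusted_effect(rows):
    """Stratify on severity and average the per-stratum differences,
    weighting each stratum by its share of the population."""
    n = len(rows)
    effect = 0.0
    for z in (0, 1):
        weight = sum(1 for row in rows if row[0] == z) / n
        effect += weight * (mean_outcome(rows, 1, z) - mean_outcome(rows, 0, z))
    return effect


rows = simulate(100_000)
naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
print(f"naive difference:    {naive:+.3f}")
print(f"adjusted difference: {adjusted:+.3f}")
```

Under these assumed probabilities the naive comparison comes out negative (the treatment appears harmful, because treated patients are mostly severe cases), while the stratified estimate recovers a value close to the +0.1 effect built into the simulation. Real observational data rarely offers a single, fully observed confounder, which is why more general causal inference methods are needed.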
