A data science task can be viewed as making sense of data or testing a hypothesis about it. Conclusions inferred from data can guide us toward informed decisions. Combined with machine learning, big data has enabled countless prediction tasks, such as identifying patients at high risk of a certain disease so that preventive measures can be taken. However, healthcare practitioners are not content with mere predictions; they are also interested in the cause-effect relations between input features and clinical outcomes. Understanding such relations helps doctors treat patients and reduce risk effectively. Causality is typically established through randomized controlled trials. Such trials are often not feasible, in which case scientists and researchers turn to observational studies and attempt to draw inferences from them. However, observational studies may be affected by selection and/or confounding biases that can lead to wrong causal conclusions. In this chapter, we highlight some of the drawbacks that may arise when traditional machine learning and statistical approaches are used to analyze observational data, particularly in the healthcare data analytics domain. We discuss causal inference and ways to discover cause-effect relations from observational studies in the healthcare domain. Moreover, we demonstrate applications of causal inference to some common machine learning issues, such as missing data and model transportability. Finally, we discuss the possibility of integrating reinforcement learning with causality as a way to counter confounding bias.