Practitioners in fields as diverse as healthcare, economics and education are eager to apply machine learning to improve decision making. The cost and impracticality of performing experiments, together with a recent monumental increase in electronic record keeping, have brought attention to the problem of evaluating decisions based on non-experimental observational data. This is the setting of this work. In particular, we study estimation of individual-level causal effects, such as a single patient's response to alternative medication, from recorded contexts, decisions and outcomes. We give generalization bounds on the error in estimated effects based on distance measures between groups receiving different treatments, allowing for sample re-weighting. We provide conditions under which our bound is tight and show how it relates to results for unsupervised domain adaptation. Guided by our theoretical results, we devise representation learning algorithms that minimize our bound by regularizing the representation's induced treatment group distance and that encourage sharing of information between treatment groups. We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances. Finally, an experimental evaluation on real and synthetic data shows the value of our proposed representation architecture and regularization scheme.
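To make the regularization idea concrete, the following is a minimal sketch of an objective that combines a re-weighted factual prediction loss with a penalty on the distance between treatment groups in representation space. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance measure, so the squared difference of weighted group means is used here as a simple stand-in, and the names `weighted_group_distance`, `regularized_objective`, `phi`, `w` and `alpha` are all hypothetical.

```python
import numpy as np


def weighted_group_distance(phi, t, w=None):
    """Squared Euclidean distance between the weighted mean representations
    of the treated and control groups.

    Assumption: this mean-difference distance is a stand-in for whatever
    distance measure the bound uses; it is chosen only for simplicity.
    """
    phi = np.asarray(phi, dtype=float)
    t = np.asarray(t).astype(bool)
    w = np.ones(len(t)) if w is None else np.asarray(w, dtype=float)
    mean_treated = np.average(phi[t], axis=0, weights=w[t])
    mean_control = np.average(phi[~t], axis=0, weights=w[~t])
    diff = mean_treated - mean_control
    return float(diff @ diff)


def regularized_objective(y, y_hat, phi, t, w=None, alpha=1.0):
    """Weighted squared factual loss plus alpha times the group distance.

    Assumption: squared error and the trade-off parameter alpha are
    illustrative choices, not taken from the source.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    factual_loss = float(np.average((y - y_hat) ** 2, weights=w))
    return factual_loss + alpha * weighted_group_distance(phi, t, w)


# Toy data: one-dimensional representations, two treated and two control units.
phi = np.array([[0.0], [2.0], [4.0], [4.0]])
t = np.array([1, 1, 0, 0])
y = np.array([1.0, 2.0, 3.0, 4.0])

unweighted = weighted_group_distance(phi, t)
reweighted = weighted_group_distance(phi, t, w=[1.0, 3.0, 1.0, 1.0])
objective = regularized_objective(y, y, phi, t, alpha=0.5)
```

In this toy example, up-weighting the treated unit that lies closer to the control group shrinks the group distance, which is the intuition behind learning sample weights jointly with the representation. In a learning setting, `phi` and `y_hat` would be outputs of a trainable model and the objective would be minimized by gradient descent.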