We aim to construct a class of learning algorithms that are of practical value to applied researchers in fields such as biostatistics, epidemiology and econometrics, where the need to learn from incompletely observed information is ubiquitous. We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models, which we call 'IF-learning' due to its reliance on influence functions (IFs). This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics: we can consider any target function for which an IF of a population-averaged version exists in analytic form. Throughout, we put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information. This includes problems such as treatment effect estimation and inference in the presence of missing outcome data. Within this framework, we propose two general learning algorithms that build on the idea of nonparametric plug-in bias removal via IFs: the 'IF-learner', which uses pseudo-outcomes motivated by uncentered IFs for regression in large samples and outputs entire target functions without confidence bands, and the 'Group-IF-learner', which outputs only approximations to a function but can give confidence estimates if sufficient information on coarsening mechanisms is available. We apply both in a simulation study on inferring treatment effects.
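To make the pseudo-outcome idea concrete, the following is a minimal sketch for the treatment effect setting, not the paper's own algorithm. It assumes a randomized binary treatment with known propensity 0.5, linear nuisance models, and the standard augmented inverse-probability-weighted (AIPW) form of the uncentered IF of the average treatment effect. All function names, the simulated data-generating process, and the two-fold sample split are hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y on [1, x]; returns (intercept, slope)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, x):
    """Evaluate the fitted linear model at the points x."""
    return np.column_stack([np.ones(len(x)), x]) @ coef


def if_pseudo_outcome(y, a, pi, mu0, mu1):
    """Uncentered AIPW influence function of the average treatment effect.

    y: outcomes, a: binary treatment, pi: propensity P(A=1 | X),
    mu0 / mu1: outcome regressions under control / treatment.
    """
    return mu1 - mu0 + a / pi * (y - mu1) - (1 - a) / (1 - pi) * (y - mu0)


def simulate(n, rng):
    """Hypothetical data: true effect function tau(x) = 1 + 2x."""
    x = rng.uniform(-1.0, 1.0, size=n)
    a = rng.binomial(1, 0.5, size=n)
    y = x + a * (1.0 + 2.0 * x) + rng.normal(0.0, 1.0, size=n)
    return x, a, y


rng = np.random.default_rng(0)
x, a, y = simulate(4000, rng)
half = len(x) // 2

# Fold 1: estimate the nuisance outcome regressions within each arm.
x1, a1, y1 = x[:half], a[:half], y[:half]
coef_mu0 = fit_linear(x1[a1 == 0], y1[a1 == 0])
coef_mu1 = fit_linear(x1[a1 == 1], y1[a1 == 1])

# Fold 2: build pseudo-outcomes and regress them on the covariate.
x2, a2, y2 = x[half:], a[half:], y[half:]
phi = if_pseudo_outcome(
    y2, a2, 0.5, predict_linear(coef_mu0, x2), predict_linear(coef_mu1, x2)
)
tau_coef = fit_linear(x2, phi)  # estimated (intercept, slope) of tau(x)
print(tau_coef)
```

In this sketch the second-stage regression of the pseudo-outcome on the covariate recovers an estimate of the whole effect function rather than a single average. The linear second stage is only a placeholder; any regression method could be substituted.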