Parameter estimation in the empirical fields is usually undertaken using parametric models, which are convenient because they readily facilitate statistical inference. Unfortunately, such models are unlikely to have a sufficiently flexible functional form to adequately capture real-world phenomena, and their use may therefore yield biased estimates and invalid inference. Conversely, whilst non-parametric machine learning models may provide the flexibility needed to adapt to the complexity of real-world phenomena, they do not readily facilitate statistical inference and may still exhibit residual bias. We explore the potential for semiparametric theory (in particular, the influence function) to improve neural networks and machine learning algorithms by (a) improving initial estimates without needing more data, (b) increasing the robustness of our models, and (c) yielding confidence intervals for statistical inference. We propose a new neural network method, MultiNet, which seeks the flexibility and diversity of an ensemble within a single architecture. Results on causal inference tasks indicate that MultiNet outperforms other approaches, and that all considered methods are amenable to improvement via semiparametric techniques under certain conditions. In other words, these techniques allow us to improve existing neural networks for 'free', without needing more data and without retraining them. Finally, we provide the expression for deriving influence functions for estimands from a general graph, together with code to do so automatically.
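To make the "free improvement" idea concrete, the sketch below illustrates the classic one-step influence-function (AIPW) correction for the average treatment effect on simulated data. This is a minimal, self-contained illustration of the general technique the abstract refers to, not the paper's MultiNet method: the data-generating process, variable names, and the deliberately biased outcome model are our own assumptions, and the nuisance estimates stand in for what would normally come from a trained neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one confounder x, binary treatment t, outcome y.
n = 5000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))        # true propensity score E[T | X]
t = rng.binomial(1, e)
y = x + t + rng.normal(size=n)      # true ATE = 1

# Stand-ins for initial model outputs. The outcome model is
# deliberately biased (0.5 instead of 1) to mimic a misspecified
# or under-trained learner; the propensity model is correct.
mu1 = x + 0.5                       # biased estimate of E[Y | T=1, X]
mu0 = x                             # estimate of E[Y | T=0, X]

# Plug-in ATE estimate from the outcome model alone: biased (~0.5).
plug_in = np.mean(mu1 - mu0)

# One-step update: add the sample mean of the efficient influence
# function (the AIPW correction term) to the plug-in estimate.
# No retraining and no new data are required.
if_term = t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
one_step = plug_in + np.mean(if_term)

print(f"plug-in:  {plug_in:.3f}")   # ~0.5, biased
print(f"one-step: {one_step:.3f}")  # ~1.0, bias removed
```

Because the correction term has mean equal to the first-order bias of the plug-in estimator, the one-step estimate recovers the true effect here even though the outcome model is wrong, reflecting the double robustness of the AIPW influence function.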