Most previous machine learning algorithms are built on the i.i.d. (independent and identically distributed) assumption. However, this ideal assumption is often violated in real applications, where selection bias may arise between the training and testing processes. Moreover, in many scenarios the testing data are not even available during training, which makes traditional methods such as transfer learning infeasible, since they require prior knowledge of the test distribution. Therefore, addressing agnostic selection bias for robust model learning is of paramount importance for both academic research and real applications. In this paper, under the assumption that causal relationships among variables are robust across domains, we incorporate causal techniques into predictive modeling and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm that jointly optimizes global confounder balancing and weighted logistic regression. Global confounder balancing helps identify causal features, whose causal effects on the outcome are stable across domains; performing logistic regression on those causal features then yields a predictive model that is robust against the agnostic bias. To validate the effectiveness of CRLR, we conduct comprehensive experiments on both synthetic and real-world datasets. The experimental results demonstrate that CRLR outperforms state-of-the-art methods, and the interpretability of our method is illustrated through feature visualization.
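The joint objective described above can be sketched in a few lines. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: it assumes binary features, labels in {0, 1}, and a balancing term that treats each feature in turn as a "treatment" and penalizes the squared gap between the sample-weighted means of all other features in its treated and control groups. The exact form used in the paper may differ. The function names (`balancing_loss`, `weighted_logistic_loss`, `crlr_objective`), the extra regularizers, and the weights `lam` are hypothetical choices. Only the objective is evaluated here; fitting the model would additionally require an optimizer over the non-negative sample weights `W` and the coefficients `beta`.

```python
import math

# Minimal sketch of a CRLR-style objective (illustrative assumptions, not the
# authors' code): binary features X[i][j] in {0, 1}, labels y[i] in {0, 1},
# non-negative sample weights W[i], coefficient vector beta.


def softplus(s):
    """Numerically stable log(1 + exp(s))."""
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))


def balancing_loss(X, W):
    """Assumed global confounder balancing term.

    Each binary feature j is treated in turn as a 'treatment'; every other
    feature k acts as a confounder. The term sums the squared gaps between the
    W-weighted means of feature k in the treated (X[i][j] == 1) and control
    (X[i][j] == 0) groups. Requires positive total weight in both groups.
    """
    n, p = len(X), len(X[0])
    total = 0.0
    for j in range(p):
        treated = [i for i in range(n) if X[i][j] == 1]
        control = [i for i in range(n) if X[i][j] == 0]
        w_t = sum(W[i] for i in treated)
        w_c = sum(W[i] for i in control)
        for k in range(p):
            if k == j:
                continue
            mean_t = sum(W[i] * X[i][k] for i in treated) / w_t
            mean_c = sum(W[i] * X[i][k] for i in control) / w_c
            total += (mean_t - mean_c) ** 2
    return total


def weighted_logistic_loss(X, y, W, beta):
    """Sample-weighted negative log-likelihood of logistic regression."""
    loss = 0.0
    for xi, yi, wi in zip(X, y, W):
        z = sum(b * v for b, v in zip(beta, xi))
        # -log sigmoid(z) = softplus(-z) if y = 1; -log(1 - sigmoid(z)) = softplus(z) if y = 0
        loss += wi * softplus(-z if yi == 1 else z)
    return loss


def crlr_objective(X, y, W, beta, lam=(1.0, 1e-3, 1e-3, 1e-3, 1.0)):
    """Joint objective; the extra regularizers and lam values are hypothetical."""
    lam_bal, lam_w, lam_l2, lam_l1, lam_sum = lam
    return (
        weighted_logistic_loss(X, y, W, beta)
        + lam_bal * balancing_loss(X, W)
        + lam_w * sum(w * w for w in W)        # discourages weights concentrating on few samples
        + lam_l2 * sum(b * b for b in beta)
        + lam_l1 * sum(abs(b) for b in beta)   # encourages sparsity over features
        + lam_sum * (sum(W) - 1.0) ** 2        # discourages the trivial solution W = 0
    )


# Toy data: under uniform weights, feature 0 and feature 1 are correlated.
X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 1, 0]
uniform = [0.2] * 5
reweighted = [0.125, 0.125, 0.25, 0.25, 0.25]  # down-weights the duplicated [1, 1] rows

print(balancing_loss(X, uniform))     # approximately 1/18
print(balancing_loss(X, reweighted))  # 0.0: each feature is balanced on the other
print(crlr_objective(X, y, reweighted, [0.0, 0.0]))
```

On this toy design, uniform weights leave the two features correlated, while down-weighting the duplicated `[1, 1]` rows balances each feature's treated and control groups on the other feature. This is the sense in which sample reweighting can decorrelate features before the weighted regression is fit.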