The insight that causal parameters are particularly suitable for out-of-sample prediction has sparked a lot development of causal-like predictors. However, the connection with strict causal targets, has limited the development with good risk minimization properties, but without a direct causal interpretation. In this manuscript we derive the optimal out-of-sample risk minimizing predictor of a certain target $Y$ in a non-linear system $(X,Y)$ that has been trained in several within-sample environments. We consider data from an observation environment, and several shifted environments. Each environment corresponds to a structural equation model (SEM), with random coefficients and with its own shift and noise vector, both in $L^2$. Unlike previous approaches, we also allow shifts in the target value. We define a sieve of out-of-sample environments, consisting of all shifts $\tilde{A}$ that are at most $\gamma$ times as strong as any weighted average of the observed shift vectors. For each $\beta\in\mathbb{R}^p$ we show that the supremum of the risk functions $R_{\tilde{A}}(\beta)$ has a worst-risk decomposition into a (positive) non-linear combination of risk functions, depending on $\gamma$. We then define the set $\mathcal{B}_\gamma$, as minimizers of this risk. The main result of the paper is that there is a unique minimizer ($|\mathcal{B}_\gamma|=1$) that can be consistently estimated by an explicit estimator, outside a set of zero Lebesgue measure in the parameter space. A practical obstacle for the initial method of estimation is that it involves the solution of a general degree polynomials. Therefore, we prove that an approximate estimator using the bisection method is also consistent.
翻译:暂无翻译