In this work the goal is to generalise to new data in a non-i.i.d. setting where datasets from related tasks are observed, each generated by a different causal mechanism, and the test dataset comes from the same task distribution. This setup is motivated by personalised medicine, where each patient is a task and complex diseases are heterogeneous across patients in cause and progression. The difficulty is that a single task usually does not contain enough data to identify its causal mechanism, and, unless the mechanisms are the same, pooling data across tasks, which meta-learning does in one way or another, may lead to worse predictors when the test setting differs in ways that cannot be controlled. In this paper we introduce into meta-learning, formulated as Bayesian hierarchical modelling, a proxy measure of the similarity of the tasks' causal mechanisms, obtained by learning a suitable embedding of the tasks from the whole dataset. This embedding is used as auxiliary data for assessing which tasks should be pooled in the hierarchical model. We show that such pooling improves predictions in three health-related case studies and, through sensitivity analyses on simulated data, that the method aids generalisability by utilising interventional data to identify tasks with similar causal mechanisms for pooling, even in limited-data settings.