The ability of an agent to perform well in new and unseen environments is a crucial aspect of intelligence. In machine learning, this ability is referred to as strong or out-of-distribution generalization. However, simply considering differences in data distributions is not sufficient to fully capture differences in environments. In the present paper, we assay out-of-variable generalization, which refers to an agent's ability to handle new situations that involve variables never jointly observed before. We expect that such an ability is also important for AI-driven scientific discovery: humans, too, explore 'Nature' by probing, observing, and measuring subsets of variables at a time. Mathematically, out-of-variable generalization requires the efficient re-use of past marginal knowledge, i.e., knowledge over subsets of variables. We study this problem, focusing on prediction tasks whose environments observe overlapping, yet distinct, sets of causal parents. We show that the residual distribution of one environment encodes the partial derivative of the true generating function with respect to the unobserved causal parent. Hence, learning from the residuals allows zero-shot prediction even when the outcome variable is never observed in the other environment.
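The residual claim above can be illustrated with a toy numeric sketch. This is not the paper's method, just a minimal example under strong simplifying assumptions: a linear generating function f with two independent causal parents, where one environment observes only (X1, Y). The residual of the best predictor of Y from X1 alone then carries the contribution of the unobserved parent X2, whose coefficient equals the partial derivative of f with respect to X2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generating function Y = f(X1, X2) = a*X1 + b*X2,
# with X1 and X2 independent standard-normal causal parents of Y.
a, b = 2.0, 3.0

n = 100_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = a * x1 + b * x2

# Environment 1 observes only (X1, Y): fit the best linear predictor
# of Y from X1 alone, then look at what is left over.
coef = np.polyfit(x1, y, deg=1)
resid = y - np.polyval(coef, x1)

# Because f is linear and the parents are independent, the residual is
# (approximately) b * X2, so its standard deviation recovers
# |df/dx2| = b, up to the unit variance of X2 and fitting noise.
print(resid.std())  # close to b = 3.0
```

In this degenerate linear case, the residual only reveals the magnitude of the unobserved parent's effect; the paper's point is that, more generally, the residual *distribution* encodes the partial derivative of the true generating function with respect to the unobserved parent, which is what makes zero-shot prediction in the other environment possible.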