In this work, we study out-of-distribution generalization in meta-learning from an information-theoretic perspective. We focus on two scenarios: (i) when the testing environment differs from the training environment, and (ii) when the training environment is broader than the testing environment. The first corresponds to the standard distribution-mismatch setting, while the second reflects a broad-to-narrow training scenario. We further formalize the generalization problem in meta-reinforcement learning and establish corresponding generalization bounds. Finally, we analyze the generalization performance of a gradient-based meta-reinforcement learning algorithm.