Multiobjective reinforcement learning (MORL) poses significant challenges due to inherent conflicts between objectives and the difficulty of adapting to dynamic environments. Traditional methods often struggle to generalize effectively, particularly in large and complex state-action spaces. To address these limitations, we introduce the Latent Causal Diffusion Model (LacaDM), a novel approach designed to enhance the adaptability of MORL in both discrete and continuous environments. Unlike existing methods that focus primarily on resolving conflicts between objectives, LacaDM learns latent temporal causal relationships between environmental states and policies, enabling efficient knowledge transfer across diverse MORL scenarios. By embedding these causal structures within a diffusion-model-based framework, LacaDM balances conflicting objectives while maintaining strong generalization to previously unseen environments. Empirical evaluations on a range of tasks from the MOGymnasium framework demonstrate that LacaDM consistently outperforms state-of-the-art baselines in terms of hypervolume, sparsity, and expected utility maximization, demonstrating its effectiveness on complex multiobjective tasks.