As a scalable, data-driven approach, multi-agent reinforcement learning (MARL) has made remarkable advances in solving cooperative residential load scheduling problems. However, the centralized training strategy commonly used in MARL algorithms raises privacy risks for the participating households. In this work, we propose a privacy-preserving multi-agent actor-critic framework in which decentralized actors are trained with distributed critics, so that neither the decentralized execution nor the distributed training requires global state information. The proposed framework preserves the privacy of the households while simultaneously learning an implicit multi-agent credit assignment mechanism. Simulation experiments demonstrate that the proposed framework significantly outperforms the existing privacy-preserving actor-critic framework and achieves performance comparable to that of the state-of-the-art actor-critic framework without privacy constraints.
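To make the privacy argument concrete, the sketch below illustrates the general idea of an actor-critic agent whose actor and critic both operate only on local observations, so no household needs to expose its state to a central trainer. This is a simplified, independent-learner illustration rather than the paper's distributed-critic training protocol; all class and parameter names (e.g., `LocalActorCritic`, `obs_dim`, `n_actions`) are hypothetical.

```python
# Minimal sketch (assumption: one household agent with discrete appliance
# actions and a locally observed reward). Not the paper's exact algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalActorCritic(nn.Module):
    """One household agent: actor and critic both see only the local state."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def act(self, obs):
        # Sample an action from the local policy.
        dist = torch.distributions.Categorical(logits=self.actor(obs))
        action = dist.sample()
        return action, dist.log_prob(action)

    def update(self, obs, log_prob, reward, next_obs, optimizer, gamma=0.99):
        # One-step TD target computed from local information only;
        # no global state or other households' data is required.
        value = self.critic(obs)
        with torch.no_grad():
            target = reward + gamma * self.critic(next_obs)
        advantage = (target - value).detach()
        critic_loss = F.mse_loss(value, target)
        actor_loss = -(log_prob * advantage).mean()
        optimizer.zero_grad()
        (critic_loss + actor_loss).backward()
        optimizer.step()
```

In the proposed framework the critics are additionally trained in a distributed manner (rather than fully independently as above), which is what allows the credit assignment across households to be learned implicitly without collecting global state at a central node.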