Inspired by recent progress in multi-agent Reinforcement Learning (RL), in this work we examine the collective intelligent behaviour of theoretical universal agents by introducing a weighted mixture operation. Given a weighted set of agents, their weighted mixture is a new agent whose expected total reward in any environment is the corresponding weighted average of the original agents' expected total rewards in that environment. Thus, if RL agent intelligence is quantified in terms of performance across environments, the weighted mixture's intelligence is the weighted average of the original agents' intelligences. This operation enables various interesting new theorems that shed light on the geometry of RL agent intelligence, namely: results about symmetries, convex agent-sets, and local extrema. We also show that any RL agent intelligence measure based on average performance across environments, subject to certain weak technical conditions, is identical (up to a constant factor) to performance within a single environment dependent on said intelligence measure.
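As a rough illustrative sketch of the mixture property described above (the notation here is an assumption, not taken from the paper: write $V^{\pi}_{\mu}$ for agent $\pi$'s expected total reward in environment $\mu$, and let $w_1,\dots,w_n \ge 0$ with $\sum_i w_i = 1$ be the mixture weights), the defining property of the weighted mixture $\bigoplus_i w_i \pi_i$ would read
\[
V^{\bigoplus_i w_i \pi_i}_{\mu} \;=\; \sum_i w_i \, V^{\pi_i}_{\mu} \qquad \text{for every environment } \mu,
\]
so that any intelligence measure of the form $\Upsilon(\pi) = \sum_{\mu} v(\mu)\, V^{\pi}_{\mu}$ (a $v$-weighted average of performance across environments) satisfies $\Upsilon\bigl(\bigoplus_i w_i \pi_i\bigr) = \sum_i w_i\, \Upsilon(\pi_i)$, i.e., mixture intelligence is the weighted average of the constituents' intelligences.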