Researchers have formalized reinforcement learning (RL) in different ways. If an agent in one RL framework is to run within another RL framework's environments, the agent must first be converted, or mapped, into that other framework. Whether this is possible depends not only on the RL frameworks in question but also on how intelligence itself is measured. In this paper, we lay foundations for studying relative-intelligence-preserving mappability between RL frameworks. We define two types of mappings between RL frameworks, called weak and strong translations, and prove that the existence of these mappings enables two corresponding types of intelligence comparison, since the mappings preserve relative intelligence. We investigate the existence (or lack thereof) of these mappings between: (i) RL frameworks where agents go first and RL frameworks where environments go first; and (ii) twelve different RL frameworks differing in whether agents or environments are required to be deterministic. In the former case, we consider various natural mappings between agent-first and environment-first RL and vice versa; we show some positive results (some such mappings are strong or weak translations) and some negative results (some such mappings are not). In the latter case, we completely characterize which of the twelve RL-framework pairs admit weak translations, under the assumption of integer-valued rewards and some additional mild assumptions.