If we changed the rules, would the wise trade places with the fools? Different groups formalize reinforcement learning (RL) in different ways. If an agent in one RL formalization is to run within another RL formalization's environment, the agent must first be converted, or mapped. A criterion of adequacy for any such mapping is that it preserves relative intelligence. This paper investigates the formulation and properties of this criterion of adequacy. We argue, however, that prior to the problem of formulation lies the problem of comparative intelligence: how to compare the intelligence of two agents at all. We compare intelligence using ultrafilters, motivated by viewing agents as candidates in intelligence elections where the voters are environments. These comparators are counterintuitive, but we prove an impossibility theorem about RL intelligence measurement, suggesting that such counterintuitions are unavoidable. Given a mapping between RL frameworks, we establish sufficient conditions to ensure that, for any ultrafilter-based intelligence comparator in the destination framework, there exists an ultrafilter-based intelligence comparator in the source framework such that the mapping preserves relative intelligence. We consider three concrete mappings between various RL frameworks and show that they satisfy these sufficient conditions and therefore preserve suitably measured relative intelligence.
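To make the election metaphor concrete, here is a minimal sketch of one way such a comparator can be defined; the symbols $E$, $\mathcal{U}$, and $\mathrm{perf}_e$ are illustrative placeholders, not the paper's own notation. Fix a set $E$ of environments and an ultrafilter $\mathcal{U}$ on $E$, and suppose each environment $e \in E$ assigns every agent $A$ a performance score $\mathrm{perf}_e(A)$. Agent $A$ is then at least as intelligent as agent $B$ exactly when the set of environments that "vote for" $A$ is large in the sense of $\mathcal{U}$:
\[
A \succeq_{\mathcal{U}} B \iff \{\, e \in E : \mathrm{perf}_e(A) \geq \mathrm{perf}_e(B) \,\} \in \mathcal{U}.
\]
Since $\{ e : \mathrm{perf}_e(A) \geq \mathrm{perf}_e(B) \} \cup \{ e : \mathrm{perf}_e(B) \geq \mathrm{perf}_e(A) \} = E$ and an ultrafilter containing a finite union must contain one of its members, the comparator is total: every election is decisive. Under the same illustrative notation, one natural formalization of "the mapping preserves relative intelligence" for a mapping $\varphi$ from a source framework to a destination framework, with comparators $\succeq$ and $\succeq'$ respectively, is that $A \succeq B$ if and only if $\varphi(A) \succeq' \varphi(B)$.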