This work aims to discuss and close some of the gaps in the literature on models that use options (and, more generally, coagents). It briefly surveys the theory behind these models and provides a unifying point of view on the many diverse examples that fall under the same category, called coagent networks. Motivated by the result of [10] on parameter sharing of options, we revisit the theory of (a)synchronous coagent networks [8] and generalize it to the setting where parameters are shared among the function approximators of coagents. The proof is more intuitive and uses the concept of execution paths in a coagent network. Theoretically, this informs necessary modifications to algorithms found in the literature that make them mathematically more accurate. It also allows us to introduce a new, simple option framework, the Feedforward Option Network, which outperforms previous option models in time to convergence and stability on the well-known nonstationary Four Rooms task. In addition, we observe a stabilization effect in hierarchical models, which suggests that a target network is unnecessary when training such models. Finally, we publish our code, which allows for flexibility in our experimental settings.