As their application domains grow more complex, artificial learning agents have scaled up their capacity to process the large volume of information generated by interaction with an environment. This scaling comes at a cost: the agent must encode and process a growing amount of redundant information that does not necessarily benefit learning. This work exploits the properties of learning systems defined over partially observable domains by selectively focusing on the type of information that is more likely to express the causal interactions among the transitioning states of the environment. Adaptive masking of the observation space based on the $\textit{temporal difference displacement}$ criterion yields a significant improvement in the convergence of temporal difference algorithms defined over a partially observable Markov process.
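To make the idea of learning from a masked observation concrete, the following is a minimal sketch of a single TD(0) update with a linear value function in which a binary mask hides part of the observation vector. It is an illustration under stated assumptions, not the method of this work: the abstract does not define the temporal difference displacement criterion, so the mask here is simply supplied by the caller and is never adapted. The function names, the linear value function, and the default step size and discount are all hypothetical.

```python
from typing import List, Sequence, Tuple


def apply_mask(obs: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out observation components whose mask entry is 0."""
    return [o * m for o, m in zip(obs, mask)]


def value(weights: Sequence[float], obs: Sequence[float],
          mask: Sequence[int]) -> float:
    """Linear value estimate computed from the masked observation only."""
    return sum(w * x for w, x in zip(weights, apply_mask(obs, mask)))


def masked_td0_update(
    weights: Sequence[float],
    mask: Sequence[int],
    obs: Sequence[float],
    reward: float,
    next_obs: Sequence[float],
    alpha: float = 0.1,   # step size (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
) -> Tuple[List[float], float]:
    """Perform one TD(0) step on masked features.

    Returns the updated weights and the TD error. The mask is an input;
    how it would be adapted from the temporal difference displacement
    criterion is NOT implemented here, since the abstract does not specify it.
    """
    td_error = (reward
                + gamma * value(weights, next_obs, mask)
                - value(weights, obs, mask))
    features = apply_mask(obs, mask)
    new_weights = [w + alpha * td_error * x for w, x in zip(weights, features)]
    return new_weights, td_error


if __name__ == "__main__":
    w, delta = masked_td0_update(
        weights=[0.0, 0.0, 0.0],
        mask=[1, 0, 1],            # second observation component is hidden
        obs=[1.0, 5.0, 2.0],
        reward=1.0,
        next_obs=[0.0, 0.0, 0.0],
    )
    print(w, delta)
```

In this sketch the weight attached to a masked component is never updated, because its feature value is zero in both the value estimate and the gradient. This is one simple way in which masking could keep redundant observation components from influencing learning.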