A combination of deep reinforcement learning and supervised learning is proposed for the problem of active sequential hypothesis testing in completely unknown environments. We make no assumptions about the prior probability, the action and observation sets, or the observation-generating process. Our method can be used in any environment, even one with continuous observations or actions, and performs competitively with, and sometimes better than, the Chernoff test in both finite- and infinite-horizon problems, despite not having access to the environment dynamics.
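As a reference point for the Chernoff test mentioned above, the following is a minimal sketch of a simplified, deterministic variant in a toy discrete setting. The hypothesis set, action set, Bernoulli observation model, and stopping threshold below are illustrative assumptions, not taken from the paper; unlike the proposed method, this baseline needs full knowledge of the observation distributions.

```python
import numpy as np

# Hypothetical setup: 3 hypotheses, 2 actions, binary observations.
# p[a, h] = probability of observing 1 under action a if hypothesis h is true.
p = np.array([[0.2, 0.5, 0.8],
              [0.7, 0.5, 0.3]])

def kl_bernoulli(q, r):
    """KL divergence between Bernoulli(q) and Bernoulli(r)."""
    return q * np.log(q / r) + (1 - q) * np.log((1 - q) / (1 - r))

def chernoff_action(log_lik, p):
    """Pick the action that best separates the current maximum-likelihood
    hypothesis from its closest competitor (deterministic Chernoff rule)."""
    i_star = int(np.argmax(log_lik))
    best_a, best_val = 0, -np.inf
    for a in range(p.shape[0]):
        # worst-case KL divergence against every wrong hypothesis
        val = min(kl_bernoulli(p[a, i_star], p[a, j])
                  for j in range(p.shape[1]) if j != i_star)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

rng = np.random.default_rng(0)
true_h = 2                      # hidden ground truth, unknown to the tester
log_lik = np.zeros(3)
for t in range(200):
    a = chernoff_action(log_lik, p)
    obs = rng.random() < p[a, true_h]        # sample from the true hypothesis
    log_lik += np.log(np.where(obs, p[a], 1 - p[a]))
    sorted_ll = np.sort(log_lik)
    if sorted_ll[-1] - sorted_ll[-2] > np.log(1e4):   # stop when confident
        break

print(int(np.argmax(log_lik)))  # → 2 (the true hypothesis)
```

Note how the action rule requires the full table `p`: this is exactly the environment knowledge that the abstract says the proposed learning-based method does without.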