A key challenge in science and engineering is to design experiments to learn about an unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a {\em Markov chain}, and we can only choose them by selecting a {\em policy} controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm -- \textsc{markov-design} -- that efficiently selects policies whose measurement allocation \emph{provably converges to the optimal one}. The algorithm is sequential in nature, adapting its choice of policies (experiments) based on past measurements. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.
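To make the setting concrete, the ``measurement allocation'' above can be read as a distribution over states induced by the chosen policy. The following is a minimal illustrative objective, assuming a D-optimal (log-determinant) utility; the symbols $\pi$, $\eta_{\pi}$, and $\phi$ are ours for illustration and are not fixed by the abstract:
\[
  \max_{\pi}\; F(\eta_{\pi}),
  \qquad
  F(\eta) \;=\; \log\det\Big(\sum_{s} \eta(s)\,\phi(s)\phi(s)^{\top}\Big),
\]
where $\eta_{\pi}(s)$ denotes the long-run visitation frequency of state $s$ under policy $\pi$, and $\phi(s)$ is a feature vector describing the experiment available at state $s$. Under this reading, converging to the optimal measurement allocation means driving $\eta_{\pi}$ toward a maximizer of $F$ over the set of visitation distributions achievable by policies.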