One of the fundamental assumptions in stochastic control of continuous-time processes is that the dynamics of the underlying (diffusion) process are known. In practice, however, this assumption is rarely satisfied. On the other hand, a rich theory for nonparametric estimation of the drift (and volatility) of continuous-time processes has been developed over the last decades. The aim of this paper is to bring together techniques from stochastic control and methods from statistics for stochastic processes in order to learn the dynamics of the underlying process and control it in a reasonable way at the same time. More precisely, we study a long-term average impulse control problem, a stochastic version of the classical Faustmann timber harvesting problem. A problem that arises immediately is an exploration-exploitation dilemma of the kind well known in machine learning. We propose to deal with this issue by combining exploration and exploitation periods in a suitable way. Our main finding is that this construction can be based on the rates of convergence of estimators for the invariant density. Using this, we obtain that the average cumulated regret is of uniform order $O(T^{-1/3})$.