We consider a class of learning problems in which an agent liquidates a risky asset while creating both transient price impact, driven by an unknown convolution propagator, and linear temporary price impact with an unknown parameter. We formulate the trader's objective as the maximization of a revenue-risk functional, in which the trader also exploits available information on a price-predicting signal. We present a trading algorithm that alternates between exploration and exploitation phases and achieves sublinear regret with high probability. For the exploration phase we propose a novel approach for non-parametric estimation of the price impact kernel from observations of the visible price process alone, and we derive sharp bounds on the convergence rate, which are characterized by the singularity of the propagator. These kernel estimation methods extend existing methods from the area of Tikhonov regularization for inverse problems and are of independent interest. The bound on the regret in the exploitation phase is obtained by deriving stability results for the optimizer and value function of the associated class of infinite-dimensional stochastic control problems. As a complementary result we propose a regression-based algorithm to estimate the conditional expectation of non-Markovian signals and derive its convergence rate.
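To illustrate the flavour of the kernel estimation step, the following is a minimal discrete-time sketch of Tikhonov-regularized least squares for a convolution kernel. It is not the estimator analysed in the paper: it assumes a finite-lag, non-singular kernel, known trades `u`, and an observed price impact series contaminated by small Gaussian noise. The names `lagged_trades`, `estimate_kernel`, the power-law kernel `g_true`, and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np


def lagged_trades(u, m):
    """Return the n x m matrix U with U[i, j] = u[i - j] for i >= j, else 0."""
    n = len(u)
    lag = np.arange(n)[:, None] - np.arange(m)[None, :]
    return np.where(lag >= 0, u[np.clip(lag, 0, n - 1)], 0.0)


def estimate_kernel(u, impact, m, lam):
    """Tikhonov-regularized least squares: minimize |U g - impact|^2 + lam |g|^2."""
    U = lagged_trades(u, m)
    return np.linalg.solve(U.T @ U + lam * np.eye(m), U.T @ impact)


# Synthetic experiment (all values are assumptions for illustration only).
rng = np.random.default_rng(0)
n, m = 400, 20
g_true = (1.0 + np.arange(m)) ** -0.4        # hypothetical power-law kernel
u = rng.normal(size=n)                       # exploratory trading rates
impact = lagged_trades(u, m) @ g_true + 0.01 * rng.normal(size=n)

g_hat = estimate_kernel(u, impact, m, lam=1e-2)
rel_err = np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)
print(f"relative L2 error: {rel_err:.4f}")
```

The penalty weight `lam` trades bias against variance. In this sketch it is fixed by hand, whereas a convergence-rate analysis would tie it to the noise level and the regularity of the kernel.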