A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should be easier for the algorithm to achieve low regret, since past observations are predictive of future inputs. Such data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including online convex optimization (OCO) and bandits. We obtain the first pathlength regret bounds for online control and estimation (e.g., Kalman filtering) in linear dynamical systems. The key idea in our derivation is to reduce pathlength-optimal filtering and control to certain variational problems in robust estimation and control; these reductions may be of independent interest. Numerical simulations confirm that our pathlength-optimal algorithms outperform traditional $H_2$ and $H_{\infty}$ algorithms when the environment varies over time.
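For concreteness, a common way such guarantees are formalized is sketched below; the abstract does not fix the details, so the choice of the squared Euclidean norm and the linear dependence on the pathlength should be read as assumptions typical of this literature, not as the paper's exact statement:

$$ \mathrm{PL}(w_{1:T}) \;=\; \sum_{t=1}^{T-1} \| w_{t+1} - w_t \|_2^2, \qquad \mathrm{cost}(\mathrm{ALG}) - \mathrm{cost}(\mathrm{OPT}) \;\le\; C \cdot \mathrm{PL}(w_{1:T}), $$

where $w_t$ denotes the disturbance (or observation) at time $t$, $\mathrm{OPT}$ is the optimal clairvoyant (noncausal) policy, and $C$ is a constant independent of the horizon $T$ and of the sequence $w_{1:T}$. When the environment is nearly static, $\mathrm{PL}(w_{1:T})$ is small, so a bound of this form forces the regret to be correspondingly small.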