We study square loss in a realizable time-series framework with martingale difference noise. Our main result is a fast-rate excess risk bound showing that, whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time. In contrast, many existing results for learning from dependent data have rates in which the effective sample size is deflated by a factor of the mixing time of the underlying process, even after the burn-in time. Furthermore, our results allow the covariate process to exhibit long-range correlations that are substantially weaker than geometric ergodicity. We call this phenomenon learning with little mixing and present several examples in which it occurs: bounded function classes for which the $L^2$ and $L^{2+\epsilon}$ norms are equivalent, ergodic finite-state Markov chains, various parametric models, and a broad family of infinite-dimensional $\ell^2(\mathbb{N})$ ellipsoids. By instantiating our main result for system identification of nonlinear dynamics with generalized linear model transitions, we obtain a nearly minimax-optimal excess risk bound after only a polynomial burn-in time.
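For concreteness, a minimal sketch of the setting just described; the symbols $f_\star$, $W_t$, $\mathcal{F}$, $T$, and the complexity term $\mathrm{comp}(\mathcal{F})$ are illustrative placeholders introduced here, not necessarily the paper's own notation:
\[
Y_t = f_\star(X_t) + W_t, \qquad \mathbb{E}[W_t \mid \mathcal{F}_{t-1}] = 0, \qquad f_\star \in \mathcal{F},
\]
\[
\widehat{f} \in \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \frac{1}{T} \sum_{t=1}^{T} \bigl\| Y_t - f(X_t) \bigr\|^2,
\qquad
\mathbb{E}\,\bigl\| \widehat{f} - f_\star \bigr\|_{L^2}^2 \;\lesssim\; \frac{\mathrm{comp}(\mathcal{F})}{T},
\]
where the last display is the iid-order fast rate that, under trajectory hypercontractivity, holds once $T$ exceeds the burn-in time, with no deflation of $T$ by the mixing time of the covariate process.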