High-dimensional time series are a core ingredient of the statistical modeling toolkit, and numerous estimation methods are known for them. But when observations are scarce or corrupted, the learning task becomes much harder. The question is: how much harder? In this paper, we study the properties of a partially observed vector autoregressive (VAR) process, which is a state-space model endowed with a stochastic observation mechanism. Our goal is to estimate its sparse transition matrix, but we only have access to a small and noisy subsample of the state components. Moreover, the sampling process itself is random and can exhibit temporal correlations, a feature shared by many realistic data acquisition scenarios. We start by describing an estimator based on the Yule-Walker equation and the Dantzig selector, and we give a non-asymptotic upper bound on its error. Then, we provide a matching minimax lower bound, thus proving near-optimality of our estimator. The convergence rate we obtain sheds light on the role of several key parameters such as the sampling ratio, the amount of noise and the number of non-zero coefficients in the transition matrix. These theoretical findings are discussed and illustrated by numerical experiments on simulated data.
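To make the setting concrete, the following is a minimal sketch of a partially observed VAR(1) process and of a Yule-Walker-type estimate of its transition matrix. It rests on several simplifying assumptions that are not taken from the paper: each state component is observed independently with a known probability `p` (no temporal correlation in the sampling), the observation noise level `omega` is known, the innovations have unit variance, and plain matrix inversion stands in for the Dantzig selector, so sparsity is not exploited. The function names are hypothetical.

```python
import numpy as np


def simulate_partially_observed_var(theta, T, p, omega, rng):
    """Simulate X_t = theta @ X_{t-1} + eps_t and return masked noisy observations.

    Assumptions of this sketch: unit-variance Gaussian innovations, and
    independent Bernoulli(p) sampling of each component at each time step.
    Unobserved entries are stored as zeros.
    """
    D = theta.shape[0]
    x = np.zeros(D)  # started at zero; the short transient is ignored
    Y = np.zeros((T, D))
    for t in range(T):
        x = theta @ x + rng.standard_normal(D)
        mask = rng.random(D) < p
        Y[t] = mask * (x + omega * rng.standard_normal(D))
    return Y


def debiased_yule_walker(Y, p, omega):
    """Estimate theta from masked observations via theta = Gamma_1 @ inv(Gamma_0).

    The empirical covariances of Y are rescaled to undo the effect of the
    Bernoulli(p) masks, and the noise variance is removed from the diagonal.
    """
    T, D = Y.shape
    S0 = Y.T @ Y / T                    # lag-0 empirical covariance of Y
    S1 = Y[1:].T @ Y[:-1] / (T - 1)     # lag-1: average of Y_t Y_{t-1}^T

    # Lag 0: off-diagonal products survive with probability p^2, diagonal with p.
    scale0 = np.full((D, D), p ** 2)
    np.fill_diagonal(scale0, p)
    gamma0 = S0 / scale0 - omega ** 2 * np.eye(D)

    # Lag 1: masks at different times are independent, so every entry scales by p^2.
    gamma1 = S1 / p ** 2

    # gamma0 is symmetric, so solve(gamma0, gamma1.T).T equals gamma1 @ inv(gamma0).
    return np.linalg.solve(gamma0, gamma1.T).T


rng = np.random.default_rng(0)
D = 5
theta_true = 0.5 * np.eye(D)   # sparse and stable transition matrix
theta_true[0, 1] = 0.3
theta_true[3, 2] = -0.3

Y = simulate_partially_observed_var(theta_true, T=20000, p=0.7, omega=0.3, rng=rng)
theta_hat = debiased_yule_walker(Y, p=0.7, omega=0.3)
max_error = np.max(np.abs(theta_hat - theta_true))
print("max entrywise error:", max_error)
```

Lowering `p` or raising `omega` in this sketch makes the estimate visibly worse for a fixed `T`, which is the kind of dependence on the sampling ratio and noise level that the convergence rate quantifies. A sparsity-aware variant would replace the final linear solve with a row-wise linear program, as in the Dantzig selector.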