We consider a high-dimensional mean estimation problem over a binary hidden Markov model, which illuminates the interplay between memory in the data, sample size, dimension, and signal strength in statistical inference. In this model, an estimator observes $n$ samples of a $d$-dimensional parameter vector $\theta_{*}\in\mathbb{R}^{d}$, each multiplied by a random sign $S_i$ ($1\le i\le n$) and corrupted by isotropic standard Gaussian noise. The sequence of signs $\{S_{i}\}_{i\in[n]}\in\{-1,1\}^{n}$ is drawn from a stationary homogeneous Markov chain with flip probability $\delta\in[0,1/2]$. As $\delta$ varies, this model smoothly interpolates between two well-studied models: the Gaussian Location Model at $\delta=0$ and the Gaussian Mixture Model at $\delta=1/2$. Assuming that the estimator knows $\delta$, we establish a nearly minimax optimal (up to logarithmic factors) estimation error rate as a function of $\|\theta_{*}\|,\delta,d,n$. We then provide an upper bound on the error of estimating $\delta$, assuming a (possibly inaccurate) knowledge of $\theta_{*}$. This bound is shown to be tight when $\theta_{*}$ is an accurately known constant. These results are then combined into an algorithm that estimates $\theta_{*}$ with $\delta$ unknown a priori, and theoretical guarantees on its error are given.
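The observation model described above can be sketched as a short simulation: each sample is $X_i = S_i\,\theta_{*} + Z_i$ with $Z_i \sim \mathcal{N}(0, I_d)$, and the signs follow a stationary $\pm 1$ Markov chain that flips with probability $\delta$. The function name and interface below are illustrative, not from the paper.

```python
import numpy as np

def sample_bhmm(n, theta, delta, seed=None):
    """Draw n observations X_i = S_i * theta + Z_i, where {S_i} is a
    stationary +/-1 Markov chain with flip probability delta and
    Z_i ~ N(0, I_d) is isotropic standard Gaussian noise."""
    rng = np.random.default_rng(seed)
    d = theta.shape[0]
    s = np.empty(n)
    # The stationary distribution of the symmetric chain is uniform on {-1, +1}.
    s[0] = rng.choice([-1.0, 1.0])
    flips = rng.random(n - 1) < delta  # flip the sign with probability delta
    for i in range(1, n):
        s[i] = -s[i - 1] if flips[i - 1] else s[i - 1]
    noise = rng.standard_normal((n, d))
    return s[:, None] * theta + noise, s
```

Setting `delta=0` recovers the Gaussian Location Model (the sign never flips), while `delta=0.5` makes the signs i.i.d. uniform, recovering the Gaussian Mixture Model.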