We address the problem of variable selection in a high-dimensional but sparse mean model, under the additional constraint that only privatised data are available for inference. The original data are vectors with independent entries having a symmetric, strongly log-concave distribution on $\mathbb{R}$. To this end, we adopt a recent generalisation of classical minimax theory to the framework of local $\alpha$-differential privacy. We provide lower and upper bounds on the rate of convergence of the expected Hamming loss over classes of vectors with at most $s$ non-zero coordinates, each separated from $0$ by a constant $a>0$. As corollaries, we derive necessary and sufficient conditions (up to log factors) for exact recovery and for almost full recovery. When we restrict our attention to non-interactive mechanisms that act independently on each coordinate, our lower bound shows that, contrary to the non-private setting, both exact and almost full recovery are impossible whatever the value of $a$ in the high-dimensional regime where $n\alpha^2/d^2\lesssim 1$. However, in the regime $n\alpha^2/d^2\gg \log(d)$ we exhibit a critical value $a^*$ (up to a logarithmic factor) such that exact and almost full recovery are possible for all $a\gg a^*$ and impossible for $a\leq a^*$. We show that these results can be improved when all non-interactive locally $\alpha$-differentially private mechanisms are allowed, including those that act globally on all coordinates, in the sense that the phase transitions occur at lower levels.
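To make the coordinate-wise, non-interactive setting concrete, the following Python sketch privatises each user's vector with clipped Laplace noise, averages the released vectors, and thresholds the averages to select variables. It is only an illustration under stated assumptions, not the mechanism or selector analysed in the paper: the function names, the clipping level `clip`, the even split of the privacy budget across coordinates, and the threshold are all hypothetical choices. With this budget split the Laplace scale is $2\,\mathrm{clip}\cdot d/\alpha$, so the variance of each averaged coordinate is of order $d^2/(n\alpha^2)$, which is one way to see why the ratio $n\alpha^2/d^2$ appears.

```python
import random


def privatise(x, alpha, clip):
    """Release one user's vector x under local alpha-differential privacy.

    Hypothetical coordinate-wise Laplace mechanism (an assumption, not the
    paper's construction): clip each entry to [-clip, clip], then add noise.
    """
    d = len(x)
    # A clipped coordinate has sensitivity 2 * clip. Giving each coordinate
    # a budget of alpha / d makes the d releases compose to alpha overall.
    scale = 2.0 * clip * d / alpha
    released = []
    for xj in x:
        clipped = max(-clip, min(clip, xj))
        # The difference of two Exp(rate = 1 / scale) draws is Laplace(0, scale).
        noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
        released.append(clipped + noise)
    return released


def select_support(private_rows, threshold):
    """Flag coordinate j when the average of the released values exceeds
    the threshold in absolute value (illustrative selector)."""
    n = len(private_rows)
    d = len(private_rows[0])
    means = [sum(row[j] for row in private_rows) / n for j in range(d)]
    return [1 if abs(m) > threshold else 0 for m in means]


def hamming_loss(selected, truth):
    """Number of coordinates on which the two 0/1 selection vectors differ."""
    return sum(1 for s, t in zip(selected, truth) if s != t)


if __name__ == "__main__":
    # Toy run: Gaussian entries (symmetric and strongly log-concave),
    # d = 5 coordinates, s = 2 non-zero means equal to a = 1.
    random.seed(0)
    theta = [1.0, 1.0, 0.0, 0.0, 0.0]
    truth = [1 if t != 0 else 0 for t in theta]
    rows = [
        privatise([random.gauss(t, 1.0) for t in theta], alpha=1.0, clip=3.0)
        for _ in range(2000)
    ]
    print(hamming_loss(select_support(rows, threshold=0.5), truth))
```

Increasing `d` while holding `n` and `alpha` fixed inflates the noise scale linearly in `d`, so the thresholded averages become unreliable regardless of how large the signal `a` is once it exceeds the clipping level. This is a heuristic reading of the sketch only and does not replace the paper's lower-bound argument.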