Linear regression is a fundamental tool for statistical analysis. This has motivated the development of linear regression methods that also satisfy differential privacy and thus guarantee that the learned model reveals little about any one data point used to construct it. However, existing differentially private solutions assume that the end user can easily specify good data bounds and hyperparameters. Both requirements present significant practical obstacles. In this paper, we study an algorithm that uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. Given $n$ samples of $d$-dimensional data used to train $m$ models, we construct an efficient analogue using an approximate Tukey depth that runs in time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong empirical performance in the data-rich setting with no data bounds or hyperparameter selection required.
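The abstract only sketches the approach, so the following is a minimal illustrative sketch, not the paper's exact algorithm: it partitions the data into $m$ disjoint chunks, fits a non-private OLS model on each, and then uses the exponential mechanism with an approximate (coordinate-wise) Tukey depth as the utility to select one of the $m$ candidate models. The function names, the discrete candidate set, and the sensitivity-1 assumption in the comments are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_partition_models(X, y, m):
    # Split the n samples into m disjoint chunks and fit ordinary
    # least squares on each, yielding m non-private d-dimensional models.
    models = []
    for Xc, yc in zip(np.array_split(X, m), np.array_split(y, m)):
        beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        models.append(beta)
    return np.array(models)  # shape (m, d)

def approx_tukey_depth(point, models):
    # Coordinate-wise approximation of Tukey depth: in each dimension,
    # count candidate models on either side of the point and keep the
    # smaller count; the depth is the minimum over dimensions.
    depths = []
    for j in range(models.shape[1]):
        below = np.sum(models[:, j] <= point[j])
        above = np.sum(models[:, j] >= point[j])
        depths.append(min(below, above))
    return min(depths)

def exp_mech_select(models, epsilon):
    # Exponential mechanism over the m candidate models with utility
    # equal to approximate Tukey depth. Assumes sensitivity 1: changing
    # one sample alters at most one model, shifting any depth by <= 1.
    utilities = np.array([approx_tukey_depth(b, models) for b in models])
    logits = epsilon * utilities / 2.0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return models[rng.choice(len(models), p=probs)]
```

Because depth is high only where many independently fitted models agree, the selected model tends to be accurate in the data-rich regime without requiring the user to supply data bounds or clipping norms.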