Linear regression is a fundamental tool for statistical analysis. This has motivated the development of linear regression methods that also satisfy differential privacy and thus guarantee that the learned model reveals little about any one data point used to construct it. However, existing differentially private solutions assume that the end user can easily specify good data bounds and hyperparameters. Both present significant practical obstacles. In this paper, we study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. Given $n$ samples of $d$-dimensional data used to train $m$ models, we construct an efficient analogue using an approximate Tukey depth that runs in time $O(d^2n + dm\log(m))$. We find that this algorithm obtains strong empirical performance in the data-rich setting with no data bounds or hyperparameter selection required.
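The selection step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact algorithm: it uses a coordinate-wise approximation to Tukey depth (the minimum over dimensions of the one-sided rank of a point among the candidate models) and, for simplicity, applies the exponential mechanism over the candidate models themselves rather than over depth regions. The helper names, the candidate-generation step, and the score scaling are assumptions for illustration only.

```python
import math
import random

def approx_tukey_depth(y, models):
    # Coordinate-wise (approximate) Tukey depth of point y among the
    # candidate models: for each dimension, count how many models lie
    # at or below / at or above y, and take the overall minimum.
    d = len(y)
    depth = len(models)
    for j in range(d):
        below = sum(1 for m in models if m[j] <= y[j])
        above = sum(1 for m in models if m[j] >= y[j])
        depth = min(depth, below, above)
    return depth

def exponential_mechanism(models, epsilon, rng):
    # Sample a candidate model with probability proportional to
    # exp(epsilon * depth / 2), so deeper (more central) models are
    # exponentially more likely to be chosen.
    depths = [approx_tukey_depth(m, models) for m in models]
    weights = [math.exp(epsilon * t / 2.0) for t in depths]
    r = rng.random() * sum(weights)
    acc = 0.0
    for m, w in zip(models, weights):
        acc += w
        if acc >= r:
            return m
    return models[-1]

rng = random.Random(0)
# Stand-ins for m = 9 non-private regression coefficient vectors (d = 2),
# e.g. OLS fits on disjoint chunks of the training data.
models = [(rng.gauss(1.0, 0.2), rng.gauss(-2.0, 0.2)) for _ in range(9)]
chosen = exponential_mechanism(models, epsilon=4.0, rng=rng)
```

Because the score is a depth (a count), changing one training point moves at most one candidate model, which is what makes the exponential mechanism's sensitivity analysis go through.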