The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. Second, we show that the connection between transcript-compression and adaptive data analysis can be extended to the non-i.i.d. setting.
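As background on the differential-privacy approach mentioned above, here is a minimal sketch of how it is typically used in the independent setting. It answers adaptively chosen statistical queries (functions mapping a sample point to [0, 1]) with the standard Laplace mechanism. This is the textbook mechanism, not the construction from this work, and it does not model Gibbs-dependence. The class and helper names (`LaplaceQueryAnswerer`, `laplace_noise`) are hypothetical and used for illustration only.

```python
import random
from typing import Callable, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class LaplaceQueryAnswerer:
    """Hypothetical helper: answers statistical queries q: X -> [0, 1]
    on a fixed sample using the Laplace mechanism."""

    def __init__(self, sample: Sequence, epsilon: float, seed: Optional[int] = None):
        if len(sample) == 0:
            raise ValueError("sample must be non-empty")
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.sample = list(sample)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def answer(self, query: Callable[[object], float]) -> float:
        n = len(self.sample)
        values = [query(x) for x in self.sample]
        if any(v < 0.0 or v > 1.0 for v in values):
            raise ValueError("query values must lie in [0, 1]")
        empirical = sum(values) / n
        # Replacing one sample point changes the empirical mean by at most 1/n,
        # so Laplace noise of scale 1/(n * epsilon) makes each answer epsilon-DP.
        return empirical + laplace_noise(1.0 / (n * self.epsilon), self.rng)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.random() for _ in range(1000)]
    answerer = LaplaceQueryAnswerer(data, epsilon=0.1, seed=7)
    first = answerer.answer(lambda x: float(x > 0.5))
    # The second query depends on the noisy first answer: this is adaptivity.
    threshold = 0.25 if first > 0.5 else 0.75
    second = answerer.answer(lambda x: float(x > threshold))
    print(first, second)
```

Under basic composition, answering k queries this way is (k·epsilon)-differentially private. The classical transfer results then bound how far the answers can drift from the population values, assuming the sample points are drawn i.i.d. That assumption is exactly what this work relaxes.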