Adaptive data analysis (ADA) involves a dynamic interaction between an analyst and a dataset owner, in which the analyst submits queries sequentially and adapts them based on previous answers. This process can become adversarial, as the analyst may attempt to overfit by targeting non-generalizable patterns in the data. To counteract this, the dataset owner introduces randomization, such as adding noise to the responses. This noise not only helps prevent overfitting but also enhances data privacy; however, it must be carefully calibrated so that the statistical reliability of the responses is not compromised. In this paper, we extend the ADA problem to the context of distributed datasets. Specifically, we consider a scenario in which a potentially adversarial analyst interacts with multiple distributed responders through adaptive queries, and the responses are subject to noise introduced by the channel connecting the responders to the analyst. We demonstrate how, through a federated mechanism, this channel noise can be opportunistically leveraged to enhance the generalizability of ADA, thereby increasing the number of query-response interactions between the analyst and the responders. We show that careful tuning of the transmission amplitude, guided by the theoretically achievable bounds, strongly affects the number of queries that can be answered accurately.
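As a rough illustration of the setup described above (not the paper's actual mechanism), the sketch below answers a single statistical query in a federated fashion: each responder computes an empirical answer on its local shard, scales it by a hypothetical transmission amplitude, and the analyst receives it through an additive-noise channel before averaging. The names `channel_sigma`, `amplitude`, and the toy data are assumptions introduced purely for illustration.

```python
import numpy as np

# Illustrative sketch only: R responders each hold a shard of the data and
# answer a statistical query q: X -> [0, 1] with their shard's empirical mean.
# "channel_sigma" stands in for the channel noise discussed in the abstract;
# "amplitude" is a hypothetical transmission-scaling parameter. None of these
# names or values come from the paper.

rng = np.random.default_rng(0)

def answer_query(shards, query, channel_sigma=0.1, amplitude=1.0):
    """Return a federated, channel-noisy answer to one statistical query."""
    received_answers = []
    for shard in shards:
        local_mean = np.mean(query(shard))                # empirical answer on the shard
        transmitted = amplitude * local_mean              # scale before transmission
        received = transmitted + rng.normal(0.0, channel_sigma)  # additive channel noise
        received_answers.append(received / amplitude)     # rescale at the analyst
    return float(np.mean(received_answers))               # aggregate across responders

# Toy usage: 5 responders, each with 200 samples in [0, 1];
# the query asks for the fraction of samples above 0.5.
shards = [rng.uniform(0.0, 1.0, size=200) for _ in range(5)]
query = lambda x: (x > 0.5).astype(float)
print(answer_query(shards, query, channel_sigma=0.05, amplitude=1.0))
```

In this toy model, lowering `amplitude` effectively amplifies the relative channel noise seen by the analyst, which hints at the trade-off the abstract refers to: more noise improves generalization across adaptive queries but degrades the accuracy of each individual answer.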