The Adaptive Data Analysis (ADA) problem, in which an analyst interacts with a dataset through statistical queries, is often studied under the assumption of adversarial analyst behavior. To narrow the gap between this assumption and practical data analysis, we propose a revised model of ADA that accounts for more constructive interactions between the analyst and the data, where the goal is to improve inference accuracy. Specifically, we focus on distribution estimation as the central objective guiding the analyst's queries. We address the problem within a non-parametric Bayesian framework, which captures the flexibility and dynamic evolution of the analyst's beliefs. Our hierarchical approach uses Pólya trees (PTs) as priors over the space of distributions, enabling the adaptive selection of counting queries that efficiently reduce the estimation error without increasing the number of queries. Furthermore, owing to its interpretability and conjugacy, the proposed framework allows subjective beliefs to be converted intuitively into objective priors and updated to posteriors with little effort. Using theoretical derivations, we formalize the PT-based solution as a computational algorithm. Simulations further demonstrate its effectiveness in distribution estimation tasks compared with the non-adaptive approach. By aligning with real-world applications, this structured ADA framework opens opportunities for collaborative research in related areas, such as human-in-the-loop systems and cognitive studies of belief updating.
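The conjugate updating of a Pólya tree from counting queries can be illustrated with a minimal Python sketch. It assumes a Pólya tree on [0, 1) with dyadic splits and a Beta(α_j, α_j) prior on each split at depth j, with α_j = c(j+1)^2 (a common default, assumed here). A counting query returns how many samples fall in an interval, and conjugacy turns a node's prior Beta(a, b) into the posterior Beta(a + n_left, b + n_right). The rule for choosing which node to query next (`score`) is a hypothetical heuristic for illustration only, not the selection criterion derived in this work; all names (`Node`, `counting_query`, `adaptive_estimate`, etc.) are likewise illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity
class Node:
    lo: float
    hi: float
    depth: int
    a_left: float  # Beta parameter for mass going to [lo, mid)
    a_right: float  # Beta parameter for mass going to [mid, hi)
    mass: float = 1.0  # posterior-mean probability of [lo, hi)
    children: Optional[Tuple["Node", "Node"]] = None


def alpha(depth: int, c: float = 1.0) -> float:
    # Assumed prior strength alpha_j = c * (j + 1)^2; any positive values work.
    return c * (depth + 1) ** 2


def make_node(lo: float, hi: float, depth: int, mass: float) -> Node:
    a = alpha(depth)
    return Node(lo, hi, depth, a, a, mass)


def counting_query(data: List[float], lo: float, hi: float) -> int:
    # Counting query: number of samples in the half-open interval [lo, hi).
    return sum(lo <= x < hi for x in data)


def update(node: Node, data: List[float]) -> None:
    # Query the counts in both halves, apply the Beta-binomial conjugate
    # update, and create children carrying the posterior-mean masses.
    mid = (node.lo + node.hi) / 2
    node.a_left += counting_query(data, node.lo, mid)
    node.a_right += counting_query(data, mid, node.hi)
    p_left = node.a_left / (node.a_left + node.a_right)
    node.children = (
        make_node(node.lo, mid, node.depth + 1, node.mass * p_left),
        make_node(mid, node.hi, node.depth + 1, node.mass * (1 - p_left)),
    )


def score(node: Node) -> float:
    # Hypothetical heuristic: prefer leaves with large posterior mass whose
    # split probability is still uncertain (variance of the Beta split).
    a, b = node.a_left, node.a_right
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return node.mass * var


def adaptive_estimate(data: List[float], n_expansions: int) -> List[Node]:
    # Repeatedly refine the leaf with the highest score; the returned leaves
    # partition [0, 1) and their masses form the posterior-mean estimate.
    root = make_node(0.0, 1.0, 0, 1.0)
    leaves = [root]
    for _ in range(n_expansions):
        best = max(leaves, key=score)
        leaves.remove(best)
        update(best, data)
        leaves.extend(best.children)
    return sorted(leaves, key=lambda n: n.lo)


if __name__ == "__main__":
    data = [0.1, 0.12, 0.15, 0.2, 0.22, 0.3, 0.8]
    for leaf in adaptive_estimate(data, n_expansions=4):
        print(f"[{leaf.lo:.4f}, {leaf.hi:.4f}): mass={leaf.mass:.3f}")
```

With one expansion on the sample above, the root's Beta(1, 1) prior becomes Beta(7, 2) after observing 6 points in [0, 0.5) and 1 in [0.5, 1), so the left half receives posterior-mean mass 7/9. Because each expansion splits a leaf's mass into p and 1 − p, the leaf masses always sum to one.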