以虚假发现为基础取得进展 (Making Progress Based on False Discoveries)

We consider the question of adaptive data analysis within the framework of convex optimization. We ask how many samples are needed in order to compute $\epsilon$-accurate estimates of $O(1/\epsilon^2)$ gradients queried by gradient descent, and we provide two intermediate answers to this question. First, we show that for a general analyst (not necessarily gradient descent) $\Omega(1/\epsilon^3)$ samples are required. This rules out the possibility of a foolproof mechanism. Our construction builds upon a new lower bound (that may be of interest of its own right) for an analyst that may ask several non adaptive questions in a batch of fixed and known $T$ rounds of adaptivity and requires a fraction of true discoveries. We show that for such an analyst $\Omega (\sqrt{T}/\epsilon^2)$ samples are necessary. Second, we show that, under certain assumptions on the oracle, in an interaction with gradient descent $\tilde \Omega(1/\epsilon^{2.5})$ samples are necessary. Our assumptions are that the oracle has only \emph{first order access} and is \emph{post-hoc generalizing}. First order access means that it can only compute the gradients of the sampled function at points queried by the algorithm. Our assumption of \emph{post-hoc generalization} follows from existing lower bounds for statistical queries. More generally then, we provide a generic reduction from the standard setting of statistical queries to the problem of estimating gradients queried by gradient descent. These results are in contrast with classical bounds that show that with $O(1/\epsilon^2)$ samples one can optimize the population risk to accuracy of $O(\epsilon)$ but, as it turns out, with spurious gradients.

翻译：我们考虑在 convex 优化框架内的适应性数据分析问题。我们询问需要多少样本来计算 $\ epsilon$- 准确的估算值为 $O (1/\ epsilon=2) 。我们提供两个中间答案来回答这个问题。首先, 我们显示对于普通分析师( 不一定是梯度下降) $( 1/\\ epsilon=3) 来说, 需要 0. omega (1/\\ epsilon} 3) 样本。这排除了隐蔽机制的可能性。我们的构建基于一个新的更低的基值( 可能符合其自身利益), 分析师可能会在一组固定和已知的 $T 的调整周期中提出一些不适应性的问题。我们的假设显示, 对于这样的分析师来说, $( sqrentreleg) ( t}/\ eepslon2) 样本是必需的。根据某些假设, 在与梯度的基值上, entremodeal lex exlection exlex extial dies 。