We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a black-box fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$: it runs in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose probability mass functions are computed by depth-$d$ decision trees, where $m$ is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from $\mathcal{D}$; for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given such access to $\mathcal{D}$, produces an optimal decision tree decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the resulting hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions, with runtimes that exponentially improve on the prior state of the art; these results are of independent interest in distribution learning.
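To make the lifting step concrete, here is a minimal sketch in Python, assuming the decision tree decomposition $\mathcal{D} \approx \sum_{\ell} w_\ell \,\mathcal{U}_{C_\ell}$ (a weighted mixture of uniform distributions $\mathcal{U}_{C_\ell}$ over disjoint subcubes $C_\ell$) has already been computed. All names below (`lift_uniform_learner`, `uniform_learner`, `Restriction`, and so on) are hypothetical illustrations, not the paper's actual interface: each leaf of the tree fixes some coordinates and thereby defines a subcube on which $\mathcal{D}$ is approximately uniform, so the uniform-distribution learner can be run unchanged on the samples landing in that subcube, and the tree routes a query point to the appropriate per-subcube hypothesis.

```python
# A minimal sketch, assuming the decomposition is given; all names here
# are hypothetical illustrations, not the paper's actual interface.
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, ...]        # a point of {+1, -1}^n
Restriction = Dict[int, int]   # fixed coordinates defining a subcube
Sample = Tuple[Point, int]     # labeled example (x, f(x))
Hypothesis = Callable[[Point], int]
Learner = Callable[[List[Sample]], Hypothesis]

def in_subcube(x: Point, restriction: Restriction) -> bool:
    """x lies in the subcube iff it agrees with every fixed coordinate."""
    return all(x[i] == v for i, v in restriction.items())

def lift_uniform_learner(
    leaves: List[Restriction],  # disjoint subcubes from the decomposition
    uniform_learner: Learner,   # the black-box uniform-distribution learner
    samples: List[Sample],      # labeled samples drawn from D
) -> Hypothesis:
    """Run the uniform learner on each subcube and stitch the results.

    Within each leaf's subcube, D is approximately uniform, so the
    uniform-distribution learner applies there unchanged.
    """
    # One hypothesis per leaf, trained only on the samples that land
    # in that leaf's subcube.
    per_leaf = [
        (leaf, uniform_learner([(x, y) for (x, y) in samples
                                if in_subcube(x, leaf)]))
        for leaf in leaves
    ]

    # The decision tree routes a query point to its (unique) leaf,
    # which answers with that leaf's hypothesis.
    def combined(x: Point) -> int:
        for leaf, h in per_leaf:
            if in_subcube(x, leaf):
                return h(x)
        return 1  # unreachable if the leaves partition the cube

    return combined
```

Because the leaves are disjoint and cover the cube, each query point is handled by exactly one per-subcube hypothesis; this routing is what keeps the transformation black-box, since the uniform-distribution learner is invoked unmodified on each piece.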