In Part \textit{I}, we proposed a structure for a general Hypotheses Space $\mathcal{H}$, the Learning Space $\mathbb{L}(\mathcal{H})$, which can be employed to avoid \textit{overfitting} when estimating in a complex space with a relative shortage of examples. We also presented the U-curve property, which can be exploited to select a Hypotheses Space without exhaustively searching $\mathbb{L}(\mathcal{H})$. In this paper, we carry our agenda further by showing the consistency of a model selection framework based on Learning Spaces, in which one selects from data the Hypotheses Space on which to learn. The method developed in this paper adds to the state of the art in model selection by extending Vapnik-Chervonenkis Theory to \textit{random} Hypotheses Spaces, i.e., Hypotheses Spaces learned from data. In this framework, one estimates a random subspace $\hat{\mathcal{M}} \in \mathbb{L}(\mathcal{H})$ which converges with probability one to a target Hypotheses Space $\mathcal{M}^{\star} \in \mathbb{L}(\mathcal{H})$ with desired properties. Since this convergence implies asymptotically unbiased estimators, we obtain a consistent framework for model selection, showing that it is feasible to learn the Hypotheses Space from data. Furthermore, we show that the generalization errors of learning on $\hat{\mathcal{M}}$ are smaller than those committed when learning on $\mathcal{H}$, so it is more efficient to learn on a subspace learned from data.
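As a schematic illustration of the selection step (a sketch in notation of our own: the error estimator $\hat{L}$ of a subspace, e.g., an error computed from a validation sample, is an assumption here and is not fixed by this summary), the learned Hypotheses Space may be written as
% Schematic only: \hat{L}(\mathcal{M}) is an assumed notation for the estimated
% error of learning on the subspace \mathcal{M}, not a definition from the paper.
\begin{equation*}
    \hat{\mathcal{M}} \in \arg\min_{\mathcal{M} \in \mathbb{L}(\mathcal{H})} \hat{L}(\mathcal{M}),
\end{equation*}
that is, $\hat{\mathcal{M}}$ minimizes an estimated error over the Learning Space; under the U-curve property, such a minimizer can be found without visiting every element of $\mathbb{L}(\mathcal{H})$.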