The parameters of a machine learning model are typically learned by minimizing a loss function on a set of training data. However, this carries the risk of overtraining: for the model to generalize well, we must find the parameter that is optimal for the entire population, not only for the given training sample. In this paper, we construct valid confidence sets for this optimal parameter of a machine learning model, using only the training data and without any knowledge of the population. We then show that studying the distribution of this confidence set allows us to assign a notion of confidence to arbitrary regions of the parameter space, and we demonstrate that this distribution can be well approximated using bootstrapping techniques.
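To make the idea concrete, the following is a minimal sketch, not the paper's construction, of how bootstrapping can approximate the distribution of a trained parameter and yield a simple confidence region for its population-optimal counterpart. The least-squares model, the synthetic data-generating process, and the coordinate-wise percentile intervals are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    # Empirical risk minimizer for squared loss (ordinary least squares).
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Hypothetical training sample drawn from a linear model.
n, d = 200, 3
theta_star = np.array([1.0, -2.0, 0.5])          # unknown population-optimal parameter
X = rng.normal(size=(n, d))
y = X @ theta_star + rng.normal(scale=0.5, size=n)

theta_hat = fit(X, y)

# Bootstrap: refit on resampled training sets to approximate the
# distribution of the estimator around the population optimum.
B = 1000
boot = np.empty((B, d))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = fit(X[idx], y[idx])

# Coordinate-wise 95% percentile intervals: one simple way to turn the
# bootstrap distribution into a confidence set for the optimal parameter.
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
for j in range(d):
    print(f"theta[{j}]: estimate {theta_hat[j]:+.3f}, 95% CI [{lo[j]:+.3f}, {hi[j]:+.3f}]")
```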