Understanding how trained models perform on tasks involving new data, i.e., the generalization power of a model, is one of the primary goals of machine learning. Various capacity measures try to capture this ability, but they usually fall short in explaining important characteristics of models that we observe in practice. In this study, we propose the local effective dimension as a capacity measure, which appears to correlate well with generalization error on standard data sets. Importantly, we prove that the local effective dimension bounds the generalization error, and we discuss the aptness of this capacity measure for machine learning models.