Machine learning is the dominant approach to artificial intelligence, through which computers learn from data and experience. In supervised learning, a computer can learn from data accurately and efficiently only if the learning model supplies auxiliary information about the data distribution and the target function. This notion of auxiliary information is related to regularization in statistical learning theory. Real-world datasets commonly share two features: data domains are multiscale, and target functions are well-behaved and smooth. This paper proposes an entropy-based learning model that exploits this structure and discusses its statistical and computational benefits. The hierarchical learning model is inspired by the logical, progressive, easy-to-hard way humans learn, and its levels are interpretable. The model apportions computational resources according to the complexity of data instances and target functions. This property can have several benefits, including faster inference and computational savings when training a model for many users or when training is interrupted. We provide a statistical analysis of the learning mechanism using multiscale entropies and show that it can yield significantly stronger guarantees than uniform convergence bounds.