Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, we introduce a general-purpose hierarchical learning architecture based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with an increasing number of subsets. We show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates. As a consequence, a function approximation problem can be defined within each subset of the partition and solved using the theory of two-timescale stochastic approximation algorithms. This simulates an annealing process and defines a robust and interpretable heuristic method for gradually increasing the complexity of the learning architecture in a task-agnostic manner, giving emphasis to regions of the data space that are considered more important according to a predefined criterion. Finally, by imposing a tree structure on the progression of the partitions, we provide a means to incorporate potential multi-resolution structure of the data space into this approach, significantly reducing its complexity while introducing hierarchical, variable-rate feature extraction properties similar to those of certain classes of deep learning architectures. Asymptotic convergence analysis and experimental results are provided for supervised and unsupervised learning problems.
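The gradient-free stochastic approximation updates mentioned above can be sketched for the unsupervised case. The following is a minimal illustrative example, not the authors' full algorithm: at a single fixed temperature level `T`, each data sample softly associates with the current codevectors through a Gibbs distribution, and running estimates of each cell's probability mass `p` and mass-weighted mean `s` are updated online with decreasing step sizes; the codevectors are recovered as `mu = s / p`. All names, the initialization, and the toy data are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_association(x, mu, T):
    """Soft association q(i|x) proportional to exp(-||x - mu_i||^2 / T)."""
    d = np.sum((mu - x) ** 2, axis=1)
    w = np.exp(-(d - d.min()) / T)  # subtract min for numerical stability
    return w / w.sum()

def sa_step(x, p, s, T, beta):
    """One gradient-free stochastic-approximation update of the running
    estimates p_i (cell mass) and s_i (mass-weighted mean)."""
    q = gibbs_association(x, s / p[:, None], T)
    p = p + beta * (q - p)
    s = s + beta * (q[:, None] * x - s)
    return p, s

# Toy data: two tight Gaussian clusters centered at (-2,-2) and (2,2).
data = np.vstack([rng.normal(-2.0, 0.3, (500, 2)),
                  rng.normal(+2.0, 0.3, (500, 2))])

K, T = 2, 0.5                      # two codevectors, fixed low temperature
mu0 = np.array([[-1.0, 0.0],       # deterministic perturbed initialization
                [+1.0, 0.0]])
p = np.full(K, 1.0 / K)            # initial cell masses
s = p[:, None] * mu0               # initial mass-weighted means

for n, x in enumerate(rng.permutation(data)):
    p, s = sa_step(x, p, s, T, beta=1.0 / (n + 2))  # decreasing step size

mu = s / p[:, None]  # recovered codevectors, near the two cluster means
```

In the paper's setting this update would be nested inside an annealing loop that lowers `T` and splits codevectors as the partition is refined; the sketch isolates only the online estimation step at one level of the hierarchy.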