The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent-type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications. Such a 'geometric unification' endeavour, in the spirit of Felix Klein's Erlangen Program, serves a dual purpose: on one hand, it provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers; on the other hand, it gives a constructive procedure to incorporate prior physical knowledge into neural architectures and provides a principled way to build future architectures yet to be invented.