The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. On optimisation, a new theoretical framework is proposed for deriving architecture-dependent first-order optimisation algorithms. The approach works by combining a "functional majorisation" of the loss function with "architectural perturbation bounds" that encode an explicit dependence on neural architecture. The framework yields optimisation methods that transfer hyperparameters across learning problems. On generalisation, a new correspondence is proposed between ensembles of networks and individual networks. It is argued that, as network width and normalised margin are taken large, the space of networks that interpolate a particular training set concentrates on an aggregated Bayesian method known as a "Bayes point machine". This correspondence provides a route for transferring PAC-Bayesian generalisation theorems over to individual networks. More broadly, the correspondence presents a fresh perspective on the role of regularisation in networks with vastly more parameters than data.
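To make the optimisation framework concrete, the following is a minimal schematic of how the two ingredients combine; the notation ($f_w$ for the network function, $\beta$ for a smoothness constant, $P_{\mathrm{arch}}$ for the perturbation bound) is illustrative rather than the thesis's own. A functional majorisation upper-bounds the loss by its first-order expansion plus a penalty on the movement of the network function, while an architectural perturbation bound controls that movement in terms of the weight perturbation $\Delta w$ and the architecture:
\[
\mathcal{L}(w+\Delta w)\;\le\;\mathcal{L}(w)+\nabla_w\mathcal{L}(w)^{\top}\Delta w+\frac{\beta}{2}\,\max_{i}\big\|f_{w+\Delta w}(x_i)-f_w(x_i)\big\|^{2},
\qquad
\big\|f_{w+\Delta w}(x)-f_w(x)\big\|\;\le\;P_{\mathrm{arch}}(\Delta w).
\]
Substituting the perturbation bound into the majorisation and minimising the resulting upper bound over $\Delta w$ yields an update rule whose step sizes carry the architectural dependence explicitly, which is what allows hyperparameters to transfer across learning problems.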
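The route for the generalisation results can likewise be sketched schematically; constants are suppressed and the exact form of the logarithmic term varies across statements. A PAC-Bayesian theorem bounds the population risk $L$ of a posterior $Q$ over networks relative to a prior $P$: in one standard McAllester-style form, with probability $1-\delta$ over a training set of size $n$,
\[
\mathbb{E}_{w\sim Q}\,L(w)\;\le\;\mathbb{E}_{w\sim Q}\,\widehat{L}(w)+\sqrt{\frac{\operatorname{KL}(Q\,\|\,P)+\ln(n/\delta)}{2(n-1)}}.
\]
Bounds of this kind control an aggregate of networks rather than any single one. The proposed correspondence closes this gap: if, at large width and normalised margin, the space of interpolating networks concentrates so that a single trained network agrees with the ensemble aggregate (the Bayes point machine), then such a bound transfers to that individual network.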