Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove? To understand the benefit of width, it is important to identify the difference between wide and narrow networks. In this work, we prove that, going from narrow to wide networks, there is a phase transition from having sub-optimal basins to having no sub-optimal basins. Specifically, we prove two results: on the positive side, for any continuous activation function, the loss surface of a class of wide networks has no sub-optimal basin, where a "basin" is defined as a set-wise strict local minimum; on the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not global. Together, these two results establish the phase transition from narrow to wide networks.