We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
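To make the core idea concrete, here is a minimal sketch of a continuous-depth block in PyTorch, using the `torchdiffeq` package released alongside the paper (github.com/rtqichen/torchdiffeq). The `ODEFunc`/`ODEBlock` names, hidden width, tolerances, and the integration interval [0, 1] are illustrative choices, not the paper's exact architecture; any adaptive ODE solver with adjoint support would serve the same role.

```python
import torch
import torch.nn as nn
# odeint_adjoint backpropagates by solving an adjoint ODE backward in time,
# giving constant memory cost instead of storing solver intermediates.
from torchdiffeq import odeint_adjoint as odeint

class ODEFunc(nn.Module):
    """Neural network f(h, t; theta) parameterizing the hidden-state derivative dh/dt."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, h):
        return self.net(h)

class ODEBlock(nn.Module):
    """Continuous-depth 'layer': integrate dh/dt = f(h, t) from t=0 to t=1."""
    def __init__(self, func, rtol=1e-5, atol=1e-7):
        super().__init__()
        self.func = func
        # Solver tolerances explicitly trade numerical precision for speed.
        self.rtol, self.atol = rtol, atol

    def forward(self, h0):
        t = torch.tensor([0.0, 1.0])
        # Black-box adaptive solver; number of function evaluations adapts per input.
        h = odeint(self.func, h0, t, rtol=self.rtol, atol=self.atol)
        return h[-1]  # the state at t=1 is the block's output

block = ODEBlock(ODEFunc(dim=2))
h0 = torch.randn(16, 2, requires_grad=True)
out = block(h0)
out.sum().backward()  # gradients flow through the solver via the adjoint method
```

The key design point the abstract highlights is visible here: the model is specified by its derivative `ODEFunc`, not by a fixed stack of layers, and the solver, treated as a black box, decides how much computation each input needs.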