Recent work has shown potential in using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However, the intriguing approach of training NNs with MIP solvers is under-explored. State-of-the-art methods to train NNs are typically gradient-based and require significant data, computation on GPUs, and extensive hyper-parameter tuning. In contrast, training with MIP solvers does not require GPUs or heavy hyper-parameter tuning, but can currently handle only small amounts of data. This article builds on recent advances that train binarized NNs using MIP solvers. We go beyond existing work by formulating new MIP models that improve training efficiency and that can train the important class of integer-valued neural networks (INNs). We provide two novel methods that extend the reach of MIP-based NN training. The first method optimizes the number of neurons in the NN while training, reducing the need to decide on the network architecture before training. The second method addresses the amount of training data that MIP solvers can feasibly handle: our batch training method dramatically increases the amount of data that MIP solvers can use to train. We thus take a promising step towards training NNs with far more data than MIP models could use before. Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NNs with MIP, in terms of accuracy, training time, and amount of data. Our methodology excels at training NNs when minimal training data is available, and at training with minimal memory requirements, which is potentially valuable for deployment to low-memory devices.
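Since the abstract only describes the idea at a high level, the following is a minimal sketch of what "training with a MIP solver" means in the simplest case: fitting a single integer-weight neuron to a toy dataset by minimizing a hinge-style margin violation. This is not the paper's formulation; the use of PuLP with the bundled CBC solver, the toy data, the weight bounds, and the objective are all illustrative assumptions.

```python
# Illustrative sketch (NOT the paper's model): train one integer-weight
# neuron on a tiny batch by solving a MIP with PuLP + CBC.
import pulp

# Toy binary-classification batch: inputs in {-1, +1}^2, labels in {-1, +1}.
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [-1, 1, 1, 1]  # an OR-like target

prob = pulp.LpProblem("train_integer_neuron", pulp.LpMinimize)

# Integer weights and bias, bounded to a small range (as in INNs).
w = [pulp.LpVariable(f"w{j}", lowBound=-3, upBound=3, cat="Integer") for j in range(2)]
b = pulp.LpVariable("b", lowBound=-3, upBound=3, cat="Integer")

# Slack variables measure hinge-style violation of a unit margin:
# we require y_i * (w . x_i + b) >= 1 - slack_i for every sample.
slack = [pulp.LpVariable(f"s{i}", lowBound=0) for i in range(len(X))]
for i, (xi, yi) in enumerate(zip(X, y)):
    prob += yi * (pulp.lpSum(w[j] * xi[j] for j in range(2)) + b) >= 1 - slack[i]

# Objective: minimize total margin violation over the batch.
prob += pulp.lpSum(slack)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("weights:", [int(pulp.value(v)) for v in w], "bias:", int(pulp.value(b)))
```

Note that no gradients, GPUs, or learning-rate tuning are involved: the solver returns provably optimal integer weights for this loss. Scaling such formulations to multi-layer networks and larger batches is exactly the challenge the article's new MIP models and batch training method address.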