Recent work has shown the potential of using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However, the intriguing approach of training NNs with MIP solvers remains under-explored. State-of-the-art methods for training NNs are typically gradient-based and require significant data, GPU computation, and extensive hyper-parameter tuning. In contrast, training with MIP solvers requires neither GPUs nor heavy hyper-parameter tuning, but currently can handle only small amounts of data. This article builds on recent advances that train binarized NNs using MIP solvers. We go beyond current work by formulating new MIP models that improve training efficiency and that can train the important class of integer-valued neural networks (INNs). We provide two novel methods that further the potential significance of using MIP to train NNs. The first method optimizes the number of neurons in the NN during training, reducing the need to decide on the network architecture before training. The second method addresses the amount of training data that MIP can feasibly handle: we provide a batch training method that dramatically increases the amount of data MIP solvers can use for training. We thus take a promising step towards training NNs with far more data than was previously possible using MIP models. Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NNs with MIP, in terms of accuracy, training time, and amount of data. Our methodology excels at training NNs when minimal training data is available, and at training with minimal memory requirements -- which is potentially valuable for deployment to low-memory devices.