Adversarial Deep Learning for Online Resource Allocation

Online algorithms are an important branch of algorithm design. Designing an online algorithm with a bounded competitive ratio (a guarantee on worst-case performance) can be hard and usually relies on problem-specific assumptions. Inspired by adversarial training in Generative Adversarial Networks (GANs) and by the fact that the competitive ratio of an online algorithm is defined by its worst-case input, we use deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch, with the goal of minimizing the performance gap between the offline optimum and the learned online algorithm on worst-case input. Specifically, we use two neural networks, one as the algorithm and one as the adversary, and let them play a zero-sum game: the adversary generates worst-case inputs, while the algorithm learns the best strategy against the inputs the adversary provides. To improve convergence of the algorithm network to the desired online algorithm, we propose a novel per-round update method for sequential decision making that breaks the complex dependencies among rounds, so that updates can be made for every possible action rather than only for sampled actions. To the best of our knowledge, this is the first work to use deep neural networks to design an online algorithm from the perspective of worst-case performance guarantees. Empirical studies show that our update method ensures convergence to a Nash equilibrium and that the learned algorithm outperforms state-of-the-art online algorithms under various settings.
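
The algorithm-versus-adversary game described above can be illustrated on a toy problem. The sketch below is not the paper's method: it replaces both neural networks with explicit probability vectors and trains them with multiplicative-weights (Hedge) updates on a small payoff matrix. The instance itself (one item, a posted price, the value grid `VALUES`) and all function names are assumptions made for illustration. Each player is updated using the expected payoff of every action rather than a sampled one, which loosely mirrors the idea of updating for all possible actions.

```python
import math

# Hypothetical toy instance (not from the paper): one item is sold online.
# The adversary chooses the highest buyer valuation v; the algorithm posts a price p.
VALUES = [1.0, 2.0, 4.0, 8.0]
PRICES = VALUES

def ratio(p, v):
    """Online revenue divided by the offline optimum v (0 if the item goes unsold)."""
    return p / v if v >= p else 0.0

# Zero-sum payoff matrix: rows are algorithm actions, columns are adversary actions.
PAYOFF = [[ratio(p, v) for v in VALUES] for p in PRICES]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def worst_case(alg):
    """Smallest expected ratio the adversary can force against mixed strategy alg."""
    return min(
        sum(alg[i] * PAYOFF[i][j] for i in range(len(PRICES)))
        for j in range(len(VALUES))
    )

def solve(rounds=20000):
    """Run Hedge for both players and return their time-averaged strategies."""
    n, m = len(PRICES), len(VALUES)
    eta = math.sqrt(math.log(max(n, m)) / rounds)  # standard Hedge step size
    alg, adv = [1.0 / n] * n, [1.0 / m] * m
    alg_avg, adv_avg = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        # Expected payoff of EVERY action against the opponent's current mix.
        alg_gain = [sum(PAYOFF[i][j] * adv[j] for j in range(m)) for i in range(n)]
        adv_loss = [sum(PAYOFF[i][j] * alg[i] for i in range(n)) for j in range(m)]
        alg_avg = [a + x / rounds for a, x in zip(alg_avg, alg)]
        adv_avg = [a + x / rounds for a, x in zip(adv_avg, adv)]
        # Algorithm maximizes the ratio; adversary minimizes it.
        alg = normalize([x * math.exp(eta * g) for x, g in zip(alg, alg_gain)])
        adv = normalize([x * math.exp(-eta * l) for x, l in zip(adv, adv_loss)])
    return alg_avg, adv_avg

if __name__ == "__main__":
    alg_strategy, adv_strategy = solve()
    print("algorithm mix over prices:", [round(x, 3) for x in alg_strategy])
    print("worst-case ratio:", round(worst_case(alg_strategy), 3))
```

For this particular matrix the equilibrium value can be worked out by hand as 0.4, so the printed worst-case ratio should approach that value from below as `rounds` grows. A matrix game is only tractable because the action sets here are tiny; the paper's setting uses neural networks precisely because the input and action spaces are too large to enumerate.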
