Column Generation (CG) is an iterative algorithm for solving linear programs (LPs) with an extremely large number of variables (columns). CG is the workhorse for tackling large-scale \textit{integer} linear programs, whose LP relaxations are solved by CG within a branch-and-price algorithm. Two canonical applications are the Cutting Stock Problem (CSP) and the Vehicle Routing Problem with Time Windows (VRPTW). In VRPTW, for example, each binary variable represents the decision to include or exclude a \textit{route}, of which there are exponentially many; CG incrementally grows the subset of columns being used, ultimately converging to an optimal solution. We propose RLCG, the first Reinforcement Learning (RL) approach for CG. Unlike typical column selection rules, which myopically select a column based on local information at each iteration, we treat CG as a sequential decision-making problem: the column selected in a given iteration affects subsequent column selections. This perspective lends itself to a Deep Reinforcement Learning approach that uses Graph Neural Networks (GNNs) to represent the variable-constraint structure in the LP of interest. We perform an extensive set of experiments using the publicly available BPPLIB benchmark for CSP and the Solomon benchmark for VRPTW. RLCG converges faster and reduces the number of CG iterations by 22.4\% for CSP and 40.9\% for VRPTW on average compared to a commonly used greedy policy. Our code is available at https://github.com/chichengmessi/reinforcement-learning-for-column-generation.git.
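The CG loop described above — solve a restricted master LP over the current columns, then use the duals to price out and add a new column — can be sketched for the Cutting Stock Problem. This is a minimal illustration, not the paper's RLCG method: it uses the classic greedy rule of adding the single most-negative-reduced-cost column each iteration (the baseline the abstract compares against), with SciPy's `linprog` for the restricted master and a knapsack DP for pricing.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_stock_cg(roll_len, sizes, demands, tol=1e-9, max_iters=100):
    """Column generation for the Cutting Stock LP relaxation.

    Master: min sum_p x_p  s.t.  sum_p a_ip x_p >= demands[i],  x >= 0,
    where each column a_p is a cutting pattern (pieces of each size per roll).
    """
    n = len(sizes)
    # Initial restricted master: one trivial pattern per size
    # (cut as many pieces of that size as fit in one roll).
    columns = [np.eye(n)[i] * (roll_len // sizes[i]) for i in range(n)]
    for _ in range(max_iters):
        A = np.column_stack(columns)
        # Restricted master problem; ">= demand" rows written as "-A x <= -d".
        res = linprog(c=np.ones(A.shape[1]), A_ub=-A,
                      b_ub=-np.asarray(demands, dtype=float), method="highs")
        duals = -res.ineqlin.marginals  # dual prices y_i for the demand rows
        # Pricing subproblem (unbounded knapsack):
        #   max sum_i y_i a_i  s.t.  sum_i w_i a_i <= roll_len, a_i integer >= 0.
        best = [0.0] * (roll_len + 1)
        choice = [-1] * (roll_len + 1)
        for c in range(1, roll_len + 1):
            best[c], choice[c] = best[c - 1], -1  # -1: leave this unit unused
            for i, w in enumerate(sizes):
                if w <= c and best[c - w] + duals[i] > best[c]:
                    best[c], choice[c] = best[c - w] + duals[i], i
        if 1.0 - best[roll_len] >= -tol:
            # No column with negative reduced cost remains: LP optimum reached.
            return res.fun, columns
        pattern, c = np.zeros(n), roll_len  # recover the best pattern from the DP
        while c > 0:
            if choice[c] == -1:
                c -= 1
            else:
                pattern[choice[c]] += 1
                c -= sizes[choice[c]]
        columns.append(pattern)
    return res.fun, columns
```

For example, with rolls of length 10 and demands of 4, 3, and 2 pieces of sizes 3, 4, and 5, `cutting_stock_cg(10, [3, 4, 5], [4, 3, 2])` starts from three trivial patterns, generates the pattern (3, 3, 4) in the first iteration, and terminates with the LP optimum of 3.5 rolls. RLCG replaces the greedy "most negative reduced cost" selection here with a learned GNN policy over the variable-constraint graph.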