In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select the presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. As the latter is known to exhibit a heavy-tailed distribution, an algorithm is normally stopped when it exceeds a predefined upper time limit. As a consequence, machine learning methods used to optimize an algorithm selection strategy in a data-driven manner need to deal with right-censored samples, a problem that has received little attention in the literature so far. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their ability to deal with this problem. Moreover, we adapt them towards runtime-oriented losses, allowing for partially censored data while keeping space and time complexity independent of the time horizon. In an extensive experimental evaluation on an adapted version of the ASlib benchmark, we demonstrate that theoretically well-founded methods based on Thompson sampling perform particularly strongly and improve upon existing methods.
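To make the setting concrete, below is a minimal sketch of how Thompson sampling can handle right-censored runtime feedback. It is an illustration under an assumed exponential runtime model with a conjugate Gamma prior on each algorithm's rate, not the exact method evaluated in the paper; the class and parameter names (ThompsonRuntimeSelector, alpha0, beta0) are hypothetical.

```python
import random

# A minimal sketch (assumption: exponential runtimes, Gamma prior), NOT the
# paper's exact method. A run that hits the cutoff C is right-censored: it
# adds C to the accumulated time but contributes no completion event, which
# is the standard conjugate update for censored exponential data.
class ThompsonRuntimeSelector:
    def __init__(self, n_algorithms, cutoff, alpha0=1.0, beta0=1.0):
        self.cutoff = cutoff                  # upper time limit C
        self.alpha = [alpha0] * n_algorithms  # shape: completed (uncensored) runs
        self.beta = [beta0] * n_algorithms    # rate: accumulated runtime

    def select(self):
        # Sample a rate for each algorithm; a larger rate means a smaller
        # expected runtime (1 / rate), so pick the arm with the max sample.
        samples = [random.gammavariate(a, 1.0 / b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, runtime):
        censored = runtime >= self.cutoff
        self.beta[arm] += min(runtime, self.cutoff)  # time counts either way
        if not censored:
            self.alpha[arm] += 1  # only finished runs count as events
```

Because only two sufficient statistics are stored per algorithm, the selector's space and per-step time requirements stay constant in the number of instances seen, in line with the horizon-independence requirement stated above.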