Preventing poaching through ranger patrols protects endangered wildlife, directly contributing to UN Sustainable Development Goal 15, life on land. Combinatorial bandits have been used to allocate limited patrol resources, but existing approaches overlook the fact that each location is home to multiple species in varying proportions, so a patrol benefits each species to a different degree. When some species are more vulnerable, we ought to offer them more protection; unfortunately, existing combinatorial bandit approaches offer no way to prioritize important species. To bridge this gap, (1) we propose a novel combinatorial bandit objective that trades off reward maximization against prioritization over species, which we call ranked prioritization. We show this objective can be expressed as a weighted linear sum of Lipschitz-continuous reward functions. (2) We provide RankedCUCB, an algorithm that selects combinatorial actions to optimize our prioritization-based objective, and prove that it achieves asymptotic no-regret. (3) We demonstrate empirically, using real-world wildlife conservation data, that RankedCUCB leads to up to 38% improvement in outcomes for endangered species. Beyond wildlife protection, the approach adapts to other challenges such as preventing illegal logging and overfishing, and our no-regret algorithm addresses the general combinatorial bandit problem with a weighted linear objective.
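To make the phrase "weighted linear sum of reward functions" concrete, the following is a minimal sketch, not the paper's actual formulation. It assumes a hypothetical per-location reward `1 - exp(-effort)` (which is Lipschitz-continuous for non-negative effort), a hypothetical table `proportions[g][i]` giving the share of species `g` at location `i`, and hypothetical priority `weights` per species. Under those assumptions, a species-weighted objective collapses to one coefficient per location times that location's reward.

```python
import math


def location_reward(effort):
    # Hypothetical reward for patrol effort at one location (an assumption,
    # not taken from the paper): concave and 1-Lipschitz for effort >= 0.
    return 1.0 - math.exp(-effort)


def weighted_objective(efforts, proportions, weights):
    # Sum over species of (priority weight) * (benefit to that species),
    # where a species' benefit is its share of each location's reward.
    total = 0.0
    for g, w in enumerate(weights):
        benefit = sum(p * location_reward(b)
                      for p, b in zip(proportions[g], efforts))
        total += w * benefit
    return total


def location_coefficients(proportions, weights):
    # Regroup the same sum by location: coefficient c_i = sum_g w_g * d_{g,i}.
    n_locations = len(proportions[0])
    return [sum(w * proportions[g][i] for g, w in enumerate(weights))
            for i in range(n_locations)]


def linear_form(efforts, coefficients):
    # Weighted linear sum of per-location reward functions.
    return sum(c * location_reward(b) for c, b in zip(coefficients, efforts))


# Illustrative numbers only (made up for this sketch).
efforts = [0.5, 1.0, 0.0]
proportions = [[0.7, 0.2, 0.1],   # species 0
               [0.1, 0.3, 0.6]]   # species 1
weights = [0.8, 0.2]              # species 0 is prioritized

coeffs = location_coefficients(proportions, weights)
print(weighted_objective(efforts, proportions, weights))
print(linear_form(efforts, coeffs))
```

Both printed values agree up to floating-point rounding, which is the point of the regrouping: prioritizing species changes only the per-location coefficients, so the objective stays linear in the location rewards.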