We consider a problem in which jobs arrive at random times and take random values. Upon each job arrival, the decision-maker must decide immediately whether to accept the job and collect its value as a reward, subject to the constraint that at most $n$ jobs may be accepted over some reference time period. The decision-maker has access only to $M$ independent realisations of the job arrival process. We propose an algorithm, Non-Parametric Sequential Allocation (NPSA), for solving this problem. Moreover, we prove that the expected reward returned by the NPSA algorithm converges in probability to optimality as $M$ grows large. We demonstrate the effectiveness of the algorithm empirically on synthetic data and on public fraud-detection datasets, which motivated this work.
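To make the problem setting concrete, the sketch below simulates it under stated assumptions. It is **not** the NPSA algorithm. It is a naive baseline that fits a single fixed acceptance threshold to the $M$ sample realisations. Each realisation is assumed to be a list of `(arrival_time, value)` pairs over one reference period. The names `run_threshold_policy` and `fit_static_threshold` are hypothetical and are used only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation of one realisation of the job arrival process:
# a list of (arrival_time, value) pairs over a single reference period.
Realisation = List[Tuple[float, float]]


def run_threshold_policy(jobs: Realisation, n: int, threshold: float) -> float:
    """Accept jobs in arrival order while fewer than n are accepted and value >= threshold."""
    reward, accepted = 0.0, 0
    for _, value in sorted(jobs):  # process jobs in order of arrival time
        if accepted >= n:  # capacity of n accepted jobs is exhausted
            break
        if value >= threshold:  # immediate, irrevocable accept decision
            reward += value
            accepted += 1
    return reward


def fit_static_threshold(samples: Sequence[Realisation], n: int) -> float:
    """Naive baseline (not NPSA): choose the fixed threshold, among observed
    values, that maximises the average reward over the M sample realisations."""
    candidates = sorted({v for jobs in samples for _, v in jobs})
    best_t, best_avg = float("inf"), 0.0  # inf threshold means "accept nothing"
    for t in candidates:
        avg = sum(run_threshold_policy(jobs, n, t) for jobs in samples) / len(samples)
        if avg > best_avg:
            best_t, best_avg = t, avg
    return best_t


if __name__ == "__main__":
    # Two toy realisations (M = 2) with capacity n = 1.
    samples = [
        [(0.1, 1.0), (0.5, 5.0), (0.9, 3.0)],
        [(0.2, 4.0), (0.4, 2.0), (0.8, 6.0)],
    ]
    print(fit_static_threshold(samples, n=1))  # 5.0
```

A fixed threshold ignores both the time remaining in the period and the remaining capacity. In general, a well-performing policy for this kind of problem should adapt its acceptance rule to both, so the baseline above serves only to illustrate the inputs ($n$, $M$ sample paths) and the online accept/reject structure.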