We consider a model where an agent has a repeated decision to make and wishes to maximize their total payoff. Payoffs are influenced by an action taken by the agent, but also an unknown state of the world that evolves over time. Before choosing an action each round, the agent can purchase noisy samples about the state of the world. The agent has a budget to spend on these samples, and has flexibility in deciding how to spread that budget across rounds. We investigate the problem of choosing a sampling algorithm that optimizes total expected payoff. For example: is it better to buy samples steadily over time, or to buy samples in batches? We solve for the optimal policy, and show that it is a natural instantiation of the latter. Under a more general model that includes per-round fixed costs, we prove that a variation on this batching policy is a 2-approximation.
翻译:我们考虑的是代理商反复决定并希望最大限度地获得全部报酬的模式。 报酬受代理商行动的影响, 但也受时间变化的未知世界状态的影响。 在选择每轮行动之前,代理商可以购买关于世界状况的噪音样本。 代理商有预算可以花在这些样本上, 在决定如何将预算分散到各轮方面有灵活性。 我们调查了选择一个能够优化预期总收益的抽样算法的问题。 比如: 最好在一段时间内稳步购买样品, 还是分批购买样品? 我们解决最佳政策,并表明这是后者的自然即时。 在包括每轮固定成本的更一般模式下,我们证明这种分批政策的变化是2倍一致的。