We introduce a new setting, optimize-and-estimate structured bandits. Here, a policy must select a batch of arms, each characterized by its own context, so as to both maximize reward and maintain an accurate (ideally unbiased) population estimate of the reward. This setting is inherent to many public and private sector applications and often requires handling delayed feedback, small data, and distribution shifts. We demonstrate its importance on real data from the United States Internal Revenue Service (IRS). The IRS performs yearly audits of the tax base. Two of its most important objectives are to identify suspected misreporting and to estimate the "tax gap" -- the global difference between the amount paid and the true amount owed. Based on a unique collaboration with the IRS, we cast these two processes as a unified optimize-and-estimate structured bandit. We analyze optimize-and-estimate approaches to the IRS problem and propose a novel mechanism for unbiased population estimation that achieves rewards comparable to baseline approaches. This approach has the potential to improve audit efficacy while maintaining policy-relevant estimates of the tax gap. This has important social consequences, given that the current tax gap is estimated at nearly half a trillion dollars. We suggest that this problem setting is fertile ground for further research, and we highlight its interesting challenges. The results of this and related research are currently being incorporated into the continual improvement of IRS audit selection methods.
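To make the tension between reward maximization and unbiased population estimation concrete, the following is a minimal sketch and not the mechanism proposed in the paper. It assumes a standard survey-sampling construction swapped in for illustration: each arm is selected independently with a known inclusion probability (Poisson sampling), and the population total is estimated with a Horvitz-Thompson estimator. The function names, the proportional-to-prediction rule, and the `floor` parameter are all hypothetical. Under Poisson sampling the realized batch size is random, so `budget` only steers the expected batch size.

```python
import random


def inclusion_probabilities(predicted, budget, floor=0.05):
    """Hypothetical rule: probability proportional to predicted reward.

    Each probability is floored (so every arm can be sampled, which
    unbiasedness requires) and capped at 1. Assumes non-negative predictions.
    """
    total = sum(predicted)
    if total <= 0:
        raise ValueError("predicted rewards must have a positive sum")
    return [min(1.0, max(floor, budget * p / total)) for p in predicted]


def poisson_sample(probs, rng):
    """Select each arm independently with its own inclusion probability."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def horvitz_thompson_total(rewards, probs, selected):
    """Estimate the population total from the selected arms only.

    Weighting each observed reward by 1 / (inclusion probability) makes the
    estimator unbiased whenever every probability is strictly positive.
    """
    return sum(rewards[i] / probs[i] for i in selected)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only: true rewards and noisy predictions.
    true_rewards = [rng.expovariate(1 / 100.0) for _ in range(1000)]
    predicted = [max(0.0, r + rng.gauss(0, 50.0)) for r in true_rewards]

    probs = inclusion_probabilities(predicted, budget=100)
    batch = poisson_sample(probs, rng)

    print("batch size:", len(batch))
    print("reward collected:", sum(true_rewards[i] for i in batch))
    print("estimated total:", horvitz_thompson_total(true_rewards, probs, batch))
    print("true total:", sum(true_rewards))
```

Raising `floor` spends more of the budget on low-prediction arms, which typically lowers the collected reward and reduces the variance of the estimate. Lowering it does the opposite. That is one simple way to see the trade-off the setting describes.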