The fundamental problem of causal inference -- that we never observe counterfactuals -- prevents us from identifying how many might be negatively affected by a proposed intervention. If, in an A/B test, half of users click (or buy, or watch, or renew, etc.), whether exposed to the standard experience A or a new one B, hypothetically it could be because the change affects no one, because the change positively affects half the user population to go from no-click to click while negatively affecting the other half, or something in between. While unknowable, this impact is clearly of material importance to the decision to implement a change or not, whether due to fairness, long-term, systemic, or operational considerations. We therefore derive the tightest-possible (i.e., sharp) bounds on the fraction negatively affected (and other related estimands) given data with only factual observations, whether experimental or observational. Naturally, the more we can stratify individuals by observable covariates, the tighter the sharp bounds. Since these bounds involve unknown functions that must be learned from data, we develop a robust inference algorithm that is efficient almost regardless of how and how fast these functions are learned, remains consistent when some are mislearned, and still gives valid conservative bounds when most are mislearned. Our methodology altogether therefore strongly supports credible conclusions: it avoids spuriously point-identifying this unknowable impact, focusing on the best bounds instead, and it permits exceedingly robust inference on these. We demonstrate our method in simulation studies and in a case study of career counseling for the unemployed.
翻译:因果推断的根本问题 -- -- 我们从未观察到反事实 -- -- 使我们无法确定有多少人可能受到拟议干预的负面影响。如果在A/B测试中,一半的用户点击(或购买,或观看,或更新,等等)标准经验A或新的B,假设是因为变化不会影响任何人,因为变化会积极影响半数用户群体,从不点击到点击,同时对另一半或两者之间产生消极影响。尽管这种影响无法为人们所了解,但这种影响显然对决定实施变革与否具有重大意义,无论是出于公平、长期、系统还是业务考虑。因此,我们从受标准经验A或新B(或新B)影响的部分(或其他相关估计)中得出最接近的(即尖锐)界限,因为数据仅以事实观察(无论是实验还是观察)为依据,因为变化对用户群体产生积极影响,因此,我们越能通过可观察的变量来限制个人,更接近于准确的度。由于这些界限涉及必须从数据中学习的未知的功能,因此我们发展出一个最坚定的、最坚定的、最坚定的、最准确的判断方法是有效的。