We present CausalSim, a causal inference framework for unbiased trace-driven simulation. Current trace-driven simulators assume that the interventions being simulated (e.g., a new algorithm) do not affect the validity of the traces. However, real-world traces are often biased by the algorithmic choices made during trace collection, so replaying traces under an intervention can lead to incorrect results. CausalSim addresses this challenge by learning a causal model of the system dynamics together with latent factors that capture the underlying system conditions during trace collection. It learns these models from an initial randomized controlled trial (RCT) over a fixed set of algorithms, and then applies them to remove biases from trace data when simulating new algorithms. Key to CausalSim is mapping unbiased trace-driven simulation to a tensor completion problem with extremely sparse observations. By exploiting a basic distributional invariance property present in RCT data, CausalSim enables a novel tensor completion method despite this sparsity. Our extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines, respectively. Moreover, CausalSim provides markedly different insights about adaptive bitrate (ABR) algorithms than the biased baseline simulator does, and we validate these insights with a real deployment.
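The tensor completion framing can be made concrete with a toy analog under strong simplifying assumptions. This is not CausalSim's actual algorithm, which learns neural models of the dynamics and latents. Assume each trace step has a hidden latent u (e.g., path capacity), and that the observation under arm (algorithm) a is m = u * g[a] for an unknown per-arm factor g[a]. The steps-by-arms slice of the tensor then has exactly one observed entry per row. Because the RCT assigns arms at random, u has the same distribution in every arm. As a result, per-arm means identify g up to a common scale, and each missing entry can be filled in. The multiplicative model and the names `simulate_rct`, `estimate_gains`, `counterfactual`, and `mean_relative_error` are hypothetical illustrations.

```python
import random
from statistics import mean


def simulate_rct(n, gains, seed=0):
    """Toy RCT (hypothetical model m = u * gains[arm]): each step draws a latent u
    from the same distribution regardless of arm, assigns an arm uniformly at
    random, and records the observation for that arm only. The true u is kept
    solely for evaluation; a real trace would not contain it."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.uniform(1.0, 10.0)       # hidden latent factor
        arm = rng.randrange(len(gains))  # randomized assignment
        rows.append((arm, u * gains[arm], u))
    return rows


def estimate_gains(rows, n_arms):
    """Distributional invariance: u is identically distributed in every arm, so
    mean(m | arm k) = gains[k] * E[u]. Gains are identified only up to a common
    scale, so arm 0's gain is fixed to 1."""
    arm_means = [mean(m for a, m, _ in rows if a == k) for k in range(n_arms)]
    return [am / arm_means[0] for am in arm_means]


def counterfactual(m_obs, arm_obs, arm_new, gains_hat):
    """Fill in a missing entry: undo the observed arm's effect to recover the
    latent (up to scale), then re-apply the effect of the new arm."""
    return m_obs / gains_hat[arm_obs] * gains_hat[arm_new]


def mean_relative_error(rows, arm_new, gains_true, gains_hat, naive=False):
    """Compare imputed entries for arm_new against the toy ground truth. With
    naive=True the observation is replayed unchanged, as a biased simulator would."""
    errs = []
    for arm, m, u in rows:
        truth = u * gains_true[arm_new]
        pred = m if naive else counterfactual(m, arm, arm_new, gains_hat)
        errs.append(abs(pred - truth) / truth)
    return mean(errs)
```

With `TRUE = [1.0, 0.6, 0.3]` and `rows = simulate_rct(20000, TRUE)`, `estimate_gains(rows, 3)` should land close to `TRUE`. The imputed entries should also track the ground truth far more closely than naive replay (`naive=True`), which treats the collected observation as if the arm choice had no effect on it.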