In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and improve decision-making. However, incomplete metrics occur frequently in online experimentation, leaving far less usable data than was planned for the experiment (e.g., an A/B test). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. To analyze incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors. The proposed method considers both experiment-specific features and users' activities along their shopping paths, so different users can receive different imputed values. To make imputation efficient on the large-scale data sets typical of online experimentation, the method combines stratification and clustering. We compare the performance of the proposed method with several conventional methods in both simulation studies and a real online experiment at eBay.
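The core idea of user-specific $k$-nearest-neighbor imputation restricted to strata can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes users have already been labeled as dropout buyers (missing metric, `None`), that a hypothetical `strata` label stands in for the experiment-specific features (e.g., treatment arm), and that hypothetical numeric `features` summarize shopping-path activity. Each missing value is filled with the mean metric of the `k` nearest complete users in the same stratum by Euclidean distance. The clustering step the method uses for scalability is omitted.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence


def knn_impute_stratified(
    strata: Sequence[str],
    features: Sequence[Sequence[float]],
    metric: Sequence[Optional[float]],
    k: int = 3,
) -> List[Optional[float]]:
    """Fill missing metric values (None) with the mean metric of the k nearest
    complete users in the same stratum (illustrative sketch only)."""
    # Group row indices by stratum so neighbor search never crosses strata.
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)

    result = list(metric)
    for rows in groups.values():
        # Donors are users in this stratum whose metric is observed.
        donors = [i for i in rows if metric[i] is not None]
        if not donors:
            continue  # no complete users in this stratum: leave values missing
        for i in rows:
            if metric[i] is not None:
                continue
            # Rank donors by Euclidean distance in the activity-feature space.
            nearest = sorted(
                donors, key=lambda j: math.dist(features[i], features[j])
            )[:k]
            result[i] = mean(metric[j] for j in nearest)
    return result


# Usage: two strata, one missing value in each.
strata = ["A", "A", "A", "B", "B"]
features = [[1.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.0, 0.0], [0.0, 1.0]]
metric = [10.0, 20.0, None, 4.0, None]
print(knn_impute_stratified(strata, features, metric, k=1))
# → [10.0, 20.0, 20.0, 4.0, 4.0]
```

Restricting the neighbor search to one stratum keeps each search small and ensures that imputed values come only from comparable users, which is the motivation the abstract gives for combining stratification with neighbor-based imputation.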