In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process. However, metrics are frequently incomplete in online experimentation, which leaves much less data available than was planned for the experiment (e.g., an A/B test). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. To analyze incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors. The proposed method considers both experiment-specific features and users' activities along their shopping paths, so that different users can receive different imputed values. To enable efficient imputation of the large-scale data sets encountered in online experimentation, the method combines stratification and clustering. We compare its performance with that of several conventional methods in both simulation studies and a real online experiment at eBay.
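The combination of stratification, clustering, and $k$-nearest-neighbor imputation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `kmeans_labels` and `stratified_knn_impute`, the single stratum label per user, plain Lloyd's k-means, squared Euclidean distance, and imputing with the mean metric of the $k$ nearest complete users are all illustrative assumptions. Because the abstract does not say how visitors are treated, this sketch imputes only dropout buyers and leaves visitors missing.

```python
import numpy as np


def kmeans_labels(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (no external ML library); returns a label per row."""
    rng = np.random.default_rng(seed)
    n_clusters = min(n_clusters, len(X))
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def stratified_knn_impute(X, y, strata, is_dropout, k=3, n_clusters=2):
    """Hypothetical helper: impute missing y for dropout buyers only.

    Users are split by stratum (assumed: one label per user), each stratum is
    clustered, and neighbors are searched only inside the user's cluster,
    falling back to the whole stratum if that cluster has no complete users.
    Rows with missing y that are not dropout buyers (visitors) stay NaN.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    strata = np.asarray(strata)
    is_dropout = np.asarray(is_dropout, dtype=bool)
    observed = ~np.isnan(y)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        labels = kmeans_labels(X[idx], n_clusters)
        stratum_donors = idx[observed[idx]]
        for c in np.unique(labels):
            members = idx[labels == c]
            donors = members[observed[members]]
            if len(donors) == 0:
                donors = stratum_donors
            if len(donors) == 0:
                continue  # nothing to borrow from in this stratum
            targets = members[is_dropout[members] & ~observed[members]]
            for t in targets:
                # Average the metric of the k closest complete users.
                d = ((X[donors] - X[t]) ** 2).sum(axis=1)
                nearest = donors[np.argsort(d)[:k]]
                y[t] = y[nearest].mean()
    return y


# Toy usage: two well-separated groups of users in one stratum.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0, 0.5]]
y = [1, 2, np.nan, 10, 20, np.nan, np.nan]
is_dropout = [False, False, True, False, False, True, False]
print(stratified_knn_impute(X, y, [0] * 7, is_dropout, k=2))
```

In the toy data, the dropout buyers at positions 2 and 5 receive 1.5 and 15.0 (the means of their two nearest complete users), while the visitor at position 6 stays missing. Restricting the neighbor search to a stratum and then to a cluster keeps each search small, which is the motivation the abstract gives for combining stratification and clustering on large data sets.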