Online A/B testing is widely used by software companies to evaluate the impact of a new technology by offering it to a group of users and comparing the results against the unmodified product. However, running an online A/B test requires not only effort in design, implementation, and obtaining stakeholders' approval to serve it in production, but also several weeks to collect data over iterations. To address these issues, an emerging topic called \textit{offline A/B testing} is receiving increasing attention; its goal is to evaluate a new technology offline by estimating its impact from historical logged data. Although this approach is promising because of its lower implementation effort, faster turnaround time, and absence of potential user harm, several limitations must be addressed before it can effectively inform requirement prioritization in practice, including its discrepancy with online A/B test results and the lack of systematic updates on new data. In response, in this vision paper, we introduce AutoOffAB, an idea to automatically run variants of offline A/B testing against recent logs and update the offline evaluation results, which are then used to make decisions on requirements more reliably and systematically.