Progress in conversational information access (CIA) systems has been hindered by the difficulty of evaluating such systems with reproducible experiments. While user simulation offers a promising solution, the lack of infrastructure and tooling to support this evaluation paradigm remains a significant barrier. To address this gap, we introduce SimLab, the first cloud-based platform providing a centralized solution for the community to benchmark both conversational systems and user simulators in a controlled and reproducible setting. We articulate the requirements for such a platform and propose a general infrastructure to meet them. We then present the design and implementation of an initial version of SimLab and showcase its features through an initial simulation-based evaluation task in conversational movie recommendation. Furthermore, we discuss the platform's sustainability and future opportunities for development, inviting the community to drive further progress in the fields of CIA and user simulation.