Opening the House: Datasets for Mixed Doubles Curling

We introduce the most comprehensive publicly available datasets for mixed doubles curling, built from the CurlIT (https://curlit.com/results) Results Booklets for eleven top-level tournaments and spanning 53 countries, 1,112 games, and nearly 70,000 recorded shots. While curling analytics has grown in recent years, mixed doubles remains underserved because access to data is limited. Using a combined text-scraping and image-processing pipeline, we extract and standardize detailed game- and shot-level information, including player statistics, hammer possession, Power Play usage, stone coordinates, and post-shot scoring states. We describe the data engineering workflow, highlight challenges in parsing historical records, and derive additional contextual features that enable rigorous strategic analysis. Using these datasets, we present initial insights into shot selection and success rates, scoring distributions, and team efficiencies, illustrating key differences between mixed doubles and traditional four-player curling. To demonstrate the data's versatility, we show how it can be analyzed at the shot, end, game, and team levels. The resulting resources provide a foundation for advanced performance modeling, strategic evaluation, and future research in mixed doubles curling analytics, supporting broader analytical engagement with this rapidly growing discipline.
