Modern Deep Learning Approaches for Cricket Shot Classification: A Comprehensive Baseline Study

Cricket shot classification from video sequences remains a challenging problem in sports video analysis because it requires effective modeling of both spatial and temporal features. This paper presents the first comprehensive baseline study comparing seven deep learning approaches across four research paradigms for cricket shot classification. We implement and systematically evaluate traditional CNN-LSTM architectures, attention-based models, vision transformers, transfer learning approaches, and modern EfficientNet-GRU combinations on a unified benchmark. A key finding of our study is a large gap between the accuracies claimed in the academic literature and those obtained from practical re-implementation. While previous papers reported accuracies of 96% (Balaji LRCN), 99.2% (IJERCSE), and 93% (Sensors), our standardized re-implementations achieve 46.0%, 55.6%, and 57.7%, respectively. Our modern state-of-the-art (SOTA) approach, which combines EfficientNet-B0 with a GRU-based temporal model, achieves 92.25% accuracy, demonstrating that substantial improvements are possible with modern architectures and systematic optimization. All implementations follow modern MLOps practices and are built with PyTorch Lightning, providing a reproducible research platform that underscores the importance of standardized evaluation protocols in sports video analysis research.

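To make the reported-versus-reproduced gap concrete, the short sketch below computes the difference in percentage points for the three baselines quoted in the abstract. The accuracy figures are taken directly from the text; the dictionary layout and all variable and function names are illustrative assumptions and are not part of the study's codebase.

```python
# Accuracies (in percent) quoted in the abstract: (reported, reproduced).
# The names used here are illustrative only, not the authors' code.
ACCURACIES = {
    "Balaji LRCN": (96.0, 46.0),
    "IJERCSE": (99.2, 55.6),
    "Sensors": (93.0, 57.7),
}
BEST_MODEL_ACCURACY = 92.25  # EfficientNet-B0 + GRU, as stated in the abstract


def accuracy_gaps(accuracies):
    """Return reported-minus-reproduced accuracy in percentage points."""
    return {
        name: round(reported - reproduced, 1)
        for name, (reported, reproduced) in accuracies.items()
    }


gaps = accuracy_gaps(ACCURACIES)

# Margin of the EfficientNet-B0 + GRU model over the best re-implemented baseline.
best_reproduced = max(reproduced for _, reproduced in ACCURACIES.values())
margin_over_baselines = round(BEST_MODEL_ACCURACY - best_reproduced, 2)

for name, gap in gaps.items():
    print(f"{name}: reproduced accuracy is {gap} points below the reported value")
print(
    "EfficientNet-B0 + GRU margin over best re-implementation: "
    f"{margin_over_baselines} points"
)
```

Under these figures the re-implementations fall roughly 35 to 50 percentage points short of the reported values, while the EfficientNet-B0 and GRU model exceeds the strongest re-implemented baseline by about 34.5 points.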