Automatic transcription of guitar strumming is an underrepresented and challenging task in Music Information Retrieval (MIR), particularly for extracting both strumming directions and chord progressions from audio signals. While existing methods show promise, their effectiveness is often hindered by limited datasets. In this work, we extend a multimodal approach to guitar strumming transcription by introducing a novel dataset and a deep learning-based transcription model. We collect 90 minutes of real-world guitar recordings using an ESP32 smartwatch motion sensor and a structured recording protocol, complemented by a synthetic dataset of 4 hours of labeled strumming audio. A Convolutional Recurrent Neural Network (CRNN) model is trained to detect strumming events, classify their direction, and identify the corresponding chords using only microphone audio. Our evaluation demonstrates significant improvements over baseline onset detection algorithms, with a hybrid method combining synthetic and real-world data achieving the highest accuracy for both strumming action detection and chord classification. These results highlight the potential of deep learning for robust guitar strumming transcription and open new avenues for automatic rhythm guitar analysis.
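To make the described pipeline concrete, the sketch below shows one plausible way such a CRNN could be structured: a convolutional front end over a log-mel spectrogram followed by a bidirectional GRU with separate heads for strum onset detection, strum direction, and chord classification. This is a minimal illustration, not the authors' implementation; the input features, layer sizes, and the 12-class chord vocabulary are assumptions made for the example.

```python
# Hypothetical CRNN sketch for strumming transcription (not the paper's model).
# Assumes log-mel spectrogram input and three frame-wise output heads.
import torch
import torch.nn as nn


class StrummingCRNN(nn.Module):
    def __init__(self, n_mels: int = 128, n_chords: int = 12, hidden: int = 128):
        super().__init__()
        # Convolutional front end over (batch, 1, time, mel) spectrogram patches.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),            # pool only along the mel axis
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        conv_features = 32 * (n_mels // 4)
        # Bidirectional GRU aggregates temporal context for frame-wise decisions.
        self.rnn = nn.GRU(conv_features, hidden, batch_first=True, bidirectional=True)
        self.onset_head = nn.Linear(2 * hidden, 1)          # strum / no strum per frame
        self.direction_head = nn.Linear(2 * hidden, 2)      # downstroke vs. upstroke
        self.chord_head = nn.Linear(2 * hidden, n_chords)   # chord label per frame

    def forward(self, spec: torch.Tensor):
        # spec: (batch, time, n_mels) log-mel spectrogram
        x = self.conv(spec.unsqueeze(1))          # (batch, channels, time, mel')
        x = x.permute(0, 2, 1, 3).flatten(2)      # (batch, time, channels * mel')
        x, _ = self.rnn(x)
        return self.onset_head(x), self.direction_head(x), self.chord_head(x)


if __name__ == "__main__":
    model = StrummingCRNN()
    dummy = torch.randn(2, 200, 128)  # 2 clips, 200 frames, 128 mel bins
    onset, direction, chord = model(dummy)
    print(onset.shape, direction.shape, chord.shape)
```

In such a setup, the onset head would typically be trained with a binary loss against annotated strum times, while the direction and chord heads use cross-entropy on frames near detected onsets; the exact feature extraction and training targets in the paper may differ.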