Imitating percussive sounds with the voice is a natural and effective way to communicate rhythmic ideas on the fly. Automatically retrieving drum sounds from vocal percussion can therefore help artists prototype drum patterns quickly and comfortably, smoothing the creative workflow. Here we explore different strategies for this type of query, using both traditional machine learning algorithms and recent deep learning techniques. The main hyperparameters of the models are selected through a grid search driven by performance metrics. We also examine several audio data augmentation techniques, which can potentially regularise deep learning models and improve generalisation. We compare the final models in terms of effectiveness (classification accuracy), efficiency (computational speed), stability (performance consistency), and interpretability (decision patterns), and discuss the relevance of these results to the design of successful query-by-vocal-percussion systems.
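
As an illustration of how hyperparameters can be selected by grid search on a performance metric, the following is a minimal sketch and not the procedure used in this work. It assumes a toy k-nearest-neighbour classifier, synthetic two-dimensional feature vectors standing in for two hypothetical drum classes, and validation accuracy as the metric. All names (`make_toy_data`, `knn_predict`, `grid_search`) and the grid values are hypothetical.

```python
import itertools
import random


def make_toy_data(n_per_class=40, seed=0):
    """Synthetic 2-D feature vectors for two hypothetical classes (assumption)."""
    rng = random.Random(seed)
    data = []
    for label, centre in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k, p):
    """Majority vote among the k nearest training points (Minkowski order p)."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    neighbours = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes_for_one = sum(label for _, label in neighbours)
    return 1 if votes_for_one * 2 > k else 0


def accuracy(train, val, k, p):
    """Fraction of validation points classified correctly."""
    hits = sum(knn_predict(train, x, k, p) == y for x, y in val)
    return hits / len(val)


def grid_search(train, val, grid):
    """Score every combination in `grid`; return (best_params, best_score)."""
    best_params, best_score = None, -1.0
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(train, val, **params)
        if score > best_score:  # keep the first combination on ties
            best_params, best_score = params, score
    return best_params, best_score


data = make_toy_data()
train, val = data[:60], data[60:]
grid = {"k": [1, 3, 5, 7], "p": [1, 2]}  # hypothetical search space
best_params, best_score = grid_search(train, val, grid)
print(best_params, best_score)
```

The search is exhaustive, so its cost grows with the product of the grid sizes. In practice the score would typically be averaged over cross-validation folds rather than taken from a single split.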