We introduce Shennong, a Python toolbox and command-line utility for speech feature extraction. It implements a wide range of well-established, state-of-the-art algorithms, including spectro-temporal filters such as Mel-Frequency Cepstral Filterbanks or Predictive Linear Filters, pre-trained neural networks, pitch estimators, speaker normalization methods, and post-processing algorithms. Shennong is an open-source, easy-to-use, reliable, and extensible framework. Because it is written in Python, it integrates easily with other speech modeling and machine learning tools. It aims to replace or complement several heterogeneous software packages, such as Kaldi or Praat. After describing the Shennong software architecture, its core components, and its implemented algorithms, this paper illustrates its use in three applications: a comparison of the performance of speech features on a phone discrimination task, an analysis of a Vocal Tract Length Normalization model as a function of the duration of speech used for training, and a comparison of pitch estimation algorithms under various noise conditions.