Labeling speech down to the identity and time boundaries of phones is a labor-intensive part of phonetic research. To simplify this work, we created a free, open-source tool that generates phone sequences from Czech text and time-aligns them with audio. Its low architectural complexity makes the design approachable for students of phonetics. The acoustic model, a ReLU neural network with 56k weights, was trained with PyTorch on a small amount of CommonVoice data. The decoder, which performs alignment and pronunciation variant selection, is implemented in Python with a matrix library. The Czech pronunciation generator is composed of simple rule-based blocks that capture the logic of the language where possible and allow details of the transcription approach to be modified. Compared to the tools used until now, data preparation is more efficient. The tool runs on Mac, Linux and Windows, either from the Praat GUI or from the command line; it chooses the correct pronunciation variant in most cases, including glottal stop detection; it captures most of the Czech assimilation logic algorithmically; and it is both didactic and practical.
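To make the alignment step concrete, the following is a minimal sketch of forced alignment by Viterbi decoding, not the tool's actual decoder. It assumes NumPy as the matrix library, a matrix of frame-level log-probabilities as the acoustic model output, and a single fixed phone sequence; pronunciation variant selection is left out. The name `force_align` and the toy data are hypothetical.

```python
import numpy as np


def force_align(log_probs, phones):
    """Align a fixed phone sequence to frames (illustrative sketch).

    log_probs: array of shape (T, P), log-probability of each of P phone
               classes at each of T frames (assumed acoustic model output).
    phones:    list of N phone class indices, in spoken order, with N <= T.
    Returns a list of N (start_frame, end_frame) pairs, end exclusive.
    """
    T = log_probs.shape[0]
    N = len(phones)
    if N == 0 or N > T:
        raise ValueError("need between 1 and T phones")

    emit = log_probs[:, phones]                  # (T, N) per-phone scores
    score = np.full((T, N), -np.inf)             # best path score so far
    advanced = np.zeros((T, N), dtype=bool)      # True: entered phone n at t
    score[0, 0] = emit[0, 0]                     # path must start in phone 0

    for t in range(1, T):
        stay = score[t - 1]                      # remain in the same phone
        move = np.concatenate(([-np.inf], score[t - 1, :-1]))  # next phone
        advanced[t] = move > stay
        score[t] = np.maximum(stay, move) + emit[t]

    # Backtrack from the last frame of the last phone to recover boundaries.
    boundaries = [T]
    n = N - 1
    for t in range(T - 1, 0, -1):
        if advanced[t, n]:
            boundaries.append(t)
            n -= 1
    boundaries.append(0)
    boundaries.reverse()
    return [(boundaries[i], boundaries[i + 1]) for i in range(N)]


# Toy example: 6 frames, 3 phone classes, spoken order 0 -> 2 -> 1.
toy = np.log(np.full((6, 3), 0.05))
toy[0:2, 0] = np.log(0.9)   # frames 0-1 favor class 0
toy[2:5, 2] = np.log(0.9)   # frames 2-4 favor class 2
toy[5, 1] = np.log(0.9)     # frame 5 favors class 1
segments = force_align(toy, [0, 2, 1])
```

Multiplying the frame indices in `segments` by the frame step of the acoustic features would give phone boundaries in seconds. Choosing among pronunciation variants could be approached by running such a decoder per variant and keeping the best-scoring one, though that is only one possible design.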