The NURC Project, which started in 1969 to study the cultured urban linguistic norm spoken in five Brazilian capitals, compiled a large corpus for each capital. The digitized NURC/SP comprises 375 inquiries totaling 334 hours of recordings made in the city of São Paulo. Although 47 inquiries have transcripts, these were not aligned with the audio, and the remaining 328 inquiries were not transcribed. This article presents an evaluation and error analysis of three automatic speech recognition models trained on spontaneous speech in Portuguese and one model trained on prepared speech. Using WER and CER metrics on a manually aligned sample of NURC/SP, the evaluation allowed us to choose the best model to automatically transcribe 284 hours.
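For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER (word error rate) and CER (character error rate) are commonly computed: the Levenshtein edit distance between a reference transcript and a model hypothesis, divided by the reference length in words or characters. The function names and the example strings are illustrative assumptions, not the evaluation code or data used in this work, and no text normalization (casing, punctuation) is applied here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Toy strings for illustration only; they are not taken from NURC/SP.
    reference = "a casa é grande"
    hypothesis = "a casa grande"
    print(f"WER = {wer(reference, hypothesis):.2f}")
    print(f"CER = {cer(reference, hypothesis):.2f}")
```

Under this definition, lower values are better for both metrics, and either can exceed 1.0 when the hypothesis contains many insertions. Comparing models would amount to averaging these scores over the manually aligned sample and keeping the model with the lowest error.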