Cross-lingual dubbing of lecture videos requires transcription of the original audio, correction and removal of disfluencies, discovery of domain-specific terms, text-to-text translation into the target language, chunking of the translated text according to the target language's rhythm, text-to-speech synthesis, and finally isochronous lip-syncing to the original video. The task becomes challenging when the source and target languages belong to different language families, since the generated audio then differs in duration from the original. This is further compounded by the original speaker's rhythm, especially in extempore speech. This paper describes the challenges in semi-automatically regenerating English lecture videos in Indian languages. A prototype is developed for dubbing lectures into 9 Indian languages. Mean opinion scores (MOS) are obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original in terms of MOS (on a 1-5 scale) and lip synchronisation, with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.
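The stages listed above can be sketched as a simple pipeline. This is a minimal illustration only; every function below is a hypothetical placeholder standing in for a real ASR, MT, or TTS component, not the paper's actual implementation.

```python
# Sketch of the dubbing pipeline: ASR -> disfluency removal -> MT ->
# rhythm-based chunking -> duration-aware TTS. All functions are
# illustrative stubs; segments are (text, start_sec, end_sec) triples.

def transcribe(audio):
    # ASR stage (stub): returns timestamped source-language segments.
    return [("hello everyone", 0.0, 1.2)]

def remove_disfluencies(segments):
    # Drop filler words such as "um" / "uh" from each segment.
    fillers = {"um", "uh"}
    return [(" ".join(w for w in text.split() if w not in fillers), s, e)
            for text, s, e in segments]

def translate(segments, target_lang):
    # Text-to-text MT stage (stub): tags text with the target language.
    return [(f"[{target_lang}] {text}", s, e) for text, s, e in segments]

def chunk_by_rhythm(segments):
    # Re-chunk text so synthesized speech can match source timing (stub).
    return segments

def synthesize(segments):
    # Duration-aware TTS stage (stub): pairs each chunk with the
    # source-segment duration the synthesized audio must fit.
    return [(text, e - s) for text, s, e in segments]

def dub(audio, target_lang):
    segments = transcribe(audio)
    segments = remove_disfluencies(segments)
    segments = translate(segments, target_lang)
    segments = chunk_by_rhythm(segments)
    return synthesize(segments)
```

Keeping per-segment durations through every stage is what makes isochronous lip-syncing possible at the end: the TTS output for each chunk is stretched or compressed to the original segment's duration.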