Spoken language change detection (LCD) refers to identifying the language transitions in a code-switched utterance. Similarly, identifying the speaker transitions in a multispeaker utterance is known as speaker change detection (SCD). Since the two tasks are similar, an architecture or framework developed for SCD may also be suitable for LCD. Hence, the aim of the present work is to develop LCD systems inspired by SCD. As a first step, both LCD and SCD are performed by humans. The study suggests that, compared with SCD, humans performing LCD require (a) a longer duration of speech around the change point and (b) language-specific prior exposure. The longer-duration requirement is incorporated by increasing the analysis window length of the unsupervised distance-based approach, which yields relative performance improvements of 29.1% and 2.4% on the synthetic and practical code-switched datasets, respectively. Incorporating a priori language knowledge yields relative improvements of 31.63% and 14.27% on the same two datasets, respectively. The performance difference between the practical and synthetic datasets is mostly due to differences in the distribution of monolingual segment durations.
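The abstract does not specify the features or the distance measure used by the unsupervised distance-based approach. The following is therefore only a minimal sketch of a generic sliding-window change detector, written under stated assumptions: each frame is already represented by a feature vector (for example, an embedding), the two adjacent windows are summarized by their mean vectors, and Euclidean distance is used. The names `distance_curve`, `detect_changes`, `win` and `threshold` are hypothetical. `win` plays the role of the analysis window length whose enlargement the abstract reports.

```python
import math
from typing import List, Sequence


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def distance_curve(features: Sequence[Sequence[float]], win: int) -> List[float]:
    """Score each candidate boundary t by comparing the `win` frames before t
    with the `win` frames after t (assumption: Euclidean distance between means).
    Frames too close to either edge keep a score of 0.0."""
    n = len(features)
    scores = [0.0] * n
    for t in range(win, n - win + 1):
        left = mean_vector(features[t - win:t])
        right = mean_vector(features[t:t + win])
        scores[t] = math.dist(left, right)
    return scores


def detect_changes(features: Sequence[Sequence[float]], win: int,
                   threshold: float) -> List[int]:
    """Return frame indices where the distance curve has a local peak
    at or above `threshold` (hypothetical peak-picking rule)."""
    scores = distance_curve(features, win)
    peaks = []
    for t in range(1, len(scores) - 1):
        is_peak = scores[t] >= scores[t - 1] and scores[t] > scores[t + 1]
        if is_peak and scores[t] >= threshold:
            peaks.append(t)
    return peaks


if __name__ == "__main__":
    # Toy input: 50 frames of one "language" followed by 50 frames of another.
    feats = [[0.0] * 4] * 50 + [[1.0] * 4] * 50
    print(detect_changes(feats, win=10, threshold=1.0))
```

In this sketch a larger `win` averages more frames on each side of a candidate boundary, giving more stable statistics, but it also cannot place two boundaries closer together than roughly one window. This is one plausible reason why the benefit of a longer window would depend on the distribution of monolingual segment durations in a dataset.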