Spoken Language Identification (LID) is an important sub-task of Automatic Speech Recognition (ASR) that classifies the language(s) present in an audio segment. Automatic LID plays a useful role in multilingual countries. In many of these countries, identifying the language is hard because of code-mixing, also called code-switching: two or more languages are mixed together within a single conversation. This phenomenon occurs not only in India but also in many other Asian countries. Code-mixed data is scarce, which further limits the capabilities of spoken LID. This work therefore uses data augmentation to address the data scarcity of the code-switched class. The study focuses on Indic languages code-mixed with English; specifically, spoken LID is performed on Hindi code-mixed with English. This research proposes a Generative Adversarial Network (GAN) based data augmentation technique that operates on Mel spectrograms of the audio data. GANs have already been shown to represent the real data distribution accurately in the image domain, and this research exploits these capabilities in speech tasks such as speech classification and automatic speech recognition. GANs are trained to generate Mel spectrograms of the minority code-mixed class, which are then used to augment the classifier's training data. Using GANs yields an overall improvement of 3.5% in Unweighted Average Recall compared with a Convolutional Recurrent Neural Network (CRNN) classifier used as the baseline.
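The reported gain is measured in Unweighted Average Recall (UAR), the mean of the per-class recalls. Because every class gets equal weight, the minority code-mixed class counts as much as the majority class. A minimal Python sketch of the metric follows. The function name and the class labels (`"hi"` for Hindi, `"cm"` for code-mixed) are hypothetical illustrations, not taken from the paper's code or data.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Average the recall of each class, giving every class equal weight."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    support = Counter(y_true)  # number of true examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    # Recall of class c = correctly predicted c / all true c.
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 8 Hindi ("hi") clips, 2 code-mixed ("cm") clips.
y_true = ["hi"] * 8 + ["cm"] * 2
y_pred = ["hi"] * 8 + ["hi", "cm"]  # one code-mixed clip misclassified as Hindi

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} UAR={uar:.2f}")  # accuracy=0.90 UAR=0.75
```

In this toy example, plain accuracy stays high because the majority Hindi class dominates. UAR drops to 0.75 because half of the code-mixed clips are missed. This is why UAR is a natural metric when the code-mixed class is the minority.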