This paper proposes an unsupervised data selection method that uses a submodular function based on the contrastive loss ratios of a target and a training data set. A model using a contrastive loss function is trained on each of the two sets, and the ratio of the frame-level losses of the two models is then used by a submodular function. Using this submodular function, a training set for automatic speech recognition that matches the target data set is selected. Experiments show that models trained on data sets selected by the proposed method outperform those selected by a method based on log-likelihoods produced by GMM-HMM models, in terms of word error rate (WER). When selecting a fixed amount of data, e.g. 10 hours, the relative WER difference between the two methods on Tedtalks was 20.23%. The method can also be used to select data with the aim of minimising negative transfer, while maintaining or improving on the performance of models trained on the whole training set. Results show that the WER on the WSJCAM0 data set was reduced by 6.26% relative when selecting 85% of the whole data set.
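The selection step described above can be sketched as a budgeted greedy maximisation of a submodular function whose per-utterance weights are the loss ratios. The sketch below is illustrative only, not the paper's exact method: the feature-based objective (square root over per-cluster mass), the precomputed acoustic clusters, and all names are assumptions.

```python
import math
from typing import Dict, List

def greedy_submodular_select(
    ratios: Dict[str, float],     # utterance id -> contrastive loss ratio (assumed precomputed)
    clusters: Dict[str, int],     # utterance id -> acoustic cluster label (assumed precomputed)
    durations: Dict[str, float],  # utterance id -> duration in hours
    budget_hours: float,
) -> List[str]:
    """Budgeted greedy maximisation of a feature-based submodular function
    f(S) = sum over clusters c of sqrt(sum of ratios of selected utterances in c).
    The square root gives diminishing returns within each cluster, so the
    selection spreads across clusters while preferring high-ratio utterances."""
    cluster_mass = {c: 0.0 for c in set(clusters.values())}
    selected: List[str] = []
    used = 0.0
    remaining = set(ratios)
    while remaining:
        best, best_gain = None, 0.0
        for u in remaining:
            if used + durations[u] > budget_hours:
                continue  # would exceed the selection budget
            c = clusters[u]
            # Marginal gain of adding u, normalised by its cost in hours.
            gain = math.sqrt(cluster_mass[c] + ratios[u]) - math.sqrt(cluster_mass[c])
            gain /= durations[u]
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            break  # nothing affordable or beneficial remains
        selected.append(best)
        used += durations[best]
        cluster_mass[clusters[best]] += ratios[best]
        remaining.remove(best)
    return selected
```

Because the objective is monotone submodular, this cost-normalised greedy rule comes with the usual constant-factor approximation guarantees for budgeted selection.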