A Multilingual Keyword Spotting (KWS) system detects spoken keywords across multiple locales. Conventional monolingual KWS approaches do not scale well to multilingual scenarios because of high development and maintenance costs and a lack of resource sharing. To overcome this limitation, we propose two locale-conditioned universal models: one using locale feature concatenation and one using feature-wise linear modulation (FiLM). We compare these models with two baseline methods: locale-specific monolingual KWS, and a single universal model trained over all data. Experiments on 10 localized language datasets show that the locale-conditioned models substantially improve accuracy over the baseline methods across all locales and in different noise conditions. FiLM performed best, reducing the average false rejection rate (FRR) by 61% (relative) compared with monolingual KWS models of similar size.
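The two conditioning schemes can be illustrated with a minimal NumPy sketch. It assumes a one-hot locale vector, frame-level features of shape (T, D), and a FiLM generator that is a single linear layer producing a per-channel scale (gamma) and shift (beta). The names `locale_one_hot`, `concat_condition`, `FiLM` and `relative_reduction` are hypothetical illustrations, not the architecture or code used in the paper.

```python
import numpy as np

NUM_LOCALES = 10  # the experiments above cover 10 localized datasets


def locale_one_hot(locale_id: int, num_locales: int = NUM_LOCALES) -> np.ndarray:
    """Encode a locale index as a one-hot vector (assumed locale representation)."""
    v = np.zeros(num_locales, dtype=np.float32)
    v[locale_id] = 1.0
    return v


def concat_condition(features: np.ndarray, locale_vec: np.ndarray) -> np.ndarray:
    """Locale feature concatenation: append the locale vector to every frame.

    features: (T, D) acoustic features; returns (T, D + num_locales).
    """
    tiled = np.tile(locale_vec, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)


class FiLM:
    """Feature-wise linear modulation of hidden activations by locale.

    Assumption: gamma and beta come from one linear layer each, initialised
    so that the layer starts close to the identity (gamma ~ 1, beta ~ 0).
    """

    def __init__(self, num_locales: int, hidden_dim: int, rng: np.random.Generator):
        self.w_gamma = rng.normal(0.0, 0.1, (num_locales, hidden_dim))
        self.b_gamma = np.ones(hidden_dim)
        self.w_beta = rng.normal(0.0, 0.1, (num_locales, hidden_dim))
        self.b_beta = np.zeros(hidden_dim)

    def __call__(self, h: np.ndarray, locale_vec: np.ndarray) -> np.ndarray:
        # h: (T, hidden_dim); gamma and beta: (hidden_dim,), broadcast over time
        gamma = locale_vec @ self.w_gamma + self.b_gamma
        beta = locale_vec @ self.w_beta + self.b_beta
        return gamma * h + beta


def relative_reduction(baseline: float, new: float) -> float:
    """Relative improvement of an error rate such as FRR: (baseline - new) / baseline."""
    return (baseline - new) / baseline
```

With a one-hot locale vector, each linear generator reduces to looking up one row of `w_gamma` and `w_beta`, so every locale gets its own scale and shift while the rest of the network is shared. Concatenation only changes the input, whereas FiLM can modulate any hidden layer. The "61% (relative)" figure is read as `relative_reduction`; for example, a hypothetical drop in FRR from 0.200 to 0.078 would be a 61% relative reduction.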