Enrollment-less training for personalized voice activity detection

We present a novel learning method for personalized voice activity detection (PVAD) that does not require enrollment data during training. PVAD is the task of detecting, at the frame level, the speech segments of a specific target speaker, given enrollment speech from that speaker. Because a PVAD model must learn the variation in each speaker's speech in order to separate one speaker from another, prior PVAD studies have relied on large-scale datasets containing many utterances per speaker. However, such datasets are often limited because they are costly to prepare. In addition, the datasets used to train standard VAD models cannot be used, because they often lack speaker labels. To solve these problems, our key idea is to use a single utterance as both the enrollment speech and the input to the PVAD model during training, which enables PVAD training without separate enrollment speech. In our proposed method, called enrollment-less training, we augment the utterance to create variability between the input and the enrollment speech while preserving speaker identity, which avoids a mismatch between training and inference. Our experimental results demonstrate the efficacy of the method.
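The key idea above, one utterance serving as both the model input and a stand-in for the enrollment speech, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which augmentations are used or which side of the pair they are applied to, so the `augment` function (random gain plus additive Gaussian noise), its parameters, and the choice to augment only the enrollment copy are all hypothetical.

```python
import random
from typing import List, Tuple


def augment(
    waveform: List[float],
    rng: random.Random,
    noise_std: float = 0.01,
    gain_low: float = 0.5,
    gain_high: float = 1.5,
) -> List[float]:
    """Hypothetical augmentation: random gain plus additive Gaussian noise.

    Assumption: these perturbations change the signal without changing
    who is speaking. The augmentations in the actual method may differ.
    """
    gain = rng.uniform(gain_low, gain_high)
    return [gain * sample + rng.gauss(0.0, noise_std) for sample in waveform]


def make_training_pair(
    utterance: List[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Build (model_input, pseudo_enrollment) from a single utterance.

    Assumption: only the enrollment copy is augmented here; the abstract
    does not specify which side is perturbed.
    """
    model_input = list(utterance)
    pseudo_enrollment = augment(utterance, rng)
    return model_input, pseudo_enrollment


if __name__ == "__main__":
    rng = random.Random(0)
    utterance = [0.0, 0.1, -0.2, 0.3]  # toy waveform samples
    model_input, pseudo_enrollment = make_training_pair(utterance, rng)
    print(len(model_input), len(pseudo_enrollment))
```

Because both halves of the pair come from the same recording, no speaker labels or separate enrollment recordings are needed to form a training example, which is the property the abstract describes.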

