Speaker recognition on household devices, such as smart speakers, poses several challenges: (i) robustness across a vast number of heterogeneous domains (households), (ii) short utterances, (iii) enrollment data that may lack speaker labels (passive enrollment), and (iv) the presence of unknown persons (guests). While many commercial products exist, published research is comparatively scarce, and there are no publicly available evaluation protocols or open-source baselines. Our work aims to bridge this gap by providing an accessible evaluation benchmark derived from public resources (VoxCeleb and ASVspoof 2019 data), along with a preliminary pool of open-source baselines. The baselines include four algorithms for active enrollment (speaker labels available) and one algorithm for passive enrollment.