Few-shot keyword spotting (KWS) systems often use a sliding window of fixed size. Because keywords and their spoken instances vary in length, choosing the window size is difficult: a window should be long enough to contain all the information needed to recognize a keyword, but a longer window may also contain irrelevant information such as multiple words or noise, which makes it difficult to reliably detect the onsets and offsets of keywords. In this work, TempAdaCos is proposed, an angular margin loss for obtaining embeddings with temporal structure that can be used to detect keywords with dynamic time warping. In experiments conducted on KWS-DailyTalk, a few-shot KWS dataset presented in this work, it is shown that using these embeddings outperforms using other representations or a sliding window. Furthermore, it is shown that using time-reversed segments of the keywords during training improves performance.
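To illustrate how frame-level embeddings can be matched with dynamic time warping instead of a fixed-size window, the following is a minimal sketch of subsequence DTW. It is not the method evaluated in this work: the function names, the cosine distance, the three-way step pattern, and the normalization by template length are all assumptions made for illustration, and the embeddings are taken as given arrays of shape (frames, dimensions).

```python
import numpy as np


def cosine_distance_matrix(template, stream):
    """Pairwise cosine distances between template frames and stream frames."""
    t = template / np.linalg.norm(template, axis=1, keepdims=True)
    s = stream / np.linalg.norm(stream, axis=1, keepdims=True)
    return 1.0 - t @ s.T


def subsequence_dtw(template, stream):
    """Align a keyword template to the best-matching region of a longer stream.

    Hypothetical helper (not from the paper). Returns a tuple
    (normalized_cost, onset_frame, offset_frame), where onset and offset
    are frame indices into `stream`.
    """
    cost = cosine_distance_matrix(template, stream)
    n, m = cost.shape
    acc = np.full((n, m), np.inf)          # accumulated alignment cost
    start = np.zeros((n, m), dtype=int)    # stream frame where each path began

    # The alignment may begin at any stream frame at no extra cost.
    acc[0, :] = cost[0, :]
    start[0, :] = np.arange(m)

    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        start[i, 0] = 0
        for j in range(1, m):
            prev_cells = ((i - 1, j - 1), (i - 1, j), (i, j - 1))
            prev_costs = [acc[c] for c in prev_cells]
            k = int(np.argmin(prev_costs))
            acc[i, j] = prev_costs[k] + cost[i, j]
            start[i, j] = start[prev_cells[k]]

    # The alignment may also end at any stream frame: pick the cheapest one.
    offset = int(np.argmin(acc[-1, :]))
    return float(acc[-1, offset]) / n, int(start[-1, offset]), offset
```

In such a sketch, a keyword would be reported when the normalized cost falls below a threshold, and the returned onset and offset frames adapt to the length of the spoken instance rather than being tied to a window size. The threshold itself would have to be tuned and is not specified here.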