This work introduces BRILLsson, a novel binary neural network-based representation learning model for a broad range of non-semantic speech tasks. We train the model by knowledge distillation from a large, real-valued TRILLsson model, using only a fraction of the dataset used to train TRILLsson. The resulting BRILLsson models are only 2 MB in size with a latency of less than 8 ms, making them suitable for deployment on low-resource devices such as wearables. We evaluate BRILLsson on eight benchmark tasks (including but not limited to spoken language identification, emotion recognition, health condition diagnosis, and keyword spotting) and demonstrate that our proposed ultra-light, low-latency models perform as well as large-scale models.
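The two ingredients named in the abstract, binary weights and distillation from a real-valued teacher, can be illustrated with a minimal sketch. The abstract does not specify the binarization scheme or the distillation objective, so the sketch below assumes sign binarization, one bit of storage per weight, and a mean-squared-error loss between student and teacher embeddings. All function names and toy values are hypothetical and are not taken from the BRILLsson implementation.

```python
import math


def binarize(weights):
    """Sign binarization (assumed scheme): map each weight to +1.0 or -1.0."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def packed_size_bytes(num_weights):
    """Bytes needed to store binary weights, assuming 1 bit per weight."""
    return math.ceil(num_weights / 8)


def distillation_loss(student_emb, teacher_emb):
    """Mean squared error between student and teacher embeddings (assumed loss)."""
    if len(student_emb) != len(teacher_emb):
        raise ValueError("embeddings must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_emb, teacher_emb)]
    return sum(squared) / len(squared)


# Hypothetical toy embeddings, for illustration only.
teacher = [0.5, -1.0, 0.25]
student = [0.5, -0.5, 0.25]
loss = distillation_loss(student, teacher)
binary_weights = binarize([0.3, -0.2, 0.0])
```

Under the one-bit-per-weight assumption, and ignoring any layers kept in full precision, `packed_size_bytes(16_000_000)` gives 2,000,000 bytes, which is consistent in order of magnitude with the 2 MB model size reported above.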