Semantic parsing models with applications in task-oriented dialog systems require efficient sequence-to-sequence (seq2seq) architectures to run on-device. To this end, we propose a projection-based encoder-decoder model referred to as pQRNN-MAtt. Prior studies of projection methods were restricted to encoder-only models; to our knowledge, this is the first study to extend them to seq2seq architectures. The resulting quantized models are less than 3.5 MB in size and are well suited for on-device, latency-critical applications. We show that on MTOP, a challenging multilingual semantic parsing dataset, the average model performance surpasses that of an LSTM-based seq2seq model that uses pre-trained embeddings, despite being 85x smaller. Furthermore, the model can serve as an effective student for distilling large pre-trained models such as T5/BERT.
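The small model size follows from the projection idea used in pQRNN-style encoders: instead of storing a learned embedding table, each token is mapped on the fly to a fixed-size ternary feature vector via hashing. The sketch below illustrates this idea only; the function name, feature width, and choice of hash are illustrative assumptions, not the paper's implementation.

```python
import hashlib

def project_token(token: str, n_features: int = 8) -> list:
    # Illustrative projection: hash the token, then read two bits per
    # feature and map them to a ternary value in {-1, 0, 1}.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    features = []
    for i in range(n_features):
        two_bits = (digest[i // 4] >> (2 * (i % 4))) & 0b11
        features.append([-1, 0, 1, 0][two_bits])
    return features

# The projected sequence replaces an embedding lookup: no vocabulary
# table is stored, which is what keeps such models to a few megabytes.
sentence = "set an alarm for 7 am"
projected = [project_token(tok) for tok in sentence.split()]
```

Because the projection is a deterministic function of the token string, identical tokens always receive identical features, and the encoder can learn on top of them without any per-token parameters.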