Transformer-based approaches have been successfully used to obtain state-of-the-art accuracy on natural language processing (NLP) tasks over semi-structured tables. These model architectures are typically deep, resulting in slow training and inference, especially for long inputs. To improve efficiency while maintaining high accuracy, we propose a new architecture, DoT, a double transformer model that decomposes the problem into two sub-tasks: a shallow pruning transformer that selects the top-K tokens, followed by a deep task-specific transformer that takes those K tokens as input. Additionally, we modify the task-specific attention to incorporate the pruning scores. The two transformers are jointly trained by optimizing the task-specific loss. We run experiments on three benchmarks, including entailment and question answering. We show that, for a small drop in accuracy, DoT improves training and inference time by at least 50%. We also show that the pruning transformer effectively selects relevant tokens, enabling the end-to-end model to maintain accuracy similar to that of slower baseline models. Finally, we analyse the pruning step and give some insight into its impact on the task model.
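To make the two-stage design concrete, the sketch below shows one possible way to wire a shallow pruning transformer to a deeper task transformer: the pruning encoder scores every token, only the top-K tokens are passed on, and the scores modulate the task model's input. This is an illustrative sketch, not the authors' implementation; the layer counts, the use of embedding scaling (rather than a modified attention) to inject the pruning scores, and all module names are assumptions.

```python
# Minimal DoT-style sketch (assumed PyTorch implementation, not the paper's code).
import torch
import torch.nn as nn


class DoTSketch(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, k=256,
                 pruning_layers=2, task_layers=12, num_classes=2):
        super().__init__()
        self.k = k
        self.embed = nn.Embedding(vocab_size, d_model)

        def encoder(num_layers):
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers)

        self.pruning_encoder = encoder(pruning_layers)   # shallow pruning transformer
        self.task_encoder = encoder(task_layers)         # deep task-specific transformer
        self.score_head = nn.Linear(d_model, 1)          # per-token pruning score
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):
        x = self.embed(input_ids)                                       # (B, L, D)
        scores = self.score_head(self.pruning_encoder(x)).squeeze(-1)   # (B, L)
        # Keep the top-K tokens per example according to the pruning scores.
        top_scores, top_idx = scores.topk(self.k, dim=1)
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        pruned = torch.gather(x, 1, gather_idx)                         # (B, K, D)
        # One simple way to expose the pruning scores to the task model:
        # scale the selected token embeddings by their sigmoid-squashed scores,
        # so gradients from the task loss also train the pruning transformer.
        pruned = pruned * torch.sigmoid(top_scores).unsqueeze(-1)
        h = self.task_encoder(pruned)                                   # (B, K, D)
        return self.classifier(h.mean(dim=1))                           # task logits


# Usage: a batch of 2 sequences of length 512 is reduced to K=256 tokens
# before the deep task transformer runs.
model = DoTSketch()
logits = model(torch.randint(0, 30522, (2, 512)))
```

Because both stages sit in a single computation graph and only the task loss is optimized, the pruning transformer is trained end-to-end to keep the tokens that help the task model, which is the joint-training setup described above.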