In this work, we propose A2P-MANN, an online adaptive approach that limits the number of attention inference hops required in memory-augmented neural networks (MANNs). A small neural network classifier determines an adequate number of attention inference hops for each input query, which eliminates a large number of computations that are unnecessary for extracting the correct answer. To further lower the computation count of A2P-MANN, we also suggest pruning the weights of the final fully-connected (FC) layers. To this end, we develop two pruning approaches: one with negligible accuracy loss and one with a controllable loss in final accuracy. The efficacy of the technique is assessed on the twenty question-answering (QA) tasks of the bAbI dataset. The analytical assessment shows, on average, more than 42% fewer computations than the baseline MANN at the cost of less than 1% accuracy loss. When combined with the previously published zero-skipping technique, the reduction in computation count reaches up to 68%. Finally, when the proposed approach (without zero-skipping) is implemented on CPU and GPU platforms, a runtime reduction of up to 43% is achieved.
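The adaptive-hop idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an end-to-end memory-network style hop (softmax attention over memory followed by a residual update), and it stands in a hypothetical linear scorer, `hop_classifier`, for the small classifier. All function and parameter names are illustrative assumptions.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_hop(u, memory_in, memory_out):
    # One assumed MemN2N-style hop: attend over the input memory,
    # read from the output memory, and add the read vector to the state.
    weights = softmax([dot(u, m) for m in memory_in])
    read = [sum(w * c[j] for w, c in zip(weights, memory_out))
            for j in range(len(u))]
    return [ui + ri for ui, ri in zip(u, read)]


def predict_hops(query, hop_classifier, max_hops):
    # Hypothetical classifier: one weight row per candidate hop count.
    # Row i scoring highest means "i + 1 hops are adequate".
    scores = [dot(row, query) for row in hop_classifier]
    best = max(range(len(scores)), key=scores.__getitem__)
    return min(best + 1, max_hops)


def adaptive_inference(query, memory_in, memory_out, hop_classifier, max_hops=3):
    # Run only as many hops as the classifier predicts, instead of
    # always running max_hops as a fixed-hop baseline would.
    hops = predict_hops(query, hop_classifier, max_hops)
    u = list(query)
    for _ in range(hops):
        u = attention_hop(u, memory_in, memory_out)
    return u, hops


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0]]
    classifier = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy weights
    state, hops = adaptive_inference([1.0, 0.0], memory, memory, classifier)
    print(f"hops used: {hops} of 3, final state: {state}")
```

In this sketch, the saving comes from the skipped hops alone: a query for which one hop is predicted avoids two of the three attention passes. The cost of the classifier itself and the FC-layer pruning are not modeled here.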