We introduce a novel run-time method that significantly reduces the accuracy loss incurred when quantizing BERT-like models to 8-bit integers. Existing quantization methods either modify the training procedure or require an additional calibration step that adjusts parameters using a selected held-out dataset. Our method delivers the benefits of quantization without either of these adjustments. We present results on several NLP tasks demonstrating the usefulness of this technique.
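For context, the sketch below illustrates the general class of run-time (dynamic) int8 quantization that requires no retraining and no calibration dataset, using PyTorch's quantize_dynamic on a Hugging Face BERT checkpoint. This is a generic baseline illustration under those assumptions, not the specific technique proposed here.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load a pretrained BERT-like model (example checkpoint; any BERT variant works).
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Dynamic quantization: Linear weights are stored as int8, and activations are
# quantized on the fly at inference time, so no training changes and no
# held-out calibration set are required.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Run inference with the quantized model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits
print(logits)
```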