With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to only provide inference APIs for users, namely the model-as-a-service (MaaS) setting. To adapt PTMs with model parameters frozen, most current approaches focus on the input side, seeking powerful prompts to stimulate models for correct answers. However, we argue that input-side adaptation can be arduous due to the lack of gradient signals, and such methods usually require thousands of API queries, resulting in high computation and time costs. In light of this, we present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side. Specifically, DecT first extracts prompt-stimulated output scores for initial predictions. On top of that, we train an additional decoder network on the output representations to incorporate posterior data knowledge. With gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample. Empirically, we conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $10^3\times$ speed-up.
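To make the output-side idea concrete, below is a minimal sketch of training a small decoder on cached PTM outputs. All names, shapes, and the simple linear head are illustrative assumptions; the actual DecT decoder may differ (e.g., in how it scores classes), but the sketch captures the setup the abstract describes: the PTM is queried once per sample, its prompt-stimulated scores and output representations are cached, and only a lightweight decoder is optimized with gradients.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of output-side decoder tuning.
# Assumption: for each example, a single query to the frozen PTM yields
#   (a) a prompt-stimulated score vector over the label words, and
#   (b) the output hidden representation at the mask position.
# Only the small decoder below is trained; the PTM is never updated.

class Decoder(nn.Module):
    def __init__(self, hidden_dim: int, num_classes: int, score_weight: float = 1.0):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_classes)  # trainable decoder head
        self.score_weight = score_weight                # weight on the PTM's prior scores

    def forward(self, hidden: torch.Tensor, ptm_scores: torch.Tensor) -> torch.Tensor:
        # Combine posterior knowledge learned from output representations
        # with the prompt-stimulated prior scores from the frozen PTM.
        return self.proj(hidden) + self.score_weight * ptm_scores

# Toy training loop on cached PTM outputs (random tensors stand in for real data).
hidden_dim, num_classes, n = 768, 4, 64
hiddens = torch.randn(n, hidden_dim)        # cached output representations
ptm_scores = torch.randn(n, num_classes)    # cached prompt-stimulated scores
labels = torch.randint(0, num_classes, (n,))

decoder = Decoder(hidden_dim, num_classes)
optim = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):                        # seconds-scale, gradient-based training
    optim.zero_grad()
    loss = loss_fn(decoder(hiddens, ptm_scores), labels)
    loss.backward()
    optim.step()
```

Because all PTM outputs are cached after one query per sample, training touches only the decoder's few parameters, which is what makes the seconds-scale optimization and the reported speed-up over query-heavy input-side methods plausible.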