Current natural language understanding (NLU) models have been continually scaled up, both in model size and in input context, introducing more hidden and input neurons. While this generally improves performance on average, the extra neurons do not yield a consistent improvement for all instances. This is because some hidden neurons are redundant, and the noise mixed into the input neurons tends to distract the model. To avoid this problem, previous work mainly focuses on extrinsically reducing low-utility neurons through additional post- or pre-processing, such as network pruning and context selection. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model utilizes its neurons efficiently, then no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on such a comparison principle between models, we propose a cross-model comparative loss for a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is minimal. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks based on 4 widely used pretrained language models, and find it particularly superior for models with few parameters or long input.
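To make the comparison principle concrete, the sketch below illustrates one way a ranking term over the task-specific losses of the full and ablated models could look in PyTorch. The function name `comparative_loss`, the hinge form, and the `margin` argument are illustrative assumptions for exposition, not the paper's exact formulation.

```python
# Minimal sketch (under assumed naming) of a comparative loss:
# a pairwise hinge that penalizes the full model whenever its
# task-specific loss is not the smallest among the full model
# and its ablated variants.
import torch


def comparative_loss(full_task_loss: torch.Tensor,
                     ablated_task_losses: list,
                     margin: float = 0.0) -> torch.Tensor:
    """Ranking term encouraging full_task_loss <= each ablated task loss.

    full_task_loss: scalar task loss of the full model.
    ablated_task_losses: scalar task losses of submodels with some
        neurons ablated (disabled).
    margin: optional slack; 0.0 recovers a plain hinge on the ordering.
    """
    penalties = [torch.relu(full_task_loss - loss_abl + margin)
                 for loss_abl in ablated_task_losses]
    return torch.stack(penalties).mean()


# Hypothetical usage: combine with the ordinary task loss of the full model,
# weighted by a hyperparameter lambda_cmp (an assumed name).
# loss = full_task_loss + lambda_cmp * comparative_loss(full_task_loss,
#                                                       ablated_task_losses)
```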