Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to ethical interventions even though these interventions are stated as simple sentences. Few-shot learning improves model behavior but remains far from the desired outcome, especially when evaluated for various types of generalization. Our new task thus poses a novel language understanding challenge for the community.