In this paper, we propose SCANING, an unsupervised framework for paraphrasing via controlled noise injection. We focus on the novel task of paraphrasing algebraic word problems, which has practical applications in online pedagogy: it reduces plagiarism and encourages students to understand a problem rather than memorize it by rote. This task is more complex than paraphrasing general-domain corpora because the paraphrase must preserve the information critical to solution consistency, handle longer input text, and remain diverse. Existing approaches fall short on at least one, if not all, of these facets, which calls for a more comprehensive solution. To this end, we model the noising search space as a composition of contextual and syntactic aspects and sample noising functions consisting of either one or both aspects. This allows us to learn a denoising function that operates over both aspects and, through grounded noise injection, produces semantically equivalent and syntactically diverse outputs. The denoising function serves as a foundation for learning a paraphrasing function that operates solely in the input-paraphrase space, with no direct dependency on noise. Through extensive automated and manual evaluation across four datasets, we demonstrate that SCANING considerably improves both semantic preservation and paraphrase diversity.
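As a rough illustration of sampling noising functions that consist of one or both aspects, the following minimal Python sketch builds (noisy input, clean target) pairs of the kind a denoising function could be trained on. It is not the SCANING implementation: the function names, the `<mask>` token, the window-shuffle operation, and the choice to leave numeric tokens unmasked (a stand-in for preserving solution-critical information) are all assumptions made for this example.

```python
import random
from typing import Callable, List, Tuple

Tokens = List[str]
NoiseFn = Callable[[Tokens, random.Random], Tokens]


def contextual_noise(tokens: Tokens, rng: random.Random, p: float = 0.2) -> Tokens:
    """Hypothetical contextual noise: mask non-numeric tokens with probability p.

    Tokens containing digits are never masked (an assumption of this sketch),
    so the quantities of a word problem survive the corruption.
    """
    return [
        "<mask>" if (not any(c.isdigit() for c in t) and rng.random() < p) else t
        for t in tokens
    ]


def syntactic_noise(tokens: Tokens, rng: random.Random, window: int = 3) -> Tokens:
    """Hypothetical syntactic noise: shuffle tokens inside fixed-size windows."""
    out: Tokens = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)  # reorders locally, keeps every token
        out.extend(chunk)
    return out


def sample_noising_function(rng: random.Random) -> NoiseFn:
    """Sample a noising function made of one aspect or the composition of both."""
    choice = rng.choice(["contextual", "syntactic", "both"])
    if choice == "contextual":
        return contextual_noise
    if choice == "syntactic":
        return syntactic_noise
    return lambda toks, r: syntactic_noise(contextual_noise(toks, r), r)


def make_denoising_pair(sentence: str, seed: int = 0) -> Tuple[str, str]:
    """Return (noisy_input, clean_target) for training a denoiser."""
    rng = random.Random(seed)
    tokens = sentence.split()
    noisy = sample_noising_function(rng)(tokens, rng)
    return " ".join(noisy), sentence
```

Calling `make_denoising_pair("John has 5 apples and buys 3 more", seed=1)` returns a corrupted copy of the sentence together with the original as the reconstruction target. Whitespace tokenization and uniform sampling over the three options are simplifications chosen for brevity.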