Current math word problem (MWP) solvers are usually Seq2Seq models trained on (one-problem; one-solution) pairs, each consisting of a problem description and a solution equation that shows the reasoning flow leading to the correct answer. However, an MWP naturally admits multiple solution equations. Training an MWP solver on (one-problem; one-solution) pairs excludes the other correct solutions and thus limits the solver's generalizability. One feasible remedy is to augment each problem with multiple solutions, but collecting diverse and accurate augmented solutions through human effort is difficult. In this paper, we design a new training framework for MWP solvers that introduces a solution buffer and a solution discriminator. The buffer stores solutions generated by the MWP solver to increase the diversity of the training data. The discriminator controls which buffered solutions are of sufficient quality to participate in training. Our framework is flexibly applicable to a wide range of settings, including fully, semi-weakly, and weakly supervised training, for any Seq2Seq MWP solver. We conduct extensive experiments on the benchmark dataset Math23k and a new dataset named Weak12k, and show that our framework improves the performance of various MWP solvers under different settings by generating correct and diverse solutions.
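As a rough illustration only (not the authors' implementation), the sketch below shows the idea of one problem having several correct solution equations and of a buffer that admits only answer-consistent ones. It assumes infix equation strings over numbers and `+ - * /`, and it replaces the learned solution discriminator with a simple check that the equation evaluates to the gold answer. The names `SolutionBuffer`, `evaluate`, `accepts`, and the tolerance `tol` are hypothetical.

```python
import ast
import operator
from collections import defaultdict

# Binary operators allowed in solution equations (assumption: + - * / only).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Safely evaluate an arithmetic expression built from numbers and + - * /."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))


class SolutionBuffer:
    """Hypothetical per-problem buffer of distinct solution equations."""

    def __init__(self, tol: float = 1e-4):
        self.tol = tol
        self.solutions = defaultdict(set)

    def accepts(self, equation: str, answer: float) -> bool:
        """Stand-in for the discriminator: keep equations that reproduce the answer."""
        try:
            return abs(evaluate(equation) - answer) < self.tol
        except (ValueError, ZeroDivisionError, SyntaxError):
            return False

    def add(self, problem_id: str, equation: str, answer: float) -> bool:
        """Store a generated equation if it passes the check; return whether it was kept."""
        if self.accepts(equation, answer):
            self.solutions[problem_id].add(equation.replace(" ", ""))
            return True
        return False

    def training_pairs(self):
        """Yield (problem_id, equation) pairs: one problem, possibly many solutions."""
        for pid, eqs in self.solutions.items():
            for eq in sorted(eqs):
                yield pid, eq
```

For example, if a solver generates `(3+5)*2`, `3*2+5*2`, and `3+5*2` for a problem whose answer is 16, the first two are kept as distinct correct solutions and the third (which evaluates to 13) is rejected, so the problem contributes two training pairs instead of one. A learned discriminator, as described in the abstract, could in principle judge more than the final answer; the answer check here is only the simplest possible filter.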