Large Language Models (LMs) have achieved state-of-the-art performance on many Natural Language Processing (NLP) benchmarks. As new benchmarks proliferate, we build ever bigger and more complex LMs. However, building new LMs may not be an ideal option owing to the cost, time, and environmental impact associated with it. We explore an alternative route: can we modify the data by expressing it in terms of the model's strengths, so that a question becomes easier for models to answer? We investigate whether humans can decompose a hard question into a set of simpler sub-questions that are easier for models to solve. We analyze a range of datasets involving various forms of reasoning and find that model performance can indeed be improved significantly via decomposition (by 24% for GPT3 and 29% for RoBERTa-SQuAD along with a symbolic calculator). Our approach provides a viable option for involving people in NLP research in a meaningful way. Our findings indicate that Human-in-the-loop Question Decomposition (HQD) can potentially provide an alternate path to building large LMs.
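To make the pipeline concrete, here is a minimal, hypothetical sketch of how a human-written decomposition might be executed: each sub-question is routed either to a QA model (stubbed here with a lookup, standing in for something like RoBERTa-SQuAD) or to a symbolic calculator. The question, sub-questions, and routing function are illustrative assumptions, not the paper's actual interface.

```python
def answer_subquestion(question, context, calculator_expr=None):
    """Answer one sub-question: route arithmetic steps to a symbolic
    calculator, and extraction steps to a (stubbed) QA model."""
    if calculator_expr is not None:
        # Symbolic calculator step (here, plain Python arithmetic).
        return eval(calculator_expr)
    # Stub standing in for a reading-comprehension model (assumed).
    return context[question]

# Hard original question: "How many more apples than oranges were sold?"
# A human decomposes it into two extraction steps and one calculator step.
context = {
    "How many apples were sold?": 12,
    "How many oranges were sold?": 5,
}
a1 = answer_subquestion("How many apples were sold?", context)
a2 = answer_subquestion("How many oranges were sold?", context)
diff = answer_subquestion(None, context, calculator_expr=f"{a1} - {a2}")
print(diff)  # prints 7
```

The key design point is that the model never sees the hard multi-step question; the human supplies the reasoning structure, and each component only solves a step it is already good at.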