Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite its promise, a critical downside of CoT prompting is that its performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models with explanation data is needed. However, there exist only a few datasets suitable for such approaches, and no data collection tool for building them. Thus, we introduce CoTEVer, a toolkit for annotating the factual correctness of generated explanations and collecting revision data of wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized to enhance the faithfulness of explanations. Our toolkit is publicly available at https://github.com/SeungoneKim/CoTEVer.
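To make the setup concrete, below is a minimal sketch of CoT prompting as described above: an exemplar shows the model how to produce an explanation before its final answer. The exemplar text and the `query_llm` stub are illustrative assumptions, not part of CoTEVer.

```python
# Minimal sketch of chain-of-thought prompting (illustrative only; the
# exemplar and query_llm stub are assumptions, not CoTEVer code).

def build_cot_prompt(question: str) -> str:
    """Prepend a one-shot CoT exemplar so the model explains before answering."""
    exemplar = (
        "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
        "How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n\n"
    )
    return exemplar + f"Q: {question}\nA:"

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real model API."""
    return "(model-generated explanation, ending with 'The answer is ...')"

if __name__ == "__main__":
    prompt = build_cot_prompt(
        "If a train travels 60 km in 1.5 hours, what is its average speed?"
    )
    print(query_llm(prompt))
```

Because the final answer is conditioned on the generated explanation, a factually wrong explanation tends to propagate into a wrong prediction, which is the failure mode CoTEVer's annotation and revision data target.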