In this paper we examine the limitations of Large Language Models (LLMs) for complex reasoning tasks. While current approaches leverage formal languages as intermediate representation of reasoning problems, they struggle with generating intermediate formal specifications and refining these representations. To address these issues, this paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM. The paper demonstrates that Logic-LM++ outperforms Logic-LM and LLM based techniques on natural language reasoning tasks on two datasets, FOLIO and AR-LSAT. Logic-LM++ show an average improvement of 13.5% on standard prompting, 11% on chain of thought prompting and 5% on Logic-LM.
翻译:暂无翻译