Whether large language models (LLMs) can reason like humans has been a highly contested topic in the machine learning community. Human reasoning, however, is multifaceted and takes various forms, including analogical, spatial, and moral reasoning, among others. This raises the question of whether LLMs can perform equally well across all of these domains. This work investigates the performance of LLMs on different reasoning tasks through experiments that directly use, or draw inspiration from, existing datasets on analogical and spatial reasoning. Additionally, to evaluate the ability of LLMs to reason like humans, their performance is evaluated on more open-ended, natural language questions. My findings indicate that LLMs excel at analogical and moral reasoning, yet struggle to perform as proficiently on spatial reasoning tasks. I believe these experiments are crucial for informing the future development of LLMs, particularly in contexts that require diverse reasoning proficiencies. By shedding light on the reasoning abilities of LLMs, this study aims to advance our understanding of how they can better emulate the cognitive abilities of humans.