When artificial intelligence mistakes memorization for intelligence, it creates a dangerous mirage of reasoning. Existing studies treat memorization and self-knowledge deficits in LLMs as separate issues and do not recognize the intertwined link between them that degrades the trustworthiness of LLM responses. In this study, we use a novel framework to ascertain whether LLMs genuinely learn reasoning patterns from training data or merely memorize them and, on that basis, assume competence across problems of similar complexity in STEM domains. Our analysis reveals a noteworthy generalization problem: LLMs draw confidence from memorized solutions and infer inflated self-knowledge about their reasoning ability, which manifests as more than 45% inconsistency in feasibility assessments when the models face self-validated, logically coherent task perturbations. This effect is most pronounced in the science and medicine domains, which tend to have the most standardized jargon and problem formulations, further supporting our approach. The significant wavering in LLM self-knowledge also exposes flaws in current architectures and training paradigms, highlighting the need for techniques that keep models' perceptions of their own knowledge balanced and consistent, in the interest of AI explainability and trustworthiness. Our code and results are publicly available at https://github.com/Sahil-R-Kale/mirage_of_mastery.
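To make the reported metric concrete, the following is a minimal sketch (not the paper's actual code) of how an inconsistency rate in feasibility assessments might be computed over pairs of an original task and a logically equivalent perturbation; the function names, the `query_feasibility` callback, and the data layout are all assumptions introduced here for illustration.

```python
# Minimal sketch, assuming a hypothetical interface: `query_feasibility(task)`
# stands in for whatever LLM call the framework uses to ask whether a task is
# solvable. It is NOT the repository's actual API.
from typing import Callable, Sequence, Tuple


def inconsistency_rate(
    task_pairs: Sequence[Tuple[str, str]],     # (original task, logically coherent perturbation)
    query_feasibility: Callable[[str], bool],  # True if the model judges the task feasible
) -> float:
    """Fraction of pairs where the model's feasibility verdict flips between
    a (likely memorized) task and its perturbed but equivalent variant."""
    if not task_pairs:
        return 0.0
    flips = sum(
        query_feasibility(original) != query_feasibility(perturbed)
        for original, perturbed in task_pairs
    )
    return flips / len(task_pairs)


# Example usage with a stubbed model call (hypothetical):
# rate = inconsistency_rate(stem_task_pairs, lambda t: ask_llm_is_feasible(t))
# print(f"Inconsistency: {rate:.1%}")  # the abstract reports >45% on STEM perturbations
```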