In recent years, transformer-based language representation models (LRMs) have achieved state-of-the-art results on difficult natural language understanding problems, such as question answering and text summarization. As these models are integrated into real-world applications, evaluating their ability to make rational decisions is an important research agenda with practical ramifications. This article investigates LRMs' rational decision-making ability through a carefully designed set of decision-making benchmarks and experiments. Inspired by classic work in cognitive science, we model the decision-making problem as a bet. We then investigate an LRM's ability to choose outcomes with optimal, or at minimum positive, expected gain. Through a robust body of experiments on four established LRMs, we show that a model is able to `think in bets' only if it is first fine-tuned on bet questions with an identical structure. Modifying the structure of the bet question, while retaining its fundamental characteristics, decreases an LRM's performance by more than 25\% on average, although absolute performance remains well above random. LRMs are also found to be more rational when selecting outcomes with non-negative expected gain than when selecting outcomes with optimal or strictly positive expected gain. Our results suggest that LRMs could potentially be applied to tasks that rely on cognitive decision-making skills, but that more research is necessary before they can robustly make rational decisions.
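To make the bet framing concrete, the following is a minimal sketch of the expected-gain criterion under the standard decision-theoretic definition; the notation here is ours, and the paper's exact formulation may differ. A bet with outcomes $x_1, \dots, x_n$ occurring with probabilities $p_1, \dots, p_n$ has expected gain
\begin{equation*}
  % Standard expected value of a bet; assumed notation, not necessarily the paper's own.
  \mathbb{E}[\text{gain}] = \sum_{i=1}^{n} p_i \, x_i .
\end{equation*}
For example, a coin flip paying $+\$10$ on heads and $-\$4$ on tails has expected gain $0.5(10) + 0.5(-4) = 3 > 0$, so a rational bettor accepts it; an optimal choice maximizes $\mathbb{E}[\text{gain}]$ over the available bets.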