While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero-, one-, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs as cloze tasks: an LLM is conditioned on a question (without the associated answer options), and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., "A") associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs, and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM it is used with must be able to associate answer options with the symbols that represent them; the LLM needs what we term multiple choice symbol binding (MCSB) ability. This ability varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.
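To make the contrast between the two prompting styles concrete, here is a minimal sketch. The `logprob` callable is a hypothetical stand-in for a model-scoring API, and the per-word normalization shown is just one simple choice; the exact normalization used in practice may differ.

```python
from typing import Callable, List

def cloze_choose(question: str, options: List[str],
                 logprob: Callable[[str, str], float]) -> int:
    """Traditional cloze approach: score each option as a continuation of
    the question alone, then length-normalize so longer options are not
    penalized. Returns the index of the highest-scoring option."""
    scores = []
    for opt in options:
        total_lp = logprob(question, opt)  # total log p(option | question); hypothetical API
        scores.append(total_lp / max(len(opt.split()), 1))  # crude per-word normalization
    return max(range(len(options)), key=scores.__getitem__)

def mcp_prompt(question: str, options: List[str]) -> str:
    """Natural approach: show every option next to a symbol and ask the
    model to emit a single symbol such as "A"."""
    lines = [question]
    lines += [f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)
```

Under the cloze format the model never sees the answer options side by side, whereas the prompt built by `mcp_prompt` lets the next-token distribution over the symbols "A", "B", ... directly encode the comparison, with a single forward pass instead of one scoring pass per option.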