Complex QA and language models hybrid architectures, Survey

This paper provides a survey of the state of the art in hybrid language model architectures and strategies for "complex" question answering (QA, CQA, CPS). Very large language models are good at leveraging public data on standard problems. However, once you want to tackle more specific complex questions or problems, you may need a specific architecture, knowledge, skills, tasks, methods, sensitive data, performance, human approval, and versatile feedback. This survey extends the findings of the robust, community-edited research papers BIG, BLOOM and HELM. These papers open-source, benchmark, and analyze the limits and challenges of large language models in terms of task complexity and strict evaluation of accuracy and other criteria (e.g., fairness, robustness, toxicity). The survey identifies the key elements used with large language models (LLMs) to solve complex questions or problems. Recent projects such as ChatGPT and GALACTICA have allowed non-specialists to grasp both the great potential and the equally strong limitations of language models in complex QA. Hybridizing these models with different components could help overcome these limitations and go much further. We discuss several challenges associated with complex QA:

- domain adaptation;
- decomposition and efficient multi-step QA;
- long-form QA and non-factoid QA;
- safety and multi-sensitivity data protection;
- multimodal search;
- hallucinations;
- QA explainability and truthfulness;
- the time dimension.

We then review current solutions and promising strategies. These draw on elements such as hybrid LLM architectures, human-in-the-loop reinforcement learning, prompt adaptation, neuro-symbolic and structured knowledge grounding, and program synthesis, among others. Finally, we analyze existing solutions and provide an overview of current research and trends in complex QA.

翻译：本文对混合语言模型结构和“复合”问答战略(QA、CQA、CQA、CPS)的先进程度进行了调查。大量语言模型在利用关于标准问题的公开数据方面十分有效,但一旦你想解决更具体的复杂问题或问题,你可能需要具体的架构、知识、技能、任务、方法、敏感数据、性能、人力核准和多方面反馈。这份调查扩展了社区编辑的扎实研究论文BIG、BLOOM和HELM的研究结果,这些论文开放了源头、基准和分析大语言模型在任务复杂性和准确性(例如公平、稳健、毒性、.)方面的限制和挑战。它确定了大语言模型用于解决复杂问题或问题的关键要素。诸如ChatGPT和GALCTAA等近期项目使非专家能够抓住复杂QA的巨大潜力以及复杂的语言模型的同样严重的局限性。将这些模型与不同组成部分结合起来,可以克服这些不同的限制,并进一步讨论与以下一些挑战:QA的复杂质量-质量、强化、其他知识-结构的升级和多级研究。</s>