Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

ChatGPT is attracting cross-field interest because it provides a language interface with remarkable conversational competency and reasoning capabilities across many domains. However, since ChatGPT is trained on language, it cannot currently process or generate images from the visual world. At the same time, Visual Foundation Models such as Visual Transformers or Stable Diffusion show great visual understanding and generation capabilities, but they are experts only on specific tasks with one-round, fixed inputs and outputs. To this end, we build a system called Visual ChatGPT, which incorporates different Visual Foundation Models and enables the user to interact with ChatGPT by 1) sending and receiving not only language but also images; 2) providing complex visual questions or visual editing instructions that require multiple AI models to collaborate over multiple steps; and 3) providing feedback and asking for corrected results. We design a series of prompts to inject the visual model information into ChatGPT, covering models with multiple inputs/outputs and models that require visual feedback. Experiments show that Visual ChatGPT opens the door to investigating the visual roles of ChatGPT with the help of Visual Foundation Models. Our system is publicly available at https://github.com/microsoft/visual-chatgpt.
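The idea of injecting visual model information into a language model through prompts can be illustrated with a minimal sketch. This is not the authors' implementation: the `VisualTool` class, the `build_system_prompt` and `dispatch` functions, the `TOOL <name> | <arguments>` reply format, and the stub tools are all hypothetical names chosen for illustration, and the stubs stand in for real models such as an image captioner or an image editor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VisualTool:
    """Hypothetical wrapper describing one visual foundation model."""
    name: str
    description: str           # when the language model should pick this tool
    inputs: str                # what the tool expects, e.g. "image_path, text"
    run: Callable[[str], str]  # takes an argument string, returns a result string


def build_system_prompt(tools: List[VisualTool]) -> str:
    """Render tool metadata into text that can be prepended to the chat."""
    lines = ["You can call the following visual tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description} Inputs: {tool.inputs}")
    # Assumed reply convention, not taken from the paper.
    lines.append("To call a tool, reply with: TOOL <name> | <arguments>")
    return "\n".join(lines)


def dispatch(reply: str, tools: Dict[str, VisualTool]) -> str:
    """Run the tool named in a model reply, or pass plain text through."""
    if not reply.startswith("TOOL "):
        return reply
    name, _, args = reply[len("TOOL "):].partition("|")
    tool = tools.get(name.strip())
    if tool is None:
        return f"Unknown tool: {name.strip()}"
    return tool.run(args.strip())


# Stub tools stand in for real models so the sketch runs without any weights.
caption = VisualTool(
    name="caption_image",
    description="Describe the content of an image.",
    inputs="image_path",
    run=lambda args: f"(stub caption for {args})",
)
edit = VisualTool(
    name="edit_image",
    description="Edit an image according to a text instruction.",
    inputs="image_path, instruction",
    run=lambda args: f"(stub edit applied: {args})",
)
registry = {tool.name: tool for tool in (caption, edit)}

if __name__ == "__main__":
    print(build_system_prompt([caption, edit]))
    print(dispatch("TOOL caption_image | cat.png", registry))
```

In a multi-step interaction, the string returned by `dispatch` would be fed back to the language model as an observation, so that the model can chain several tools or revise a result after user feedback.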

Related content

ChatGPT

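ChatGPT (full name: Chat Generative Pre-trained Transformer) is a chatbot program developed by the US company OpenAI [1] and released on November 30, 2022. ChatGPT is a natural language processing tool driven by artificial intelligence. It holds conversations by learning and understanding human language, responds according to the context of the chat, and communicates much as a human would. It can even handle tasks such as writing emails, video scripts, marketing copy, translations, code, and papers. [1] https://openai.com/blog/chatgpt/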