ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4

May 12, 2023

Zhengqing Yuan, Huiwen Xue, Xinyi Wang, Yongming Liu, Zhuanzhe Zhao, Kun Wang

From arXiv, 13 pages

In recent years, large language models (LLMs) have made significant progress in natural language processing (NLP), with models such as ChatGPT and GPT-4 achieving impressive capabilities across a range of linguistic tasks. However, training models at such a scale is challenging, and datasets that match the model's scale are often hard to find. Fine-tuning, and training models with fewer parameters using novel methods, have emerged as promising ways to overcome these challenges. One such model is MiniGPT-4, which achieves vision-language understanding comparable to GPT-4 by leveraging novel pre-training models and innovative training strategies. However, the model still faces challenges in image understanding, particularly with artistic pictures. A novel multimodal model called ArtGPT-4 has been proposed to address these limitations. ArtGPT-4 was trained on image-text pairs using a Tesla A100 device in just 2 hours, with only about 200 GB of data. The model can depict images with an artistic flair and generate visual code, including aesthetically pleasing HTML/CSS web pages. Furthermore, the article proposes novel benchmarks for evaluating the performance of vision-language models. Under these evaluation methods, ArtGPT-4 scored more than 1 point higher than the current state-of-the-art model and was only 0.25 points lower than artists on a 6-point scale. Our code and pre-trained model are available at https://huggingface.co/Tyrannosaurus/ArtGPT-4.
