GPT-4o Papers - 专知

GPT-4o

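GPT-4o (the "o" stands for "omni") is a step toward more natural human-computer interaction: it accepts any combination of text, audio, and images as input and generates any combination of text, audio, and images as output. It can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. On English text and code it matches the performance of GPT-4 Turbo, while improving significantly on non-English text, and it is also faster and 50% cheaper in the API. Compared with existing models, GPT-4o is especially strong at vision and audio understanding.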

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Arxiv

0+ reads · October 24

GeoBenchX: Benchmarking LLMs in Agent Solving Multistep Geospatial Tasks

Arxiv

0+ reads · October 22

IM-Chat: A Multi-agent LLM Framework Integrating Tool-Calling and Diffusion Modeling for Knowledge Transfer in Injection Molding Industry

Arxiv

0+ reads · October 22

Facts are Harder Than Opinions -- A Multilingual, Comparative Analysis of LLM-Based Fact-Checking Reliability

Arxiv

0+ reads · October 21

The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5

Arxiv

0+ reads · October 18

SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches

Arxiv

0+ reads · October 20

David vs. Goliath: A comparative study of different-sized LLMs for code generation in the domain of automotive scenario generation

Arxiv

0+ reads · October 15

Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests

Arxiv

0+ reads · October 15

Assessing Web Search Credibility and Response Groundedness in Chat Assistants

Arxiv

0+ reads · October 15

The Silent Judge: Unacknowledged Shortcut Bias in LLM-as-a-Judge

Arxiv

0+ reads · October 14

Large language models management of medications: three performance analyses

Arxiv

0+ reads · October 14

An Empirical Study of Python Library Migration Using Large Language Models

Arxiv

0+ reads · October 12

A Systematic Study on Generating Web Vulnerability Proof-of-Concepts Using Large Language Models

Arxiv

0+ reads · October 11

Hallucination Filtering in Radiology Vision-Language Models Using Discrete Semantic Entropy

Arxiv

0+ reads · October 10

Multimodal Safety Evaluation in Generative Agent Social Simulations

Arxiv

0+ reads · October 9
