OpenAI Introduces GPT-5.4: The Most Capable Frontier Model
OpenAI releases GPT-5.4, combining recent advances in reasoning, coding, and agentic workflows into a single frontier model. It incorporates the industry-leading coding capabilities of GPT-5.3-Codex while improving how the model works across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents.
GPT-5.4 is the first general-purpose model OpenAI has released with native, state-of-the-art computer-use capabilities, enabling agents to operate computers and carry out complex workflows across applications. It supports up to 1M tokens of context, allowing agents to plan, execute, and verify tasks across long horizons.
On the GDPval benchmark, GPT-5.4 sets a new state of the art, matching or exceeding industry professionals in 83.0% of comparisons, compared with 70.9% for GPT-5.2. On OSWorld-Verified, GPT-5.4 reaches a state-of-the-art 75.0% success rate, far exceeding GPT-5.2's 47.3% and surpassing the reported human performance of 72.4%.
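To put the benchmark figures above side by side, the short Python sketch below computes the gaps in percentage points. The scores are the ones quoted in the article; the dictionary layout and the `point_gap` helper are illustrative assumptions, not part of any OpenAI tooling.

```python
# Benchmark scores as quoted in the article, in percent.
scores = {
    "GDPval": {"GPT-5.4": 83.0, "GPT-5.2": 70.9},
    "OSWorld-Verified": {"GPT-5.4": 75.0, "GPT-5.2": 47.3, "human": 72.4},
}


def point_gap(benchmark, a, b):
    """Gap between two quoted scores, in percentage points (hypothetical helper)."""
    return round(scores[benchmark][a] - scores[benchmark][b], 1)


gdpval_gain = point_gap("GDPval", "GPT-5.4", "GPT-5.2")
osworld_gain = point_gap("OSWorld-Verified", "GPT-5.4", "GPT-5.2")
osworld_vs_human = point_gap("OSWorld-Verified", "GPT-5.4", "human")

print(f"GDPval gain over GPT-5.2: {gdpval_gain} points")
print(f"OSWorld-Verified gain over GPT-5.2: {osworld_gain} points")
print(f"OSWorld-Verified margin over human: {osworld_vs_human} points")
```

These are differences in percentage points, not relative improvements; the margin over the human figure is much smaller than the gain over GPT-5.2.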
GPT-5.4 is also OpenAI's most factual model yet: on a set of de-identified prompts where users flagged factual errors, GPT-5.4's individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors, relative to GPT-5.2.
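The 33% and 18% figures are relative reductions, so the absolute effect depends on baseline error rates that the article does not state. The sketch below shows the arithmetic with baseline rates that are purely hypothetical, chosen only for illustration.

```python
def apply_relative_reduction(baseline_rate, relative_reduction):
    """Return the rate left after cutting baseline_rate by a relative fraction."""
    return baseline_rate * (1 - relative_reduction)


# ASSUMPTION: these baselines are invented for illustration.
# The article does not report GPT-5.2's actual error rates.
HYPOTHETICAL_CLAIM_ERROR_RATE = 0.10     # share of individual claims that are false
HYPOTHETICAL_RESPONSE_ERROR_RATE = 0.30  # share of responses with at least one error

# Relative reductions quoted in the article.
claim_rate = apply_relative_reduction(HYPOTHETICAL_CLAIM_ERROR_RATE, 0.33)
response_rate = apply_relative_reduction(HYPOTHETICAL_RESPONSE_ERROR_RATE, 0.18)

print(f"Claim-level error rate: {HYPOTHETICAL_CLAIM_ERROR_RATE:.1%} -> {claim_rate:.1%}")
print(f"Response-level error rate: {HYPOTHETICAL_RESPONSE_ERROR_RATE:.1%} -> {response_rate:.1%}")
```

The two reductions are measured on different units (individual claims versus whole responses), so they are not directly comparable to each other.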