On March 5, OpenAI released GPT-5.4. If you only read the headline, you might think it’s just another model update: a bigger number, slightly better benchmarks, move along.

It’s not. GPT-5.4 is the first general-purpose AI model from OpenAI that can actually operate your computer. Not just answer questions about it. Not just write code for it. Operate it. Click buttons, open applications, navigate between windows, fill out forms, and complete multi-step workflows across your desktop.

This is what the AI industry has been calling “agentic AI,” and GPT-5.4 is OpenAI’s biggest bet on it yet.

What “Computer Use” Actually Means

Let’s be specific, because “AI that uses your computer” can mean a lot of things.

GPT-5.4 can see your screen. It can issue keyboard and mouse commands. It can write code to interact with your operating system. And it can chain all of this together into workflows that span multiple applications.

Say you need to pull data from a spreadsheet, format it into a report, email it to three people, and update a project tracker. Right now, that’s 20 minutes of clicking between apps. With GPT-5.4’s computer use capabilities, you describe what you want, and it does the clicking for you.
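
To make that loop concrete, here is a minimal sketch of the observe, decide, act cycle an agent like this runs. Everything in it is hypothetical: Action, FakeDesktop, scripted_policy, and run_agent are illustrative names, not OpenAI's API, and the "model" is a scripted stand-in that replays a fixed plan rather than reading a real screen.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One keyboard or mouse command. Hypothetical shape, not OpenAI's schema."""
    kind: str          # "click", "type", or "done"
    target: str = ""   # UI element the action applies to
    text: str = ""     # text to type, if any


@dataclass
class FakeDesktop:
    """Stand-in for a real screen: records actions instead of performing them."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; here the "screen" is just a summary.
        return f"{len(self.log)} actions performed so far"

    def perform(self, action: Action) -> None:
        self.log.append(action)


def scripted_policy(plan):
    """Stand-in for the model: replays a fixed plan, one action per observation."""
    steps = iter(plan)

    def decide(observation: str) -> Action:
        return next(steps, Action("done"))

    return decide


def run_agent(desktop, decide, max_steps: int = 10) -> int:
    """Observe, decide, act, and repeat until the policy reports it is done."""
    for step in range(max_steps):
        action = decide(desktop.screenshot())
        if action.kind == "done":
            return step
        desktop.perform(action)
    return max_steps  # step budget exhausted before the policy finished


plan = [
    Action("click", target="spreadsheet"),
    Action("type", target="report", text="Q1 summary"),
    Action("click", target="send email"),
]
desktop = FakeDesktop()
steps_taken = run_agent(desktop, scripted_policy(plan))
print(steps_taken, [a.kind for a in desktop.log])  # 3 ['click', 'type', 'click']
```

The max_steps cap is a design choice worth keeping in any real version: an agent that can click on your behalf should have a hard limit on how long it keeps going.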

GPT-5.4 scored 75% on OSWorld-Verified, a benchmark that tests how well AI can actually operate a computer. That might not sound perfect, but consider that this benchmark asks the AI to complete real tasks across real operating systems. Three out of four times, it gets it right without human intervention.

The model also topped the WebArena-Verified and Mercor APEX-Agents benchmarks, both of which test professional workflow automation.

The Numbers That Matter

Beyond computer use, GPT-5.4 is a significant technical upgrade:

Context window: 1 million tokens. That’s roughly 750,000 words, or about 3,000 pages of text. In the Codex app, this means the model can work with entire codebases at once, not just individual files. For context, GPT-4 launched with at most 32,000 tokens, and GPT-4 Turbo later raised that to 128,000. This is almost 8x the larger figure.
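
The conversions in that paragraph are easy to check. The sketch below assumes the common rule of thumb of about 0.75 English words per token and 250 words per page; both ratios are assumptions, and real tokenizers vary by language and content.

```python
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text (assumption)
WORDS_PER_PAGE = 250     # assumed page density

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
ratio = CONTEXT_TOKENS / 128_000  # compared with a 128,000-token window

print(f"{words:,.0f} words, {pages:,.0f} pages, {ratio:.1f}x a 128,000-token window")
# 750,000 words, 3,000 pages, 7.8x a 128,000-token window
```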

33% fewer errors in individual responses compared to GPT-5.2. OpenAI measured this across a broad range of tasks, from coding to analysis to writing.

18% fewer mistakes overall when measured across complete workflows, not just single responses.

92.8% on GPQA Diamond, which tests graduate-level scientific reasoning. For perspective, many human PhD students score in the 60-70% range on this benchmark.

57.7% on SWE-Bench Pro, a coding benchmark that asks the AI to solve real software engineering problems from actual GitHub repositories.

83% on GDPval across 44 different professional occupations, a benchmark that tests whether AI can perform knowledge work at a professional level.

And perhaps most importantly for cost-conscious users: GPT-5.4 uses significantly fewer tokens to solve problems than GPT-5.2. At $2.50 per million input tokens and $15 per million output tokens, it’s more capable AND more efficient.
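
To see what those prices mean for a single request, here is a rough calculation. It assumes flat per-token pricing at the two rates quoted above and ignores anything like caching discounts or tiered rates; the request sizes are made up for illustration.

```python
INPUT_PRICE_PER_MILLION = 2.50    # USD, from the article
OUTPUT_PRICE_PER_MILLION = 15.00  # USD, from the article


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-token prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


# Hypothetical request: a 200,000-token codebase in, a 4,000-token answer out.
cost = request_cost(200_000, 4_000)
print(f"${cost:.2f}")  # $0.56
```

Output tokens cost six times as much as input tokens at these rates, which is why a model that solves the same problem with fewer generated tokens can be cheaper to run even at the same list price.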

Why This Release Is Different

Every major AI lab has been racing toward “agentic” capabilities for the past year. Anthropic launched Claude’s computer use features in late 2024. Google has been building agentic features into Gemini. But OpenAI is doing something different by baking computer use directly into its flagship model rather than treating it as a separate feature.

This matters because it means the same model that’s good at reasoning, coding, and analysis is also the one controlling your computer. There’s no handoff between a “thinking” model and a “doing” model. GPT-5.4 thinks and acts in the same context.

OpenAI is also pushing this squarely at enterprise customers. The model launched simultaneously in ChatGPT, the API, and Codex (OpenAI’s coding platform). GPT-5.4 Thinking is available to Plus, Team, and Pro users. GPT-5.4 Pro is available via the API and for Enterprise and Edu subscribers.

The message is clear: OpenAI wants to be the platform that automates professional work, not just assists with it.

The Elephant in the Room

This launch didn’t happen in a vacuum. OpenAI has had a rough few weeks.

The company recently made the controversial decision to work with the U.S. Department of Defense, which reportedly cost it 1.5 million users and sparked significant internal opposition from staff. Meanwhile, Anthropic publicly refused similar partnerships, drawing a sharp contrast in how the two leading AI companies approach government and military work.

OpenAI also just closed a $110 billion funding round, the largest private financing in history, led by Amazon ($50 billion), SoftBank ($30 billion), and Nvidia ($30 billion). That puts the company’s valuation at $730 billion. For context, that’s roughly the market cap of major public companies like Berkshire Hathaway.

With that much money raised and that much controversy surrounding the company, GPT-5.4 needed to be good. And on the technical merits, it appears to be.

What This Means for Regular People

If you’re a ChatGPT Plus subscriber ($20/month), you’ll get access to GPT-5.4 Thinking, which means better reasoning, fewer errors, and improved research capabilities compared to what you’ve been using.

The computer use features are more relevant for teams and businesses right now. The workflows that benefit most are repetitive professional tasks, the kind where you’re switching between apps, copying data, and following procedures. Think expense reports, data entry, report generation, and project management updates.

For developers, the 1 million token context window is the headline feature. Being able to feed an entire codebase into a single prompt changes how you work with AI. Instead of carefully selecting which files to include, you include everything and let the model figure out what’s relevant.
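
Before you include everything, it helps to know whether everything fits. The sketch below estimates a codebase's token count from its character count, using the rough heuristic of four characters per token. That ratio, the file extensions, and the 50,000-token reserve for the reply are all assumptions; an exact count would require the model's actual tokenizer.

```python
import os

CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers vary


def estimate_tokens(root: str, extensions=(".py", ".js", ".ts", ".md")) -> int:
    """Estimate the token count of source files under root from their character count."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total_chars += len(handle.read())
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str, reserve_for_output: int = 50_000) -> bool:
    """True if the estimate leaves room for the reply within the context window."""
    return estimate_tokens(root) + reserve_for_output <= CONTEXT_TOKENS
```

Calling fits_in_context(".") from a project's root directory gives a quick yes or no before you paste the whole repository into a prompt.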

The Bigger Picture

GPT-5.4 represents a shift in what AI models are supposed to do. For the past three years, the AI industry has been building increasingly powerful chatbots. Better at answering questions, better at writing, better at coding. But fundamentally, they’ve been tools that respond when you ask them something.

Agentic AI flips that. Instead of asking the AI a question and getting an answer, you give the AI a goal and it figures out how to accomplish it. It plans. It acts. It adapts when something goes wrong. And with computer use, it does all of this in the same environment you work in: your actual desktop.

We’re still early in this transition. A 75% success rate on computer tasks means one in four attempts will need human correction. The model still hallucinates sometimes. Complex multi-step workflows can go sideways.

But the direction is unmistakable. The AI on your computer is no longer just answering your questions. It’s starting to do your work.

And that changes everything about how we’ll think about productivity, employment, and the tools we use every day.


GPT-5.4 is available now in ChatGPT for Plus, Team, and Pro subscribers. API access is available for developers and Enterprise/Edu customers.