interestingengineering.com

ChatGPT gets major agent upgrade, now automates workflows from start to finish

ChatGPT’s latest upgrade brings us a step closer to Iron Man’s Jarvis; it can now do the work for you.

From planning a Sunday brunch to creating a competitor analysis slide deck, the upgraded AI can browse the web, click through interfaces, and complete tasks from start to finish, without needing step-by-step guidance.

OpenAI has introduced the new “agent mode” for Pro, Plus, and Team users.

Once activated, ChatGPT can analyze websites, interact with APIs, run code, and deliver files like slideshows or spreadsheets.

The system uses its own virtual computer to combine browsing, reasoning, and tool use, like a full-fledged digital assistant.

This means users can ask ChatGPT to “analyze three competitors and build a slideshow,” and it will do just that by clicking, filtering, typing, and delivering editable output.

The upgrade unifies two earlier capabilities: Operator, which could click and scroll through websites, and deep research, which focused on in-depth information analysis.

These tools often worked best in different situations, and many tasks sat in the gap between them.

OpenAI says combining their strengths allows ChatGPT to “actively engage websites—clicking, filtering, and gathering more precise, efficient results.”

Web, terminal, and code in one loop

With support for connectors like Gmail or GitHub, the agent can plug into a user’s apps and workflows.

When authentication is required, users can take over the browser to log in securely, after which ChatGPT resumes the task.

It can now shift between browsing webpages, downloading files, analyzing them in a terminal, and continuing the workflow, all in one uninterrupted loop.

The system remembers context between steps and supports interruption. If users need to change instructions midway, they can jump in, steer the direction, and the agent will adjust without starting over. OpenAI calls it “far more interactive and flexible than previous models.”

The model already outperforms its predecessors in evaluations. On Humanity’s Last Exam, it achieved a state-of-the-art pass@1 score of 41.6, meaning it is graded on a single attempt per question.

It also reached 27.4% accuracy on FrontierMath, considered one of the hardest math benchmarks.

Figure: Bar chart of FrontierMath accuracy scores, showing ChatGPT agent leading with 27.4%, ahead of OpenAI o4-mini (19.3%) and o3 (10.3%). Credit: OpenAI
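
For readers unfamiliar with the pass@1 metric cited above, here is a minimal sketch of how such a score can be computed: it is the share of problems answered correctly on a single first attempt. The function name and the toy results below are hypothetical, and the article does not describe OpenAI's exact evaluation protocol.

```python
from typing import Sequence


def pass_at_1(first_attempt_correct: Sequence[bool]) -> float:
    """Return the fraction of problems solved on the first (single) attempt."""
    if not first_attempt_correct:
        raise ValueError("need at least one problem")
    return sum(first_attempt_correct) / len(first_attempt_correct)


# Hypothetical toy data: one entry per benchmark question,
# True if the single attempt was graded correct.
results = [True, False, True, False, False]
print(f"pass@1 = {pass_at_1(results):.1%}")  # prints "pass@1 = 40.0%"
```

Under this reading, a score of 41.6 would correspond to roughly 42 of every 100 questions answered correctly on the first try.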

Safety guardrails amid expanded reach

With this expanded capability comes greater risk. Since the agent can interact with websites and access personal connectors, OpenAI has introduced multiple safeguards.

Tasks with real-world consequences, such as making purchases or sending emails, require user confirmation.

For high-risk actions like financial transfers, the model is trained to refuse altogether.
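
The two tiers described above (confirm before consequential actions, refuse high-risk ones) can be illustrated with a small rule-based sketch. This is only an approximation for illustration: the article says the refusal behavior is trained into the model, not implemented as a lookup, and every name and category here is a hypothetical assumption rather than OpenAI's implementation.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"
    REFUSE = "refuse"


# Hypothetical action categories, chosen to mirror the article's examples.
CONFIRM_ACTIONS = {"purchase", "send_email"}
REFUSE_ACTIONS = {"financial_transfer"}


def classify(action: str) -> Decision:
    """Map an action name to the tier of safeguard it falls under."""
    if action in REFUSE_ACTIONS:
        return Decision.REFUSE
    if action in CONFIRM_ACTIONS:
        return Decision.CONFIRM
    return Decision.ALLOW


def run_action(action: str, confirm: Callable[[str], bool]) -> str:
    """Refuse high-risk actions, ask the user before consequential ones."""
    decision = classify(action)
    if decision is Decision.REFUSE:
        return "refused"
    if decision is Decision.CONFIRM and not confirm(action):
        return "cancelled"
    return "executed"


# Usage: the callback stands in for a user approval prompt.
print(run_action("purchase", lambda a: False))           # cancelled
print(run_action("financial_transfer", lambda a: True))  # refused
```

The point of the design is that user approval cannot override the refuse tier: even a confirming callback leaves a financial transfer refused.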

A major threat OpenAI prepared for is prompt injection. These attacks hide malicious instructions in website code that can manipulate the AI’s behavior.

To mitigate this, OpenAI has trained the agent to resist such injections, deployed monitoring systems to detect attacks, and required user approval before any significant step. The company also advises users to disable connectors when they are not needed.

Browsing sessions also remain private. OpenAI says, “ChatGPT does not collect or store any data you enter during these sessions, such as passwords, because the model doesn’t need it, and it’s safer if it never sees it.”

While the rollout marks a major leap, OpenAI considers this an early-stage release. Some features, like slideshow formatting and spreadsheet editing, are still in beta.

But the company says it’s working to expand capabilities, reduce errors, and support even more advanced real-world tasks in the months ahead.