OpenAI Introduces o3-pro, a New AI Model Tailored for Technical Tasks Such as Mathematics and Coding. More Accurate, Yet Still Based on Simulated Reasoning.
AI That Thinks Out Loud
Released just hours ago for ChatGPT Pro and Team subscribers, o3-pro replaces the former high-end model o1-pro. Designed to handle complex tasks, it targets technical fields such as advanced mathematics, physics, and coding. Unlike more "generalist" models, o3-pro operates on a principle of simulated reasoning: it "thinks out loud," producing a series of intermediate steps instead of delivering an immediate answer. The result: greater accuracy, at the cost of slower responses.
Enhanced Model for Technical Tasks
o3-pro utilizes multiple integrated tools: image analysis, Python code execution, web search, and file analysis. These enhancements enable it to provide more thorough responses, especially in terms of logic and data processing. In OpenAI’s internal benchmarks, o3-pro outperforms its predecessors and direct competitors, notably on tests like math olympiads (93% success rate) and doctoral-level science (84% success rate). It also proves to be clearer, more comprehensive, and better at adhering to guidelines.
Lower Pricing to Attract Developers
On the API front, OpenAI has made a significant move. o3-pro is now offered at $20 per million input tokens and $80 per million output tokens, 87% lower than o1-pro. The standard o3 model has also been cut from $10 to $2 per million input tokens and from $40 to $8 per million output tokens. The goal: to make these models accessible to developers and small businesses who previously found the cost prohibitive.
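To put those prices in perspective, here is a minimal sketch of the cost arithmetic, using the per-million-token rates quoted above. The token counts in the example are illustrative assumptions, not real measurements.

```python
# Rough cost estimate for a single API call, based on the quoted prices:
# o3-pro: $20 / million input tokens, $80 / million output tokens
# o3:     $2  / million input tokens, $8  / million output tokens
PRICES_PER_MILLION = {
    "o3-pro": {"input": 20.00, "output": 80.00},
    "o3": {"input": 2.00, "output": 8.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given model."""
    p = PRICES_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative example: a 5,000-token prompt with a 2,000-token answer.
cost_pro = estimate_cost("o3-pro", 5_000, 2_000)  # 0.26
cost_std = estimate_cost("o3", 5_000, 2_000)      # 0.026
print(f"o3-pro: ${cost_pro:.2f}  o3: ${cost_std:.3f}")
```

Even at the reduced rate, o3-pro remains roughly ten times the price of the standard o3 for the same workload.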
What’s the Verdict?
Despite these advancements, models like o3-pro don’t truly “think.” They sequence patterns learned from training data, with varying degrees of rigor. They can fail on entirely new problems, or continue making mistakes without realizing it. Useful? Yes. Infallible? Not yet.
These are genuinely interesting advancements, but OpenAI's multitude of models is getting confusing. It would help if Sam Altman's promise to consolidate everything into a single model, with the AI selecting the best one for each query, were implemented, at least as an option. Do you use ChatGPT? If so, which model?
If you enjoyed this article, remember to follow us on X!
