On the FrontierMath benchmark by Epoch AI, o3 solved 25.2 percent of problems, whereas no other model has exceeded 2 percent, suggesting a leap in mathematical reasoning capabilities over the previous model.
Benchmarks vs. real-world value
Ideally, potential applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.
The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products this year alone, indicating significant business interest despite the costs.
Meanwhile, OpenAI faces financial pressures that may influence its premium pricing strategy. The company reportedly lost approximately $5 billion last year covering operational costs and other expenses related to running its services.
News of OpenAI’s stratospheric pricing plans comes after years of relatively affordable AI services that have conditioned users to expect powerful capabilities at relatively low costs. ChatGPT Plus remains $20 per month and Claude Pro costs $30 monthly, both tiny fractions of these proposed enterprise tiers. Even ChatGPT Pro’s $200/month subscription is relatively small compared to the new proposed fees. Whether the performance difference between these tiers will match their thousandfold price difference is an open question.
Despite their benchmark performances, these simulated reasoning models still struggle with confabulations: instances where they generate plausible-sounding but factually incorrect information. This remains a critical concern for research applications where accuracy and reliability are paramount. A $20,000 monthly investment raises questions about whether organizations can trust these systems not to introduce subtle errors into high-stakes research.
In response to the news, several people quipped on social media that companies could hire an actual PhD student for much cheaper. “In case you have forgotten,” wrote xAI developer Hieu Pham in a viral tweet, “most PhD students, including the brightest stars who can do way better work than any current LLMs, are not paid $20K / month.”
While these systems show strong capabilities on specific benchmarks, the “PhD-level” label remains largely a marketing term. These models can process and synthesize information at impressive speeds, but questions remain about how effectively they can handle the creative thinking, intellectual skepticism, and original research that define actual doctoral-level work. On the other hand, they will never get tired or need health insurance, and they will likely continue to improve in capability and drop in cost over time.