THE ORION AND THEREAFTER - UNRAVELLING AI MODELS

We move on from where we left off yesterday. Orion, for the uninitiated in the AI field, is GPT-4.5, the much-awaited LLM from OpenAI, coming around two years after the launch of GPT-4. Two years is a very long time in the exponentially paced AI industry. According to media reports, the then-unreleased Orion was better at some language tasks than GPT-4, but its capabilities in coding and mathematics had not significantly improved over the previous model. As per the Wall Street Journal, the results of two long training runs of Orion, each lasting a couple of months, fell short of the company’s expectations.

There have been some well-known skeptics of the current approaches to AI, and among the leading ones is Prof. Gary Marcus, a cognitive scientist at New York University. A non-believer in scaling laws, he says that OpenAI cannot overcome the fundamental limitations of GPT-style models. According to him, these models cannot differentiate between fact and fiction, hence the “hallucination” associated with them. In the same vein, he says GPT-5 is not likely to solve these challenges or be conspicuously better than GPT-4.

An OpenAI spokesperson leaves open-ended the question of a marked improvement in the performance of GPT-5. OpenAI has meanwhile gone ahead with its o1 and o3 models, as well as the computer-using Operator agent and Deep Research. Though still showing faith in the scaling laws proven so far, Altman now also refers to the resources needed to run, and not just train, an AI model. This is another way of speaking about the chain-of-thought approach: spending more compute at inference time, an idea made concrete in the sketch below. Ilya Sutskever, OpenAI’s former Chief Scientist, feels that “pre-training as we know it will unquestionably end.” With as many as 1.8 trillion parameters, GPT-4 would already have exhausted the internet and large private datasets. Where will the data come from for the next foundation models?
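To see what spending compute at run time instead of training time looks like, here is a minimal best-of-n sketch in Python. The generate and score functions are hypothetical placeholders standing in for an LLM and a verifier, not any vendor’s actual API; the point is simply that answer quality is bought with more samples at inference time rather than with more pre-training.

```python
import random

# Hypothetical stand-ins: in a real system, generate() would call an
# LLM to produce a chain-of-thought answer, and score() would be a
# verifier or reward model. Both are placeholders for illustration.

def generate(prompt: str) -> str:
    """Pretend to sample one candidate answer."""
    return f"answer-{random.randint(0, 9)}"

def score(prompt: str, answer: str) -> float:
    """Pretend to judge an answer's quality (higher is better)."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Test-time scaling: sample n candidates, keep the best one.
    More inference compute (a larger n) buys better expected quality
    without retraining the underlying model."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("What is 17 * 24?", n=8))
```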

Synthetic data has its own drawbacks and can lead to a phenomenon called “model collapse,” where models trained on the outputs of earlier models progressively lose the diversity of the original data. Costs also mushroom as the data centres get bigger and bigger. With Stargate and other funding and investment mechanisms, money may still be organised. Most AI companies now realise that “there is low-hanging fruit” with the reasoning models: “There are a lot of things we can grab on these reasoning models without having to train GPT-5.” The future is about reasoning and chain of thought, and DeepSeek has proven it to the world. Anthropic, the OpenAI rival, says that synthetic data still remains promising. Earlier this week it released the Claude 3.7 Sonnet model, the first model built on the hybrid approach: the model itself determines whether it can give an instant answer or whether it needs to use chain of thought and spend more time and compute to respond.
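Two of the points above can be made concrete. First, “model collapse”: the toy simulation below is a sketch under simplified assumptions, with a one-dimensional Gaussian standing in for a generative model. It repeatedly fits a “model” to data and then trains the next generation only on that model’s synthetic samples; the diversity of the data tends to shrink generation after generation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of "model collapse": fit a Gaussian "model" to
# data, then train the next generation only on samples drawn from
# the fitted model. With finite samples, estimation error compounds
# and the variance (the data's diversity) tends to drift toward zero.
data = rng.normal(loc=0.0, scale=1.0, size=20)  # the original "real" data

for generation in range(1, 31):
    mu, sigma = data.mean(), data.std()    # "train" on the current data
    data = rng.normal(mu, sigma, size=20)  # next gen sees synthetic data only
    if generation % 5 == 0:
        print(f"generation {generation:2d}: std = {data.std():.4f}")
```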
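Second, the hybrid approach. The sketch below is purely conceptual, with a hypothetical difficulty heuristic and placeholder answer paths; it is not Anthropic’s actual mechanism, only the shape of the decision: answer instantly when a query looks easy, or spend a chain-of-thought compute budget when it looks hard.

```python
# Conceptual sketch of a hybrid model's routing decision. The
# difficulty heuristic and the two answer paths are hypothetical
# placeholders, not any production implementation.

def looks_hard(prompt: str) -> bool:
    """Crude stand-in for the model's own difficulty judgement."""
    hard_markers = ("prove", "derive", "step by step", "optimize", "debug")
    return len(prompt) > 200 or any(m in prompt.lower() for m in hard_markers)

def instant_answer(prompt: str) -> str:
    return f"[fast path] short reply to: {prompt!r}"

def reasoned_answer(prompt: str, thinking_budget_tokens: int) -> str:
    # In a real hybrid model, the same network would emit intermediate
    # reasoning tokens here, up to the given budget, before answering.
    return f"[slow path, budget={thinking_budget_tokens}] reply to: {prompt!r}"

def hybrid_reply(prompt: str) -> str:
    if looks_hard(prompt):
        return reasoned_answer(prompt, thinking_budget_tokens=16_000)
    return instant_answer(prompt)

print(hybrid_reply("What is the capital of France?"))
print(hybrid_reply("Prove that the sum of two even numbers is even."))
```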

HYBRID AI MODELS CAN BE THE FUTURE HENCEFORTH. A WORK IN PROGRESS.
Sanjay Sahay

Have a nice evening.
