THE AGENT COMPANY – Sanjay Sahay

The IT behemoths, which are slowly transforming themselves into AI behemoths, would like the world to believe that the marauding AI army is out to rewrite the world and is gaining capabilities in that direction exponentially. Artificial General Intelligence (AGI) is around the corner, and then the takeover would be complete. The current AI transition, from Gen AI to Agentic AI, is projected to transform the way we work, and a combination of the two might give the workforce a run for its money. So we come back to the idea of a company run by AI agents.

Do we start with the question of how far we are from AGI? Or is the more pertinent question how far we are from Sentient AI? For the uninitiated, Sentient AI refers to AI technologies that can process emotions and perceive the world as human beings do. Current AI is not sentient; it does not understand or perceive the world in any way. High scores on individual tasks do not add up to a combination of expertise sufficient to run the show, even in the most average manner. In crude terms, for the AI industry the world is a market that has nearly been captured; the rest would be a formality.

Neutral researchers tell another story, and this one comes from Carnegie Mellon University. The experiment is, aptly, named TheAgentCompany. It is a fake software company staffed by AI agents, that is, AI models designed to perform tasks on their own. The artificial workers included models from Google, OpenAI, Anthropic and Meta, among others. They filled roles ranging from financial analyst to software engineer to chief technical officer. The agents were given an environment that mirrored the day-to-day work of a real software company. They navigated file directories, virtually toured new office spaces and wrote software reviews for engineers. Business Insider was the first to report that the results were dismal.

The best-performing model, Anthropic’s Claude 3.5 Sonnet, struggled to finish even 24% of the tasks assigned to it. Even this performance came at a prohibitive cost, averaging 30 steps at a cost of $6 per task. Google’s Gemini 2.0 Flash averaged 40 steps per finished task, with a pathetic success rate of 11.4%. The worst AI employee was Amazon’s Nova Pro v1, which finished just 1.7% of its assignments at an average of almost 20 steps. The nagging issues were a lack of common sense, weak social skills and a poor understanding of how to navigate the internet. Smaller tasks are fine, but AI is “clearly not ready for more complex gigs humans excel at.” Machines aren’t coming for your job anytime soon.

AI WILL NOT TAKE YOUR JOB, BUT THE PERSON WHO IS CONVERSANT WITH IT WILL.
Sanjay Sahay

Have a nice evening.
