WHERE DO META AI MODELS SCORE?

Not a week passes in the world of AI without some news, product, or service promising something so revolutionary that it redefines the way we work and the output we can deliver. The drudgery of work, professional and otherwise, is decreasing at an exponential pace. AI is bringing much-needed quality and predictability to work. Timelines have not only been abridged; they can now be planned with precision. AI compute promises you that. The maxim “the world at your fingertips” is turning out to be a reality.

There are two ways to explain the world at your fingertips, leaving the resources issue aside for the moment. The first is that AlphaZero, an AI model developed by DeepMind in 2017, learned to play chess in just four hours. Four hours against the whole of human chess history; what a battle. The second is that you can ask any question about this world and the LLM is competent enough to answer. Where are we heading now? Google DeepMind CEO Demis Hassabis opened his mind in a recent podcast, stating that Google plans to eventually combine its Gemini AI models with its video-generating models.

Gemini was designed to be multimodal from the very beginning. The aim is a universal digital assistant, one that can help across the board in the real world. Amazon has announced plans to launch an “any-to-any” model this year. Will it be a one-off offering? No, that is not the case. The AI industry is slowly yet consistently moving towards “omni”; that is where all AI research and learning is getting consolidated. The industry’s game plan is to create models that can understand and synthesize many forms of media. Recent developments at the AI behemoths prove it: Google’s newest model can handle audio besides images and text.

OpenAI’s default ChatGPT model can now create images, including, of course, Studio Ghibli-style art. Video is the new frontier in making “any-to-any” or omni models, whatever you call them, a grand success. The training data for Google’s Veo mostly comes from YouTube, a platform Google owns; Google claims this is in accordance with its agreements with YouTube creators, and the company broadened its terms of service last year for this purpose. The fact of the matter is that these “any-to-any” models require a lot of training data: images, videos, audio, text and so on. The trajectory of the AI industry is crystal clear.

ANY-TO-ANY SEEMS TO BE THE BIGGEST WATERSHED MOMENT BEFORE AGI.
Sanjay Sahay

Have a nice evening.
