LLM BATTLE – MULTIMODALITY WILL BE THE CLINCHER

DailyPost 2640

We have landed in the AI age much sooner than predicted. Having reached the first destination, the pace seems to be speeding up every single day. Barely a year after the public advent of ChatGPT, the challengers are out with daggers drawn. ChatGPT certainly has the first-mover advantage, but will that be enough to lead the market for at least a few years, if not decades? With the foundational work done and known, the competitors who feel they missed out by a whisker are throwing themselves completely into winning the game.

This is happening because it has become very clear that the company with a distinct AI edge will be the winner in this race. It would become the next ruling tech deity of the world. The main change from ChatGPT, a large language model, is the multimodality being built into these models, which seems to be the gamechanger. ChatGPT claims some amount of multimodality, but as it stands now it is a hugely language-centric model. The challenger is none other than Gemini, the product of Google DeepMind. The brain behind it is Demis Hassabis, who was the poster boy of the AI world till Sam Altman pulled the rug from right under him.

As Sundar Pichai puts it, Google is on a timeless mission to organise the world’s information and make it usable to all. For that, cognition needs to be mastered by the machine. Gemini is the challenger in the market. It is truly multimodal, created from the ground up and not just a stitching together of modalities. It would understand the world the way we do, with input and output flowing in a human manner, but the results would be superhuman because the model’s capabilities would be superhuman. Gemini is both foundational and impact-making across the Google ecosystem. There are endless opportunities to put this foundational model to use. The combination of modes provides insights which can be pathbreaking, information that otherwise could not have been made available to the world.

A recent video demo shows the full range of multimodality. Starting with computer vision, it unwinds through audio and text, connecting them, and what not. Gemini Ultra is reported to beat ChatGPT on 7 of the 8 benchmark categories. The categories are general, reasoning, math, code, image, audio and video, each with its relevant subcategories. The multimodal reasoning capabilities, as projected, are mindboggling. A bespoke interface demo was impressive, with nothing coded up in advance. A search of scientific papers, using a very precise yet fully flexible methodology, brought its full potential to the fore. It can be used in vast areas such as law and finance. The context length of 32,768 tokens will give you a fair idea of its capability. The video understanding capability leaves nothing more to be desired. Who knows, tomorrow’s robotics may get attached to such models, making them fully multimodal and physical too. Multimodal LLMs will then start interacting with the world gainfully.

THE WINNERS OF THIS AI BATTLE WILL RULE THE WORLD: COMPANIES, INSTITUTIONS AND COUNTRIES.
Sanjay Sahay

Have a nice evening.
