NATIVELY MULTIMODAL AI MODEL

DailyPost 2645

There is a good chance that you have not heard this term, even though you have been hearing about ChatGPT regularly since it took the global tech world by storm on 30 November 2022. Disruption is the name of the game in the tech world, and given today’s dependence on technology, tech becomes a universal disruptor. ChatGPT has reshaped the world in its favour like no other product in human history. When we start discussing the nitty-gritty of a high technology, it is a clear-cut indication that the technology has arrived.

Google seemed to be pushed to the sidelines as OpenAI took centre stage with its GPT models and then with the conversational AI game changer, ChatGPT. Sam Altman, the CEO of OpenAI, became the most watched person in the tech world. Google controlled more than 90% of the search market, and it was expected to come up with a formidable answer. It has been a serious contender for the AI throne at least since its acquisition, nearly a decade ago, of DeepMind, which is led by the AI prodigy Demis Hassabis. The answer is now out in the open in the form of Gemini, a product that promises to challenge the supremacy, and the likely tech hegemony, of ChatGPT.

The main technical breakthrough of Gemini, the one being flaunted as the game changer, is that Gemini, unlike ChatGPT, is a “natively multimodal AI model.” ChatGPT, on the other hand, is based on GPT, a large language model (LLM) whose genesis lies primarily in language. It is therefore termed just a multimodal LLM. A comparison with GPT-4 would not be out of place. GPT-4 is also a multimodal model, but it is reportedly not one dense model. “It is based on the “Mixture of Experts” architecture with 16 different models stitched together for different tasks.”

Gemini, by contrast, has been designed from the ground up to be multimodal, with text, image, audio and code “all trained together to form a powerful AI system.” Its native multimodal capability enables it to process information across modalities simultaneously and seamlessly. The difference is crystal clear in examples such as audio processing, its use as an outstanding teaching aid (in the quality of its teaching, reasoning and substantiation), and its proficiency with most programming languages. Gemini Pro beats OpenAI’s GPT-3.5 model on several benchmarks. Gemini Ultra is more than a fitting reply to OpenAI, being Google’s largest and most capable model with the full suite of multimodal capabilities. The AI war has just begun; the future of AI and of the pathbreaking companies will keep being written at a threatening pace.

FOR AI USERS, THE AI WAR PROMISES FULL-BLOWN AI CAPABILITY AT OUR FINGERTIPS, SOONER RATHER THAN LATER.
Sanjay Sahay

Have a nice evening.
