DailyPost 3030
PHYSICAL AI: AI’S NEXT FRONTIER!
That AI is central to our existence is a foregone conclusion: LLMs have already become a way of life, finding their way into a variety of tools and impacting our lives in a big way. LLMs have become multimodal and are being integrated into everything at a frightening pace. What, then, is the next step in AI? Is something extremely far-reaching about to happen in this space? Or is there some part of our regular functioning that has been left out? Generative AI is the current step, and agentic AI is expected to arrive this year.
Beyond these, and different from them, lies Physical AI, the next AI frontier, which can broadly be defined as the transformation of robotics; self-driving cars are one small example. Model performance is directly related to data availability, but physical-world data is costly to capture, curate and label. Nvidia’s Cosmos uses a class of powerful generative models called World Foundation Models (WFMs). Because they are pre-trained on massive datasets of real-world video, they gain the capability to understand and predict the behaviour of objects.
Using advanced tokenization, video is efficiently compressed and processed, which brings down the computational cost of training and using WFMs. The platform includes a data processing pipeline that accelerates the preparation of training data, as well as a critical component for testing and validating model performance. The WFMs need to reach the level that LLMs occupy in the AI world to deliver at that level. The transformer model has to come to Physical AI: the creation of tokens, their processing, and finally a token emitted as output, which for an LLM is text.
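To make the tokenization idea concrete, here is a minimal sketch of how a video can be compressed into a short sequence of discrete tokens. All names and numbers are illustrative assumptions, not Cosmos's actual API: the video is cut into spatio-temporal patches, and each patch is replaced by the index of its nearest entry in a (here random, in practice learned) codebook.

```python
import numpy as np

def tokenize_video(video, codebook, patch=(2, 8, 8)):
    """video: (T, H, W) grayscale array; codebook: (K, D) with D = prod(patch).

    Returns one integer token per spatio-temporal patch, a stand-in for
    the vector-quantization step a real video tokenizer would learn.
    """
    t, h, w = patch
    T, H, W = video.shape
    # cut the video into (t x h x w) patches and flatten each one
    patches = (
        video.reshape(T // t, t, H // h, h, W // w, w)
             .transpose(0, 2, 4, 1, 3, 5)
             .reshape(-1, t * h * w)
    )
    # nearest-neighbour lookup in the codebook = vector quantization
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
video = rng.random((8, 32, 32))          # 8 frames of 32x32 pixels
codebook = rng.random((256, 2 * 8 * 8))  # 256 stand-in patch prototypes
tokens = tokenize_video(video, codebook)
print(video.size, "pixel values ->", tokens.size, "tokens")
```

The 8,192 raw pixel values collapse into 64 tokens, which is the sense in which tokenization cuts the cost of training and running a model over video.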
Instead of a prompt, the input is a request for physical action; instead of a token ending up as text, it has to end up as a physical action taken. This is critical to the future of robotics. Nvidia’s CEO says the technology is around the corner. We need to create world models, as opposed to GPT-style language models. Such a model has to understand the language of the world: the physical dynamics, the geometric and spatial relationships, and object permanence. Where does the humungous data needed to train humanoid robots come from? Nvidia is leading the way with its tools and platforms, notably Cosmos and Omniverse. What OpenAI, the GPTs and finally ChatGPT did to make LLM-based AI utilitarian and all-pervasive, Nvidia is doing for Physical AI, going by its current capabilities, pronouncements and crystal-clear clarity about the Physical AI future.
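The shift from text tokens to action tokens can be sketched as a decoding loop. Everything below is a hypothetical toy, not any real robotics API: where an LLM decodes tokens into words, a Physical AI model would decode tokens that a robot controller executes, stopping at a STOP token.

```python
import random

# Toy vocabulary of discrete action tokens (illustrative assumption)
ACTIONS = {0: "move_forward", 1: "turn_left", 2: "turn_right", 3: "grasp"}
STOP = 4

def toy_policy(request, history, rng):
    """Stand-in for a world-model transformer: returns the next action token."""
    if len(history) >= 5:              # bound the episode for this sketch
        return STOP
    return rng.randrange(len(ACTIONS))  # a real model would condition on request

def decode_actions(request, seed=0):
    """Autoregressive loop: request in, sequence of physical actions out."""
    rng = random.Random(seed)
    history, actions = [], []
    while True:
        tok = toy_policy(request, history, rng)
        if tok == STOP:
            break
        history.append(tok)
        actions.append(ACTIONS[tok])   # each token maps to a physical action
    return actions

print(decode_actions("pick up the cup"))
```

The loop is structurally the same as LLM text decoding; only the meaning of the emitted tokens changes, which is the point the paragraph above makes.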
FROM LARGE LANGUAGE MODELS TO MASTERING ROBOTICS SEEMS TO BE THE TRAJECTORY OF AI.
Sanjay Sahay