TEXT TO VIDEO: THE NEW FRONTIER
When we look at the Indian political scene and the tech leadership being provided to this country, suffice it to say that making it to the top is beyond a dream; even getting recognized is a tall order. That can only happen when governments and parties accept that if anything can change the fate of this nation, it is technology. We are certainly not talking about the political technology so visible all over the public domain, especially on social media; we might call it political IT. Governments can take a call on whether they want to miss out on Artificial Intelligence, or simply allow us to become a booming market for the digital colonisers.
The AI revolution is in full swing, despite detractors claiming the contrary. The release of ChatGPT has been a watershed moment in our recent march of technology, and many myths and misconceptions have been laid to rest for all time. Video rules the roost in political propaganda and electioneering: the electorate is fed on short, incisive videos that cut across all barriers. Imagine if AI could do a better job. Large language models were breaking barriers, creating new use cases every day and spearheading the march towards Artificial General Intelligence; simultaneously, text-to-image and image-to-text models appeared on the scene. From DALL-E to Adobe's Firefly, these tools have proven what they can deliver and shown that they have the potential to disrupt any number of industries.
Now a giant leap ahead. Text to video is the final frontier, or so it seems now. NVIDIA has just released a tool and a research paper that are taking the industry by storm. "Insane" is the word being used; this is the way things are being described in the AI world. Text to video was thought to be impossible. Just imagine a video created to your satisfaction from the text prompt "sunset time lapse with moving clouds and colors in the sky, 4k, high resolution." That is already pretty close to reality. The NVIDIA research paper explains that by reusing an off-the-shelf image model and training only an additional temporal alignment component on top of it, "we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280×2048." (LDM stands for latent diffusion model.)
So Stable Diffusion was transformed into a text-to-video model; tweaking is still being done, and it is not yet known when it will be released. The way DALL-E 2 evolved gives us a fair idea of the pace of development and supports the expectation of exponential progress from the current stage onwards. As of now, some types of text-to-video prompts work and others do not. One more prompt displayed is "a fantasy landscape, trending on artstation, 4k, high resolution." Another, "Turtle swimming in ocean", seems to struggle with moving parts; it is not Midjourney-level quality. Another interesting and unusual example is Driving Scene Video Generation, which could be immensely useful in a variety of connected areas. The technology is nascent, yet solid enough to make its potential clear.
DISRUPT OR BE DISRUPTED, AND WORSE STILL, BECOME IRRELEVANT.