It seems that DeepSeek has changed it all. The rules of the game for foundational AI models have to be redefined. DeepSeek shook US markets while garnering humongous international attention for primarily two reasons. First, it developed an advanced AI model that could rival those of well-established tech giants; second, it did so with a cost and resources that were minuscule in comparison. Without getting into further details of the DeepSeek story, what it brought to the fore is a process for the creation of these models that has been termed distillation.
While DeepSeek has been making news, the innovative AI process behind it has not caught much of the world’s imagination. If distillation is path-breaking, we need to know what it is in the first place. Distillation, in the creation of LLMs, is a process that transfers knowledge from a large, complex model (the teacher) to a smaller, more efficient model (the student). There are five basic points to this process. First is knowledge transfer, wherein the student model learns to replicate the teacher model’s behaviour, improving its efficiency. Second is soft label learning: instead of raw data alone, the student model learns from the teacher’s output probabilities, which carry richer information than just the correct answers.
The third basic point is what we call model compression, by which the size and the computational cost of the model are reduced, helping in fast and easy deployment. Performance optimization is another key point: the student model is fine-tuned to retain only the essential knowledge. The fifth and final point is training efficiency; the process allows smaller models to reach near-teacher level performance with significantly fewer resources. A small code sketch of the soft-label training step follows below. What does it mean for the fast-moving, AI-enabled world? The technique is being widely used to make LLMs more accessible for real-world applications, including mobile devices and edge computing.
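To make the soft-label idea concrete, here is a minimal sketch of one distillation training step, assuming PyTorch. The tiny teacher and student networks, the temperature, and the alpha weighting are illustrative assumptions, not DeepSeek’s actual recipe; the point is simply that the student is trained against the teacher’s softened output probabilities alongside the true labels.

# Minimal soft-label distillation sketch (illustrative, not DeepSeek's recipe).
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft-label term: student matches the teacher's softened probability
    # distribution, which carries richer information than one-hot labels.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Hard-label term: ordinary cross-entropy against the true labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Illustrative "large" teacher and "small" student classifiers (placeholders).
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(16, 32)                 # dummy batch of inputs
labels = torch.randint(0, 10, (16,))    # dummy ground-truth labels

with torch.no_grad():                   # the teacher stays frozen
    teacher_logits = teacher(x)

student_logits = student(x)
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")

The same pattern scales up to LLMs: the expensive teacher produces the target distributions (or synthetic outputs), and only the small student is updated, which is why distillation is so much cheaper than training from scratch.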
While DeepSeek was an LLM tsunami of sorts, other models built on this technology and process include DistilBERT, DistilGPT2, TinyBERT, Gemini Nano and others. DeepSeek’s distilled models are initialized from other pretrained models, such as LLaMA and Qwen, which are fine-tuned on synthetic data generated by its R1 model. Generative AI is giving way to another gigantic AI revolution in the making: that of Agentic AI. 2025 has already been accepted as the year of Agentic AI. Given the unique capabilities distillation provides to precision-guided AI models, autonomous AI agents able to handle most tasks are in the offing.
IF YOU CAN FOLLOW THE TECHNOLOGY TRENDS, IT BECOMES EASIER TO UNDERSTAND PRODUCTS AND THE OUTCOMES.
Sanjay Sahay
Have a nice evening.