The pace, precision and scale of AI depend on the nature of the compute architecture that powers it at the backend. It can be called the primordial technology, the one that really decides how AI models are trained and how well they perform. The earliest compute models of AI were the rules-based symbolic systems of the 1950s, but the first architecture to make its presence felt in a big way, pioneering modern AI, was the Recurrent Neural Network (RNN). It gave machines a new way to understand and generate human language, a significant leap forward from previous models.
RNNs have been highly transformative, particularly in the fields of natural language processing (NLP) and speech recognition. They were the first neural network architecture to effectively handle sequential data of variable length and to remember context over time. An RNN processes its input step by step and maintains a ‘memory’, a hidden state that allows information from previous inputs to influence the current output, as the sketch below illustrates. This was the beginning of compute showing clear signs of the AI age, but the transformational and mainstay compute of the AI breakthrough was still to come.
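A minimal sketch of that idea, assuming a vanilla RNN cell written in NumPy; the function and variable names and the dimensions are illustrative assumptions, not any specific library's API:

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Process a sequence step by step, carrying a hidden state (the 'memory')."""
    hidden = np.zeros(W_hh.shape[0])           # memory starts empty
    outputs = []
    for x_t in inputs:                         # one time step per element
        # The new memory mixes the current input with the previous hidden state.
        hidden = np.tanh(W_xh @ x_t + W_hh @ hidden + b_h)
        outputs.append(hidden)
    return outputs                             # one hidden vector per step

# Toy usage: a sequence of 5 steps, each a 3-dimensional input.
rng = np.random.default_rng(0)
seq = [rng.standard_normal(3) for _ in range(5)]
W_xh = rng.standard_normal((4, 3)) * 0.1      # input-to-hidden weights
W_hh = rng.standard_normal((4, 4)) * 0.1      # hidden-to-hidden ('memory') weights
b_h = np.zeros(4)
states = rnn_forward(seq, W_xh, W_hh, b_h)
print(len(states), states[-1].shape)           # 5 hidden states, each of size 4
```

Because each step feeds the previous hidden state back in, the output at any point depends on everything the network has seen so far, which is exactly the context-keeping behaviour that made RNNs effective on language and speech.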
It arrived in the form of the Transformer, which has kept improving since its inception and on which nearly every major AI model since has been built. A Transformer is a deep neural network architecture designed for processing sequential data. It came into being in 2017 with the publication of the seminal paper “Attention Is All You Need,” authored by eight researchers at Google Brain. The Transformer architecture has been instrumental in the recent explosion of AI: it enabled today’s large AI models, revolutionized NLP, and then went beyond text into a multimodal existence.
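At the heart of the Transformer is scaled dot-product self-attention, in which every token attends to every other token at once rather than step by step. A minimal sketch in NumPy, with shapes and names chosen purely for illustration rather than taken from any production implementation:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Each row of X is a token; every token attends to every other token."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v                       # queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                                        # context-mixed token representations

# Toy usage: 4 tokens with embedding size 8.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)                 # (4, 8)
```

Unlike the RNN above, nothing here is recurrent: the whole sequence is processed in parallel, which is what made Transformers so well suited to modern accelerators and to scaling up.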
Are Transformers reaching an end? The question is floated at times, and it does not seem likely just yet, but Mixture-of-Recursions (MoR) is making waves. It unifies two key efficiency paradigms: parameter sharing and adaptive computation. MoR reuses a shared stack of layers multiple times, and a lightweight router dynamically decides how many times each individual word, or “token,” needs to be processed. Its authors report roughly two times faster inference, about 50% less memory, and around half the training compute needed by comparable Transformers. That it comes from the Google DeepMind fold once again lends it significant credibility and the attention of the AI community.
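An illustrative toy sketch of the two ideas MoR combines, a single shared block reused several times (parameter sharing) and a lightweight router that assigns each token its own recursion depth (adaptive computation). All names, shapes, and the routing rule here are assumptions made for illustration; this is not the DeepMind implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, max_depth = 8, 3
W_shared = rng.standard_normal((d_model, d_model)) * 0.1   # ONE block, reused at every depth
w_router = rng.standard_normal(d_model)                    # tiny per-token router

def route_depth(token):
    """Map a router score to a recursion depth in {1, ..., max_depth} (assumed rule)."""
    score = 1 / (1 + np.exp(-(w_router @ token)))           # sigmoid score in (0, 1)
    return 1 + int(score * (max_depth - 1) + 0.5)

def mor_forward(tokens):
    outputs = []
    for tok in tokens:
        depth = route_depth(tok)                            # easy tokens exit early
        h = tok
        for _ in range(depth):                              # reuse the SAME weights each pass
            h = np.tanh(W_shared @ h) + h                   # residual shared block
        outputs.append(h)
    return np.stack(outputs)

tokens = rng.standard_normal((5, d_model))                  # 5 toy "tokens"
print(mor_forward(tokens).shape)                            # (5, 8)
```

The sketch shows where the claimed savings would come from: one set of weights serves every depth, and tokens the router deems simple take fewer passes, so compute and memory are spent only where they are needed.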
MoR’s CLAIMS OF SUPERIOR EFFICIENCY MAKE IT A CHALLENGER FOR THE NEXT-GEN AI COMPUTE.
Sanjay Sahay
Have a nice evening.