DailyPost 1844

OpenAI, with its revolutionary NLP tool GPT-3, was taking giant steps in AI, promising a world different from anything we might have imagined. These are pathbreaking technologies, similar to what has been our experience in computer vision from 2012 onwards. GPT-3 promised a transformational world in NLP, and GPT-4 also seems to be in the offing. Built on the GPT-3 platform, Codex has been a revolutionary tool, though it is still a long way from self-programming. The latest and most promising entry is NVIDIA and Microsoft's new language model with 530 billion parameters, leaving GPT-3 behind. Its parameter count is three times that of the largest existing models: GPT-3 (175 billion parameters), Turing-NLG (17 billion parameters) and Megatron-LM (8 billion parameters).

The recently introduced Megatron-Turing Natural Language Generation (MT-NLG) model is one of the largest transformer language models, powered by DeepSpeed and the Megatron transformer stack. Microsoft had earlier partnered with OpenAI, acquiring exclusive rights to use GPT-3 commercially. Popular language models have grown exponentially over the past few years, from ELMo (94 million parameters) in 2018 to the Megatron-Turing NLG (530 billion parameters), which has recently been made public. The training of Megatron-Turing NLG has in itself been a technological marvel, achieved by combining state-of-the-art GPU-accelerated training infrastructure with a distributed learning stack.

Harnessing the full potential of these supercomputers to deliver the scale of compute required is the core challenge: "It requires parallelism across thousands of GPUs, along with efficiency and scalability on both memory and compute." NVIDIA's Megatron-LM and Microsoft's DeepSpeed together created a very efficient and scalable 3D parallel system, capable of blending data, pipeline and tensor-slicing based parallelism to overcome the current technical obstacles. "The training dataset was based on The Pile and consisted of 825 GB worth of English text corpus." To diversify the training data, Common Crawl (CC) snapshots, RealNews and CC-Stories datasets were also collected. The researchers thus ended up with a set of 15 datasets comprising 339 billion tokens.
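To make the 3D parallelism idea concrete, here is a minimal sketch of how a pool of GPUs can be partitioned along the three axes the article mentions: tensor slicing (splitting a single layer's weights across GPUs), pipeline parallelism (placing successive groups of layers on successive GPU stages), and data parallelism (replicating the whole model across different data shards). The function name and the example degrees below are illustrative assumptions, not the published MT-NLG configuration or an actual DeepSpeed/Megatron API.

```python
def partition_gpus(total_gpus: int, tensor_parallel: int, pipeline_parallel: int) -> dict:
    """Split a GPU pool into a (data, pipeline, tensor) parallelism grid.

    tensor_parallel   : GPUs that jointly hold shards of one layer's weights
    pipeline_parallel : number of successive pipeline stages (layer groups)
    data_parallel     : full-model replicas, each fed a different data shard
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if total_gpus % model_parallel != 0:
        raise ValueError("GPU count must be divisible by the model-parallel size")
    data_parallel = total_gpus // model_parallel
    return {
        "tensor_parallel": tensor_parallel,
        "pipeline_parallel": pipeline_parallel,
        "data_parallel": data_parallel,
    }

# Illustrative example: 4,480 GPUs split 8-way tensor x 35-way pipeline,
# which leaves 4480 / (8 * 35) = 16 data-parallel replicas.
print(partition_gpus(4480, tensor_parallel=8, pipeline_parallel=35))
```

The key design point is that the three degrees multiply together: the product of the tensor-, pipeline- and data-parallel sizes must equal the total GPU count, which is why blending the three dimensions lets training scale to thousands of GPUs without any single axis becoming a bottleneck.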

The MT-NLG platform was evaluated on eight tasks selected from five areas of NLP: text prediction, reading comprehension, commonsense reasoning, natural language inference and word sense disambiguation. Microsoft's DeepSpeed and NVIDIA's Megatron innovations will benefit existing and future AI models, making them cheaper and faster to train. The NVIDIA team said, "We look forward to how MT-NLG will shape tomorrow's products and motivate the community to push the boundaries of natural language processing (NLP) even further."


Sanjay Sahay

