DailyPost 1963

The pace of progress in natural language processing is close to exponential: newer models appear every few months, each markedly better than the last. At every stage, either the learning approach adds a new technique or the training corpus grows ever more humungous, or both. GPT-3 was an important milestone in AI history and a major breakthrough in bringing models closer to a human understanding of things, which is what AI is all about. GPT-3, as we all know, has been a game-changing model. OpenAI has now developed a new version of it, and says that in this just-released new avatar it has been able to do away with some of the most toxic issues that plagued the original. The updated model is called InstructGPT.

InstructGPT is better at following the instructions of the people using it, a quality known in AI jargon as alignment. This single change has led to momentous results: the model now produces less offensive language, less misinformation and fewer mistakes overall. Its predecessor GPT-3, trained on a vast amount of internet data, was in effect grazing in a jungle, consuming whatever came its way. This is the core problem with today’s chatbots and text-generation tools: while learning from the internet, these models inevitably soak up toxic language and prejudices, going all the way to outright falsehoods.

Given this major limitation of GPT-3, “OpenAI has made InstructGPT the default model for users of its application programming interface (API) – a service that gives access to the company’s language models for a fee.” GPT-3 is still available but is no longer recommended by OpenAI. Such is the pace at which AI models in the NLP space lose their relevance, even after having contributed immensely to the field’s progress. What is really heartening is that alignment techniques are being used for the first time in a real product. The starting point was a fully trained GPT-3 model; to this, another round of training was added, using reinforcement learning to teach the model what it should say and when, based on the preferences of human users.

Research, learning and iteration have been the key to this inflection point in the AI development story. In a recent exercise, 40 people were hired to score GPT-3’s responses: answers more in line with the apparent intention of the prompt writer were scored higher. This feedback was then used as a reward in a reinforcement learning algorithm that trained InstructGPT to match responses to prompts in the ways the judges preferred. OpenAI’s final finding is that users of its API favoured InstructGPT over GPT-3 more than 70% of the time, and the new model also makes far fewer grammatical errors. It is already being put to real-life use; Yabble, for instance, is using it to create natural language summaries of its clients’ business data. There is a clear-cut improvement in the new model’s ability to understand natural language and to follow instructions, though reaching human levels is still a long way off.
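The loop described above, in which human raters score a model’s responses and those scores then become rewards for a reinforcement learning algorithm, can be sketched in miniature. The snippet below is a toy illustration only, not OpenAI’s actual pipeline: the prompt, the three canned responses and the rater scores are all invented, and a simple REINFORCE-style update over a tiny softmax “policy” stands in for fine-tuning a large language model.

```python
import math
import random

random.seed(0)

PROMPT = "Explain gravity to a child"
RESPONSES = [
    "Gravity pulls things toward the ground.",  # clear and on-topic
    "Buy my gravity-themed merchandise now!",   # off-topic / spammy
    "Gravity is a force; F = G*m1*m2/r^2.",     # correct, but not child-level
]

# Hypothetical human-rater scores (higher = closer to the prompt writer's
# intent), standing in for the judgments of the hired labelers.
HUMAN_SCORES = [1.0, -1.0, 0.2]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(logits, scores, steps=500, lr=0.5):
    """REINFORCE: sample a response, treat its human score as the reward,
    and nudge the policy toward higher-reward responses."""
    baseline = sum(scores) / len(scores)  # simple variance-reduction baseline
    for _ in range(steps):
        probs = softmax(logits)
        i = random.choices(range(len(logits)), weights=probs)[0]
        advantage = scores[i] - baseline
        # Gradient of log pi(i) w.r.t. the logits is one_hot(i) - probs.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return logits

logits = train([0.0, 0.0, 0.0], HUMAN_SCORES)
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"Preferred response: {RESPONSES[best]!r} (p={probs[best]:.2f})")
```

After training, the policy concentrates its probability on the response the raters scored highest, which is the essence of using human preferences as a reward signal.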


Sanjay Sahay
