DailyPost 2702

Multimodal Large Language Models have established themselves as the tool for the future Artificial Intelligence transformation of the world. Their momentum looks unstoppable, and they are likely to rule the roost until some other super-transformative tool arrives, perhaps in a decade or so, most likely built on the great work done in this field so far, moving in the direction of complete AGI and further to ASI. When you get into this field, you will find “tokens” being thrown around all over the place. They are the currency on which LLMs operate. But what exactly are these tokens, and why do they matter so much when it comes to AI?

In simple terms, tokens can be described as the “building blocks of text used by LLMs” such as ChatGPT and GPT-3. It is easy to visualize tokens as the “letters” that make up the “words” and “sentences” that AI systems use as their lingua franca. Tokens are both the input and the output of this great intelligence machine on the verge of AGI: “Tokens are the segments of text that are fed into and generated by a machine learning model.” Though there is scientific logic built into how text is split, the scheme is not watertight: a token can be an individual character, a whole word, part of a word, or a longer stretch of text.
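To make the idea concrete, here is a deliberately simplified sketch of splitting text into token-like pieces. Real LLM tokenizers use learned sub-word schemes such as byte-pair encoding, so this word-and-punctuation split is only an illustration of the concept, not how production tokenizers work:

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation pieces.
    Illustration only: real tokenizers (e.g. BPE) also break
    words into sub-word units like 'token' + 'isation'."""
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("Tokens power LLMs, really!"))
# ['Tokens', 'power', 'LLMs', ',', 'really', '!']
```

Notice that even this crude split shows why token counts exceed word counts: punctuation marks become tokens of their own.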

For further clarity, the rule of thumb is that one token equals roughly 4 characters of common English text, which translates to about three-fourths of a word (100 tokens ≈ 75 words). The process of breaking text down into tokens, allowing the AI system to analyze and “digest” human language, is called tokenisation. Tokens are hence critical to the AI revolution: “Tokens become the data used to train, fine-tune and improve AI systems.” The OpenAI Tokenizer is a tool of this nature. How do tokens matter operationally and financially? Every LLM has a token limit, and going beyond it can lead to errors, confusion and poor-quality AI responses.
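The rule of thumb above (one token per ~4 characters) is easy to turn into a quick estimator. This is only an approximation for common English text; for exact counts you would run the text through a real tokenizer such as the OpenAI Tokenizer:

```python
def estimate_tokens(text):
    """Rough token estimate using the ~4 characters per token
    rule of thumb for common English text. An approximation,
    not a substitute for a real tokenizer."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, world!"))  # 13 characters -> ~3 tokens
```

A 75-word paragraph of typical English (around 400 characters) would come out near the expected 100 tokens.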

Cost-wise too, tokens have an impact. Providers such as OpenAI, Anthropic, Alphabet, and Microsoft charge for their services based on token usage, with a typical pricing unit of per 1,000 tokens. Understanding this helps the user control expenses. Since tokens sit at the core of prompts, being concise and focused counts, as it reduces the chance of overloading the AI with tangents. Long conversations should be broken into shorter exchanges so as not to cross token limits, and huge blocks of text are better avoided. A tokenizer tool is an immense help in keeping a count of tokens and estimating costs. Given these limitations and the utility of focus, a step-by-step approach is beneficial.
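Per-1,000-token billing makes cost estimation simple arithmetic. The rate below is a hypothetical placeholder, not any provider's actual price; always check the current pricing page:

```python
def estimate_cost(token_count, price_per_1k):
    """Estimate the bill when a provider charges per 1,000 tokens.
    price_per_1k is a hypothetical rate in dollars, not a real
    published price."""
    return token_count / 1000 * price_per_1k

# e.g. a 2,500-token exchange at a hypothetical $0.002 per 1K tokens:
print(estimate_cost(2500, 0.002))  # 0.005 dollars
```

Pairing this with a token estimator makes it easy to budget a long conversation before sending it.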


Have a nice evening.
