DailyPost 2104

Much as we would like it to be otherwise, the technology world does not move predictably; it can turn topsy-turvy almost overnight, Covid being the most recent and comprehensive example. Being sure of technology’s trajectory is now beyond us. Simply getting on the bandwagon early, as first adopters, would itself bring great success to individuals, companies, organizations and countries alike. Until recently, data that did not come from the real world was considered fake data, and synthetic data was unheard of. Today it has gained currency for a variety of reasons, most of them extremely valid. The pitfalls of real data are coming out in the open in many areas. We may not want to replicate the real-world status quo in algorithms, which are not supposed to carry that burden. We intend to move to a more objective and fairer world.

Understanding synthetic data without its context, and without the real-world data issues faced so far, would be extremely difficult. Relying only on theoretical constructs to solve real-life problems does not yield fruitful results; practical ways and means have to be found. Synthetic data fills precisely this gap. The difference between synthetic and fake has to be understood first. Synthetic data and images are created to make the training sample representative of the real world, a first step in machine learning and artificial intelligence. According to recent reports, tech firms are turning to synthetic images to train AI to be fairer, and the trend has been growing. Gartner has studied this trend and estimates that 60% of all data used to train AI will be synthetic by 2024, and that synthetic data will overshadow real data for AI training by 2030. This would have been unimaginable even a few years ago. The talk of synthetic data itself is very recent.

Microsoft Corp plans to stop selling software that guesses a person’s mood, as it could be discriminatory. Computer vision software, used in self-driving cars and facial recognition, has long had problems with errors, primarily pertaining to women and people of colour. The novel way of handling this problem is to train AI on synthetic images to make it less biased. It is like training pilots on simulators rather than in unpredictable real-world conditions. Pilots spend hundreds of hours in flight simulators designed to cover a broad array of scenarios they could experience in the air, and aviation history confirms this approach. Synthetic data can likewise provide a wide diversity of scenario modelling. AI needs carefully labelled data to work properly.

Big data drawn from the real world can be invasive, time-consuming to collect and neglectful of large swathes of the population. Synthetic images of a broad array of people will not only make software more trustworthy, but also completely transform the economics of data. The idea of data as the new oil may take an economic hit. Simi Lindgren licensed 70,000 faces from a database, but the set was not diverse or inclusive enough. Nor did 1,000 crowdsourced faces serve the purpose. She decided to create her own data to plug the gap using GANs, generative adversarial networks. The end result was “a balanced dataset of diverse people, with diverse skin tones and diverse concerns.” Today 80% of her face database is not of real people. Synthetic data as a service has already become a business opportunity. Synthetic medical histories are being worked upon. Facebook acquired the synthetic data start-up AI.Reverie last October. Synthetic data is the future.

Sanjay Sahay
