
Researchers warn we could run out of data to train AI by 2026. What then?
The Hindu
Researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems.
As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems. This could slow down the growth of AI models, especially large language models, and may even alter the trajectory of the AI revolution.
But why is a potential lack of data an issue, considering how much of it there is on the web? And is there a way to address the risk?
We need a lot of data to train powerful, accurate and high-quality AI algorithms. For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.
Similarly, the Stable Diffusion algorithm, which powers AI image-generating apps such as Lensa and competes with tools like DALL-E and Midjourney, was trained on the LAION-5B dataset, comprising 5.8 billion image-text pairs. If an algorithm is trained on an insufficient amount of data, it is likely to produce inaccurate or low-quality outputs.
The quality of the training data is also important. Low-quality data, such as social media posts or blurry photographs, is easy to source but not sufficient to train high-performing AI models.
Text taken from social media platforms might be biased or prejudiced, or may include disinformation or illegal content that the model could then replicate. For example, when Microsoft's chatbot Tay learned from its conversations with Twitter users in 2016, it quickly began producing racist and misogynistic posts.
This is why AI developers seek out high-quality content such as text from books, online articles, scientific papers, Wikipedia, and certain filtered web content. Google, for instance, reportedly trained its conversational AI on some 11,000 novels, many of them romance novels, taken from the self-publishing site Smashwords to make it more conversational.
