
AI and the brain: similar in scale, different in design
The Hindu
Explore the parallels and differences between AI architectures and the human brain's design and functionality in processing information.
In recent years, Artificial Intelligence (AI) has undergone a massive growth spurt. Not so long ago, Large Language Models (LLMs) such as ChatGPT, Gemini, and Claude were curiosities: you could trick them, confuse them, or make them contradict themselves. Today, they have evolved into versatile companions that can write software, assist scientific research, extract insights from large sets of documents, and offer structured guidance across a wide range of domains. Today's multi-modal AI systems no longer operate on text alone; they interpret images, analyse audio, generate video, and combine these streams in seamless ways. Language, reasoning, and creativity, capacities we once associated only with ourselves, are now appearing, at least on the surface, in machines.
Tracing the foundations of these AI systems, one can observe that the core idea behind them is not new. Artificial neural networks have existed since the late 20th century, and their conceptual roots go back even further. In 1943, Warren McCulloch and Walter Pitts proposed a simple mathematical model of a neuron. Such an artificial neuron takes numerical inputs, multiplies them by adjustable weights, sums the results, and applies a non-linear function (in the original McCulloch–Pitts model, a simple threshold) to produce an output, much as a person might gather opinions from several advisers and act only when enough of them agree. Individually, such units are extremely simple. Yet a powerful mathematical insight, known as the universal approximation theorem, shows that networks composed of enough of these simple units can approximate virtually any function connecting input to output. With sufficient scale, they can process remarkably complex patterns.
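The "advisers voting" intuition above can be written out in a few lines. This is a minimal sketch (not from the article) of a threshold neuron: weighted inputs are summed, and the unit fires only if the total crosses a threshold.

```python
# A toy threshold neuron in the spirit of McCulloch and Pitts:
# weighted inputs, a sum, and a simple non-linear (threshold) output.
def artificial_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Three "advisers", each with equal say; the unit fires only when
# at least two of the three agree (a majority vote).
votes = [1, 1, 0]
print(artificial_neuron(votes, weights=[1, 1, 1], threshold=2))  # prints 1
```

With unequal weights, some advisers count for more than others; learning, in a neural network, is precisely the process of adjusting those weights.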
For a long time, scale was the limiting factor. Neural networks existed, but the hardware and data required to make them powerful were not available. What changed over the past 15 years was not the invention of neural networks but the availability of enormous computational power and data. Graphics Processing Units (GPUs), originally developed for video games, enabled researchers to train networks with millions and eventually billions of parameters. At the same time, new architectural ideas improved how these networks were organised. Convolutional neural networks proved effective for image recognition by exploiting spatial structure. Recurrent neural networks were designed to handle sequences such as speech and text by allowing information to persist over time. The major breakthrough, however, came with the transformer architecture, which introduced attention mechanisms that allow models to dynamically weigh which parts of their input matter most at any given moment.
GPT, short for Generative Pre-trained Transformer, builds on this architecture. It is trained on vast collections of text to predict the next word in a sequence. Although this objective appears simple, when implemented at an enormous scale and trained on extensive datasets, the model begins to capture grammar, facts, stylistic patterns, conceptual relationships, and even fragments of reasoning embedded in language. Intelligence, in this framework, emerges from the statistical regularities underlying the text the model is trained on.
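The "predict the next word" objective can be illustrated at toy scale. This sketch (a simple bigram counter over a few sentences, a crude stand-in for the web-scale training GPT actually undergoes) shows how statistical regularities in text alone let a model guess what comes next.

```python
from collections import Counter, defaultdict

# A tiny "corpus" standing in for the vast text collections GPT trains on.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count, for every word, which words follow it and how often.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word."""
    return follows[word].most_common(1)[0][0]

print(predict_next("sat"))  # prints 'on': "sat" is always followed by "on"
```

GPT's objective is the same in spirit, but instead of raw counts it learns billions of weights that capture grammar, facts, and conceptual relationships well enough to make its next-word predictions accurate.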
As these systems grow, they are beginning to rival the human brain in sheer numbers. GPT-3 contained 175 billion parameters, while newer models are estimated to reach into the trillions, approaching the roughly 100 trillion synapses in the human brain. Despite this apparent convergence in scale, AI and biological intelligence operate on fundamentally different principles.
To take advantage of modern computing hardware, models such as GPT-3 process information in a strictly feed-forward manner. Input enters the network, flows through stacked layers, and produces an output. Each layer transforms the representation and passes it forward without revisiting earlier computations during the same pass. This design enables efficient training across thousands of GPUs simultaneously and allows rapid scaling.
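The strictly feed-forward flow described above can be sketched directly: an input vector enters, each layer applies its transformation and hands the result on, and nothing is revisited. This is an illustrative NumPy sketch with random weights and a ReLU non-linearity, not GPT-3's actual architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)  # a common non-linear activation

def feed_forward(x, layers):
    """One strictly forward pass: each layer transforms the
    representation and passes it on; no layer is revisited."""
    for W, b in layers:
        x = relu(x @ W + b)
    return x

rng = np.random.default_rng(42)
# Three stacked layers mapping dimensions 8 -> 16 -> 16 -> 4.
shapes = [(8, 16), (16, 16), (16, 4)]
layers = [(rng.normal(size=s), rng.normal(size=s[1])) for s in shapes]

x = rng.normal(size=(1, 8))   # input enters the network...
y = feed_forward(x, layers)   # ...and flows through the stack once
print(y.shape)  # (1, 4)
```

Because each layer's work is just matrix arithmetic with no backward loops within a pass, the computation maps naturally onto GPUs and can be split across thousands of them.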