In 1958 Jack Kilby, an engineer at Texas Instruments, built the first integrated circuit, a chip carrying a single transistor. By 1965 Fairchild Semiconductor had learned how to make a piece of silicon with 50 of the things. As Gordon Moore, one of Fairchild’s founders, observed that year, the number of transistors that could fit on a piece of silicon was doubling on a more or less annual basis.
In 2023 Apple released the iPhone 15 Pro, powered by the A17 Pro chip, with 19bn transistors. The number of transistors on a chip has thus doubled 34 times in 65 years. That exponential progress, loosely referred to as Moore’s law, has been one of the engines of the computing revolution. As transistors became smaller they got cheaper (more fit on a chip) and faster, allowing all the hand-held supercomputing wonders of today. But the sheer number of numbers that AI programs need to crunch has been stretching Moore’s law to its limits.
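A back-of-the-envelope check of that doubling count, as a rough illustrative sketch in Python using only the figures above:

```python
import math

transistors_1958 = 1        # Kilby's first chip
transistors_2023 = 19e9     # Apple's A17 Pro

# How many doublings turn one transistor into 19bn?
doublings = math.log2(transistors_2023 / transistors_1958)
years = 2023 - 1958

print(f"{doublings:.1f} doublings over {years} years")
# -> roughly 34 doublings over 65 years, i.e. one every couple of years
```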
The neural networks found in almost all modern AI need to be trained in order to ascertain the right “weights” to give their billions, sometimes trillions, of internal connections. These weights are stored in the form of matrices, and training the model involves manipulating those matrices, using maths. Two matrices—sets of numbers arrayed in rows and columns—are used to generate a third such set; each number in that third set is produced by multiplying each number in a row of the first matrix by the corresponding number in a column of the second and then adding up the results. When the matrices are large, with thousands or tens of thousands of rows and columns, and need to be multiplied again and again as training goes on, the number of times individual numbers have to be multiplied and added together becomes huge.
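An illustrative sketch of that procedure in plain Python (not any particular AI library’s code) shows why the arithmetic piles up: multiplying two n-by-n matrices takes roughly n³ multiply-and-add steps.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), given as lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):              # each row of a...
        for j in range(p):          # ...paired with each column of b
            total = 0.0
            for k in range(m):      # multiply matching entries and add them up
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c

# Two 2x2 matrices need 2*2*2 = 8 multiplications; with 10,000 rows and
# columns the same procedure would need a trillion of them.
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```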
The training of neural nets, though, is not the only task that requires lightning-fast matrix multiplication. So does the production of the high-quality video images that make computer games fun to play; and 25 years ago that was a far larger market. To serve it Nvidia, a chipmaker, pioneered the design of a new sort of chip, the graphics-processing unit (GPU), on which transistors were laid out and connected in a way that let them do lots of matrix multiplications at once. When applied to AI, this parallelism was not their only advantage over the central processing units (CPUs) used for most applications: they allowed larger batches of training data to be used. They also ate up a lot less energy.
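A minimal sketch of that idea, assuming PyTorch and a CUDA-capable GPU are available (neither is named in the article; they stand in for whatever stack a lab actually uses): a whole batch of matrix multiplications is handed to the chip in one call, and the GPU spreads the work across its thousands of cores.

```python
import torch

# A batch of 64 pairs of 1,024 x 1,024 matrices: tens of billions of
# multiply-adds, expressed as a single operation.
a = torch.randn(64, 1024, 1024)
b = torch.randn(64, 1024, 1024)

device = "cuda" if torch.cuda.is_available() else "cpu"
a, b = a.to(device), b.to(device)   # move the data onto the accelerator, if present

c = a @ b   # batched matrix multiplication, run in parallel on the device
print(c.shape, c.device)
```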
Training AlexNet, the model which ushered in the age of “deep learning” in 2012, meant assigning weights to 60m internal connections. That required 4.7 x 10¹⁷ floating-point operations (flop); each flop is broadly equivalent to adding or multiplying two numbers. Until then, that much computation would have been out of the question. Even in 2012, doing it with the best CPUs would have required not just a lot more time and energy but also a simpler design. The system that trained AlexNet did all its phenomenal FLOPping with just two GPUs.
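To get a feel for what 4.7 x 10¹⁷ flop means in practice, here is a rough sketch that assumes each 2012-era GPU sustains about a teraflop (10¹²  flop per second); that throughput figure is an illustrative assumption, not one from the article.

```python
ALEXNET_FLOP = 4.7e17          # total training flop, from the article
GPUS = 2                       # the system that trained AlexNet
SUSTAINED_FLOPS = 1e12         # assumed ~1 teraflop/s per 2012-era GPU (illustrative)

seconds = ALEXNET_FLOP / (GPUS * SUSTAINED_FLOPS)
print(f"~{seconds / 86400:.1f} days of non-stop arithmetic")   # a few days
```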
A recent report from Georgetown University’s Centre for Security and Emerging Technology says GPUs remain 10-100 times more cost-efficient and up to 1,000 times faster than CPUs when used for training models. Their availability was what made the deep-learning boom possible. Large language models (LLMs), though, have pushed the demand for calculation even further.
In 2018 Alec Radford, a researcher at OpenAI, developed a generative pre-trained transformer, or GPT, using the “transformer” approach described by researchers at Google the year before. He and his colleagues found the model’s ability to predict the next word in a sentence could reliably be improved by adding training data or computing power. Getting better at predicting the next word in a sentence is no guarantee a model will get better at real-world tasks. But so far the trend embodied in those “scaling laws” has held up.
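Such scaling laws are usually expressed as power laws: the model’s error on next-word prediction falls smoothly and predictably as training compute grows. The sketch below illustrates the shape of the relationship; the constants in it are invented for the example, not fitted values from OpenAI or anyone else.

```python
def predicted_loss(compute_flop, a=40.0, b=0.05):
    """Toy scaling law: loss shrinks as a power of training compute.
    a and b are illustrative constants, not real fitted parameters."""
    return a * compute_flop ** -b

for c in (1e20, 1e22, 1e24):
    print(f"{c:.0e} flop -> predicted loss {predicted_loss(c):.2f}")
# Each 100-fold increase in compute buys a steady, predictable improvement.
```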
As a result LLMs have grown larger. Epoch AI, a research outfit, estimates that training GPT-4 in 2022 required 2 x 10²⁵ flop, 40m times as many as were used for AlexNet a decade earlier, and cost about $100m. Gemini Ultra, Google’s most powerful model, released in 2024, is reported to have cost twice as much; Epoch AI reckons it may have required 5 x 10²⁵ flop. These totals are incomprehensibly big, comparable to all the stars in all the galaxies of the observable universe, or the drops of water in the Pacific Ocean.
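The 40m figure follows directly from the two estimates already quoted:

```python
GPT4_FLOP = 2e25        # Epoch AI's estimate, from the article
ALEXNET_FLOP = 4.7e17   # AlexNet, a decade earlier

print(f"{GPT4_FLOP / ALEXNET_FLOP:.1e}")   # ~4.3e7, i.e. roughly 40m times as much
```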
In the past the solution to excessive needs for computation has been a modicum of patience. Wait a few years and Moore’s law will provide, putting even more, even faster transistors onto every chip. But Moore’s law has run out of steam. With individual transistors now just tens of nanometres (billionths of a metre) wide, it is harder to deliver regular jumps in performance. Chipmakers are still working to make transistors smaller, and are even stacking them up vertically to squeeze more of them onto chips. But the era in which performance increased steadily, while power consumption fell, is over.
As Moore’s law has slowed down and the desire to build ever-bigger models has taken off, the answer has been not faster chips but simply more chips. Insiders suggest GPT-4 was trained on 25,000 of Nvidia’s A100 GPUs, clustered together to reduce the loss of time and energy that occurs when moving data between chips.
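A rough sketch of why so many chips are needed, assuming each A100 sustains on the order of 100 teraflop per second on training workloads (an illustrative figure; the chip’s theoretical peak is higher and real-world utilisation lower):

```python
GPUS = 25_000
SUSTAINED_FLOPS_PER_GPU = 1e14   # assumed ~100 teraflop/s per A100 (illustrative)
GPT4_FLOP = 2e25                 # Epoch AI's training estimate, from above

seconds = GPT4_FLOP / (GPUS * SUSTAINED_FLOPS_PER_GPU)
print(f"~{seconds / 86400:.0f} days")   # on the order of three months of continuous training
```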
Much of the $200bn that Alphabet, Amazon, Meta and Microsoft plan to invest in 2024 will go on AI-related stuff, up 45% from last year; much of that will be spent on such clusters. Microsoft and OpenAI are reportedly planning a $100bn cluster in Wisconsin called Stargate. Some in Silicon Valley talk of a $1trn cluster within the decade. Such infrastructure needs a lot of energy. In March Amazon bought a data centre next door to a nuclear power plant that can supply it with a gigawatt of power.
The investment does not all go on GPUs and the power they draw. Once a model is trained, it has to be used. Putting a query to an AI system typically requires roughly the square root of the amount of computing used to train it. But that can still be a lot of calculation. For GPT-3, which required 3 x 10²³ flop to train, a typical “inference” can take 3 x 10¹¹ flop. Chips tailored for inference, such as FPGAs and ASICs, can make running AI models more efficient than doing so on GPUs.
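That square-root rule of thumb lines up with the two GPT-3 figures just cited:

```python
import math

TRAINING_FLOP = 3e23    # GPT-3 training, from the article
inference_estimate = math.sqrt(TRAINING_FLOP)

print(f"{inference_estimate:.0e} flop per query")   # ~5e11, the same order as the ~3e11 cited
```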
Nevertheless, it is Nvidia that has done best out of the boom. The company is now worth $2.8trn, eight times more than when ChatGPT was launched in 2022. Its dominant position does not rest only on its accumulated know-how in GPU-making and its ability to mobilise lots of capital (Jensen Huang, its boss, says Nvidia’s latest chips, called Blackwell, cost $10bn to develop). The company also benefits from owning the software framework used to program its chips, called CUDA, which is something like the industry standard. And it has a dominant position in the networking equipment used to tie the chips together.
Competitors claim to see some weaknesses. Rodrigo Liang of SambaNova Systems, another chip firm, says that Nvidia’s postage-stamp-size chips have several disadvantages which can be traced back to their original use in gaming. A particularly big one is their limited capacity for moving data on and off the chip, which matters because an entire model will not fit on one GPU.
Cerebras, another competitor, markets a “wafer scale” processor that is 21.5cm across. Where GPUs now contain tens of thousands of separate “cores” running calculations at the same time, this behemoth has almost a million. Among the advantages the company claims is that, calculation-for-calculation, it uses only half as much energy as Nvidia’s best chip. Google has devised its own specialised “tensor-processing unit” (TPU) which can be used for both training and inference. Its Gemini 1.5 AI model is able to ingest eight times as much data at a time as GPT-4, partly because of that bespoke silicon.
The huge and growing value of cutting-edge GPUs has been seized on for geopolitical leverage. Though the chip industry is global, a small number of significant choke-points control access to its AI-enabling heights. Nvidia’s chips are designed in America. The world’s most advanced lithography machines, which use light to print a chip’s finest features onto silicon, are all made by ASML, a Dutch firm worth $350bn. Only leading-edge foundries like Taiwan’s TSMC, a firm worth around $800bn, and America’s Intel have access to its most advanced machines. And for many other smaller items of equipment the pattern continues, with Japan being the other main country in the mix.
These choke-points have made it possible for the American government to enact harsh and effective controls on the export of advanced chips to China. As a result the Chinese are investing hundreds of billions of dollars to create their own chip supply chain. Most analysts believe China is still years behind in this quest, but because of big investments by companies such as Huawei, it has coped with export controls much better than America expected.
America is investing, too. TSMC, seen as a potential prize or casualty should China decide to invade Taiwan, is spending about $65bn on fabs in Arizona, with about $6.6bn in subsidies. Other countries, from India ($10bn) to Germany ($16bn) to Japan ($26bn), are increasing their own investments. The days in which acquiring AI chips is one of the field’s biggest limiting factors may be numbered.
© 2024, The Economist Newspaper Ltd. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com