Cerebras’ New Monster AI Chip Adds 1.4 Trillion Transistors

Shift to 7-nanometer process boosts the second-generation chip’s transistor count to a mind-boggling 2.6 trillion

Almost from the moment Cerebras Systems announced a computer based on the largest single computer chip ever built, the Silicon Valley startup declared its intentions to build an even heftier processor. Today, the company announced that its next-gen chip, the Wafer Scale Engine 2 (WSE 2), will be available in the third quarter of this year. WSE 2 is just as big physically as its predecessor, but it has enormously increased amounts of, well, everything. The goal is to keep ahead of the ever-increasing size of neural networks used in machine learning.

“In AI compute, big chips are king, as they process information more quickly, producing answers in less time, and time is the enemy of progress in AI,” Dhiraj Malik, vice president of hardware engineering, said in a statement.

Cerebras has always been about taking a logical solution to the problem of machine learning to the extreme. Training neural networks takes too long—weeks for the big ones when Andrew Feldman cofounded the company in 2015. The biggest bottleneck was that data had to shuttle back and forth between the processor and external DRAM memory, eating up both time and energy. The inventors of the original Wafer Scale Engine figured that the answer was to make the chip big enough to hold all the data it needed right alongside its AI processor cores. With gigantic networks for natural language processing, image recognition, and other tasks on the horizon, you’d need a really big chip. How big? As big as possible, meaning the size of an entire wafer of silicon (with the round bits cut off), or 46,225 square millimeters.

That wafer size is one of the few specs that hasn’t changed from the WSE to the new WSE 2, as you can see in the table here. (For comparison with a more conventional AI processor, Cerebras uses Nvidia’s AI-chart-topping A100.)

	WSE 2	WSE	Nvidia A100
Size	46,225 mm²	46,225 mm²	826 mm²
Transistors	2.6 trillion	1.2 trillion	54.2 billion
Cores	850,000	400,000	7,344
On-chip memory	40 GB	18 GB	40 MB
Memory bandwidth	20 PB/s	9 PB/s	155 GB/s
Fabric bandwidth	220 Pb/s	100 Pb/s	600 GB/s
Fabrication process	7 nm	16 nm	7 nm
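
As a rough check on how evenly the resources scaled, the generation-over-generation ratios can be computed directly from the table. The short Python sketch below is illustrative only: the dictionary and function names are invented for this example, and the figures are simply copied from the table above.

```python
# WSE 2 vs. first-generation WSE, as (new, old) pairs taken from the table.
# The names used here are hypothetical and exist only for this illustration.
specs = {
    "transistors (trillions)": (2.6, 1.2),
    "cores": (850_000, 400_000),
    "on-chip memory (GB)": (40, 18),
    "memory bandwidth (PB/s)": (20, 9),
    "fabric bandwidth (Pb/s)": (220, 100),
}


def generation_ratios(table):
    """Return new/old for every row of the table."""
    return {name: new / old for name, (new, old) in table.items()}


ratios = generation_ratios(specs)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Every row comes out at a little over 2x, which suggests the new chip scales cores, memory, and bandwidth together rather than favoring one of them.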

What Made It Happen?

The most obvious and consequential driver is the move from TSMC’s 16-nanometer manufacturing process (which was more than five years old by the time WSE came out) to the megafoundry’s 7-nm process, leapfrogging the 10-nanometer process. A jump like that roughly doubles transistor density, which matches the rise from 1.2 trillion to 2.6 trillion transistors on the same wafer area. The process change should also result in about a 40 percent speed improvement and a 60 percent reduction in power, according to TSMC’s description of its technologies.
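
To put a number on that density jump, the sketch below divides each chip's transistor count by its area. It is a back-of-the-envelope calculation, not a process-level comparison: it assumes the 46,225 mm² wafer area quoted earlier in the article, uses the A100 figures from the table, and averages over the whole die, memory included. The variable and function names are invented for this example.

```python
# Average transistor density from the figures quoted in this article.
WAFER_AREA_MM2 = 46_225  # wafer-scale area quoted in the article text

# (transistor count, area in mm^2); labels are illustrative only.
chips = {
    "WSE (16 nm)": (1.2e12, WAFER_AREA_MM2),
    "WSE 2 (7 nm)": (2.6e12, WAFER_AREA_MM2),
    "Nvidia A100 (7 nm)": (54.2e9, 826),
}


def density_millions_per_mm2(transistors, area_mm2):
    """Average density in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


densities = {
    name: density_millions_per_mm2(count, area)
    for name, (count, area) in chips.items()
}
for name, density in densities.items():
    print(f"{name}: {density:.1f} million transistors per mm^2")

# Same area in both generations, so this equals the transistor-count ratio.
density_gain = densities["WSE 2 (7 nm)"] / densities["WSE (16 nm)"]
print(f"WSE 2 vs. WSE density gain: {density_gain:.2f}x")
```

Under these figures the wafer goes from about 26 million to about 56 million transistors per square millimeter, consistent with the doubling described above.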

“There are always physical design challenges when you change nodes,” says Feldman. “All sorts of things are geometry dependent. Those were really hard, but we had an extraordinary partner in TSMC.”

The move to 7-nm alone would spell a big improvement, but according to Feldman, the company has also made improvements to the microarchitecture of its AI cores. He wouldn’t go into details, but says that after more than a year working with customers, Cerebras has learned some lessons and incorporated them into the new cores.