RISC-V AI Chips Will Be Everywhere
The adoption of RISC-V, a free and open-source computer instruction set architecture first introduced in 2010, is taking off like a rocket. And much of the fuel for this rocket is coming from demand for AI and machine learning. According to the research firm Semico, the number of chips that include at least some RISC-V technology will grow 73.6 percent per year to 2027, when there will be some 25 billion AI chips produced, accounting for US $291 billion in revenue.
The increase from what was still an upstart idea just a few years ago to today is impressive, but for AI it also represents something of a sea change, says Dave Ditzel, whose company Esperanto Technologies has created the first high-performance RISC-V AI processor intended to compete against powerful GPUs in AI-recommendation systems. According to Ditzel, during the early mania for machine learning and AI, people assumed general-purpose computer architectures—x86 and Arm—would never keep up with GPUs and more purpose-built accelerator architectures.
“We set out to prove all those people wrong,” he says. “RISC-V seemed like an ideal base to solve a lot of the kinds of computation people wanted to do for artificial intelligence.”
With the company’s first silicon—a 1,092-core AI processor—in the hands of a set of early partners and a major development deal with Intel [see sidebar], he might soon be proved right.
Ditzel’s entire career has been defined by the theory behind RISC-V. RISC, as you may know, stands for reduced instruction set computer. It was the idea that you could make a smaller, lower-power but better-performing processor by slimming down the core set of instructions it can execute. IEEE Fellow David Patterson coined the term in a seminal paper in 1980. Ditzel, his student, was the coauthor. Ditzel went on to work on RISC processors at Bell Labs and Sun Microsystems before cofounding Transmeta, which made a low-power processor meant to compete against Intel by translating x86 code for a RISC architecture.
With Esperanto, Ditzel saw RISC-V as a way to accelerate AI with relatively low power consumption. At a basic level, a more complex instruction set architecture means you need more transistors on the silicon to make up the processor, each one leaking a bit of current when off and consuming power when it switches states. “That was what was attractive about RISC-V,” he says. “It had a simple instruction set.”
The Core
The core of RISC-V is a set of just 47 instructions. The actual number of x86 instructions is oddly difficult to enumerate, but it’s likely near 1,000. Arm’s instruction set is thought to be much smaller, but still considerably larger than RISC-V’s. But simply using a slim set of instructions wouldn’t be enough to achieve the computing power Esperanto was aiming for, says Ditzel. “Most of the RISC-V cores out there aren’t that small or that energy efficient. So it’s not just a question of us taking a RISC-V core and slapping 1,000 of them on a chip. We had to completely redesign the CPU so that it would fit into those very tough constraints.”
Notably missing from the RISC-V instruction set at the time Ditzel and his colleagues started work were the “vector” instructions needed to efficiently do the math of machine learning, such as matrix multiplication. So Esperanto engineers came up with their own. As embodied in the architecture of the processor core, the ET-Minion, these included units that do 8-bit integer vectors and both 32- and 16-bit floating-point vectors. There are also units that do more complex “tensor” instructions, and systems related to the efficient movement of data and instructions related to the arrangement of ET-Minion cores on the chip.
The resulting system-on-chip, ET-SoC-1, is made up of 1,088 of the ET-Minion cores along with four cores called ET-Maxions, which help govern the Minions’ work. The chip’s 24 billion transistors take up 570 square millimeters. That puts it at around half the size of the popular AI accelerator Nvidia A100. Those two chips follow very different philosophies.
The ET-SoC-1 was designed to accelerate AI in power-constrained data centers at the heart of boards that fit into the peripheral component interconnect express (PCIe) slot of already installed servers. That meant the board had only 120 watts of power available, but it would have to provide at least 100 trillion operations per second to be worthwhile. Esperanto managed more than 800 trillion in that power envelope.
Most AI accelerators are built around a single chip that uses the bulk of the board’s power budget, Esperanto.ai principal architect Jayesh Iyer told technologists at the RISC-V Summit in December. “Esperanto’s approach is to use multiple low-power chips, which still fits within the power budget,” he said.
Each chip consumes 20 W when performing a recommender-system benchmark neural network—less than one-tenth what the A100 can draw—and there are six on the board. That combination of power and performance was achieved by reducing the chips’ operating voltages without the expected sacrifice in performance. (Generally, a higher operating voltage means you can run the chip’s clock faster and get more computing done.) At 0.75 volts, the nominal voltage for the ET-SoC-1’s manufacturing process, a single chip would blow way past the board’s power budget. But when dropping the voltage to about 0.4 V, you can run six chips from the board’s 120 W and achieve better than a fourfold boost in recommender-system performance over the single higher-voltage chip. At that voltage, each ET-Minion core is consuming only about 10 milliwatts.
“Low voltage operation is the key differentiator for Esperanto’s ET-minion [core] design,” said Iyer. It informed architectural and circuit-level decisions, he said. For instance, the core’s pipeline for the RISC-V integer instructions is made up of the fewest number of logic gates per clock cycle, allowing a higher clock rate at the reduced voltage. And when the core is performing long tensor computations, that pipeline is shut down to save energy.
Other AI Processors
Other recently developed AI processors have also turned to a combination of RISC-V and their own custom machine-learning acceleration. For example, Ceremorphic, which recently came out of stealth with its Hierarchical Learning Processor, uses both a RISC-V and an Arm core along with its own custom machine-learning and floating-point arithmetic units. And Intel’s upcoming Mobileye EyeQ Ultra will have 12 RISC-V cores with its neural-network accelerators in a chip meant to provide the intelligence for Level 4 autonomous driving. [READ MORE]