
From 2007 to 2011, my startup Lyric Semiconductor, Inc., funded by a combination of DARPA and venture capital, created the first (and so far the only) commercial analog deep learning processor. 440,000 analog transistors did the work of 30,000,000 digital transistors, providing roughly 10x better energy efficiency (Joules per operation) than a digital tensor processing core. This analog tensor processing core was designed as a “plug and play” IP block for use within our digital deep learning microchips. (See this post about our overall deep learning processor architecture.)
We published in the ACM[1]. We also patented our early explorations at MIT[2], the analog computing unit architecture[3], analog storage[4], factor/tensor operations[5], error reduction for analog processing circuits[6], and I/O[7], along with research on stochastic spiking circuits[8]. We used the terms “factor” and “tensor” interchangeably[9].
Our startup was acquired by ADI, the largest analog/mixed-signal semiconductor company in the US, and became its new machine learning/AI chipset division[10]. Our work also inspired further DARPA work on analog computing[11].
There was also significant press coverage: Wired[12], The Register[13], Reuters[14], the Flash Memory Summit[15], Phys.org[16], The Bulletin[17], Chip Estimate[18], and KDnuggets[19].
The main innovation took advantage of the fact that weights and activations in deep learning models can be represented by (quantized into) 8-bit numbers without any meaningful loss of accuracy.
The noise we observed in our circuits was about 1/128th of our 1.8V power supply, so we could essentially replace a 7-wire digital bus with a single wire carrying an analog current. (In practice we used a differential pair – two wires – to represent each analog value more robustly.)
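As a quick sanity check of that arithmetic, here is a minimal Python sketch. The only inputs are the 1.8V supply and the roughly 1/128 noise figure above; treating one noise-floor step as the smallest distinguishable level is my simplification for illustration, not a claim about the actual circuit design.

```python
import math

v_supply = 1.8               # analog supply voltage (V), from the text
v_noise = v_supply / 128     # observed noise floor, roughly 14 mV

# If each distinguishable step is about one noise floor wide, the number of
# usable levels is v_supply / v_noise, and the resolution in bits is log2 of that.
analog_bits = math.floor(math.log2(v_supply / v_noise))

print(f"Noise floor: {v_noise * 1e3:.1f} mV")                     # ~14.1 mV
print(f"Analog resolution: {analog_bits} bits")                   # 7 bits
print(f"Digital bus wires replaced by one analog wire: {analog_bits}")
```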
Dropping from seven wires to two doesn’t seem like a big enough win to justify the effort of designing analog circuits. The win gets even slimmer if you have decided you can quantize your weights and activations down to two bits: in that case we simply replaced two digital wires with two analog wires! So why bother?
The real win (about 10x in ops per Joule) came from two things:
- You get to use far fewer transistors in the multiply-accumulate. Instead of the few thousand digital transistors needed to multiply two 8-bit numbers, we could use just 6 transistors – about 500x fewer! Even a 2-bit multiply-accumulate would still require around 100 digital transistors, so the analog version still wins by roughly 16x in transistor count.
- Less intense switching provided a significant additional advantage. On average, our analog wires were not switching between 0 and 1; they were varying between intermediate current values.
In the digital version, switching our wires from 1.8V to 0V and back again dissipates the majority of the power in our processor. On any given digital wire, this happens on roughly half of the processor’s clock ticks.
By contrast, in the analog version, because weights and activations are statistically distributed around 0, the currents in our analog wires did not change value nearly as much on average.
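To make those two contributions concrete, here is a rough back-of-the-envelope sketch in Python. The transistor counts and the one-in-two digital switching figure come from the discussion above; the wire capacitance, clock rate, and the ½CV² per-transition energy model are generic textbook assumptions rather than measurements from our chip.

```python
# Rough sketch of the two effects described above (illustrative numbers only).

# Effect 1: transistor count for an 8-bit multiply-accumulate.
digital_transistors = 3000   # "a few thousand" digital transistors (from the text)
analog_transistors = 6       # six analog transistors (from the text)
print(f"Transistor-count ratio: ~{digital_transistors / analog_transistors:.0f}x")

# Effect 2: rail-to-rail switching energy on a digital wire.
v_dd = 1.8                   # supply voltage (V), from the text
c_node = 10e-15              # assumed capacitance per wire/node (F)
f_clock = 100e6              # assumed clock frequency (Hz)
activity = 0.5               # a digital wire transitions on ~half of clock ticks

energy_per_swing = 0.5 * c_node * v_dd**2            # ~16 fJ per 0V <-> 1.8V swing
digital_wire_power = activity * f_clock * energy_per_swing
print(f"Dynamic power per digital wire: {digital_wire_power * 1e9:.0f} nW")

# The analog wires instead carry small currents that mostly stay near their
# previous values, avoiding most of this dynamic energy; the measured
# end-to-end result was roughly a 10x improvement in ops per Joule.
```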
Would all of this still work in a modern 1nm semiconductor process? One would have to try it to find out for certain, but it is fairly likely to still work. If the noise floor in chips has not changed very significantly, then dropping the power supply from 1.8V to 0.5V would still give us about 5 bits of analog resolution to represent weights and activations, and 5 bits should still be adequate for today’s deep learning quantization.
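And a quick check of that last claim, assuming – and it is only an assumption – that the roughly 14mV noise floor we saw at 1.8V carries over unchanged to a 0.5V supply:

```python
import math

v_noise = 1.8 / 128      # noise floor from the original 1.8V design (~14 mV)
v_supply = 0.5           # assumed supply voltage of a modern process (V)

levels = v_supply / v_noise
print(f"~{levels:.0f} distinguishable levels, i.e. ~{math.log2(levels):.1f} bits")
# ~36 levels, ~5.2 bits -- enough for 5-bit weights and activations
```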
In 2010, a small percentage of the world’s computing workloads involved deep learning. Today, deep learning workloads are becoming a driver of global energy consumption; there is talk of deep learning compute consuming the equivalent of dozens of Manhattans. Even a win smaller than 10x matters more today than it did then. In addition, we are increasingly interested in deep learning chips within our portable devices, where again a 5-10x improvement could be the difference between an incredibly smart phone and a smart phone of only middling intelligence.
1. https://dl.acm.org/doi/10.1145/1840845.1840918 ↩︎
2. https://patents.google.com/patent/US7860687B2/en?inventor=benjamin+vigoda&oq=benjamin+vigoda ↩︎
3. https://patentimages.storage.googleapis.com/47/38/f5/1c388a1f370aea/WO2011085355A1.pdf ↩︎
4. https://patents.google.com/patent/US9036420B2/en?inventor=benjamin+vigoda&oq=benjamin+vigoda&peid=64758f3a4adf0%3A36%3A7951d761 and https://patents.google.com/patent/US8799346B2/en?inventor=benjamin+vigoda&oq=benjamin+vigoda ↩︎
5. https://patents.google.com/patent/US9563851B2/en?inventor=benjamin+vigoda&oq=benjamin+vigoda&page=1 ↩︎
6. https://patents.google.com/patent/US7788312B2/en?inventor=benjamin+vigoda&oq=benjamin+vigoda&page=1 ↩︎
7. https://patents.google.com/patent/US20100281089A1/en?inventor=benjamin+vigoda&oq=benjamin+vigoda&page=2 ↩︎
8. https://patents.google.com/patent/US8792602B2/en?inventor=benjamin+vigoda&oq=benjamin+vigoda&page=2 ↩︎
9. In a factor graph computing belief propagation, we could have, for example, a softAND gate with incident edges A, B, C. Logically, C = AND(A, B), which yields the tensor or “factor” computation $p_C(C) = \sum_{A,B} \delta(C - \mathrm{AND}(A,B))\, p_A(A)\, p_B(B)$. (A small numerical sketch of this computation follows the reference list.) http://thphn.com/papers/dimple.pdf#:~:text=The%20GP5%20is%20optimized%20to,variables%3B%20as%20a%20result%2C%20complexity ↩︎
10. https://www.eetimes.com/adi-buys-lyric-probability-processing-specialist/ ↩︎
11. https://www.wired.com/2012/08/upside/ ↩︎
12. https://www.wired.com/2010/08/flash-error-correction-chip/ ↩︎
13. https://www.theregister.com/2010/08/17/lyric_probability_processor/ ↩︎
14. https://www.reuters.com/article/business/the-odds-are-good-that-lyric-semiconductor-will-change-computing-idUS3042405556/ ↩︎
15. https://files.futurememorystorage.com/proceedings/2010/20100819_S201_Vigoda.pdf ↩︎
16. https://phys.org/news/2010-08-chip-probabilities-logic.html ↩︎
17. https://bendbulletin.com/2010/08/18/a-chip-that-calculates-the-odds/ ↩︎
18. https://www.chipestimate.com/MIT-Spin-Out-Lyric-Semiconductor-Launches-a-New-Kind-of-Computing-With-Probability-Processing-Circuits/Semiconductor-IP-Core/news/5622 ↩︎
19. https://www.kdnuggets.com/2010/08/b-lyric-chip-digests-data-calculates-odds.html ↩︎
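As promised in footnote 9, here is a minimal numerical sketch of that factor computation for binary variables with independent incoming beliefs. The function name soft_and is mine, purely for illustration.

```python
# Outgoing belief on edge C of a softAND factor:
# p_C(C) = sum over A, B of delta(C - AND(A, B)) * p_A(A) * p_B(B)
def soft_and(p_a1, p_b1):
    """p_a1, p_b1: probabilities that A = 1 and B = 1. Returns (P(C=0), P(C=1))."""
    p_c1 = p_a1 * p_b1        # the delta only fires for C = 1 when A = B = 1
    p_c0 = 1.0 - p_c1         # every other (A, B) combination contributes to C = 0
    return p_c0, p_c1

print(soft_and(0.9, 0.8))     # (0.28, 0.72): C is likely 1 when both inputs likely 1
```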