*Analog* Deep Learning Processors in 2010

From 2007 to 2011, my startup Lyric Semiconductor, Inc., funded by a combination of DARPA and venture capital, created the first (and so far the only) commercial analog deep learning processor. 440,000 analog transistors did the work of 30,000,000 digital transistors, delivering about 10x better energy efficiency (ops per Joule) than a digital tensor processing core. This analog tensor processing core was designed as a “plug and play” IP block for use within our digital deep learning microchips. (See this post about our overall deep learning processor architecture.)

We published in ACM¹. We also patented early explorations at MIT², the analog computing unit architecture³, analog storage⁴, factor/tensor operations⁵, error reduction of analog processing circuits⁶, I/O⁷, and research on stochastic spiking circuits⁸. We used the terms “factor” and “tensor” interchangeably⁹.

Our startup was acquired by ADI, the largest analog/mixed-signal semiconductor company in the US, and became the new machine learning/AI chip group. Our work also inspired further DARPA work on analog computing¹⁰.

There was also significant press coverage: Wired¹¹, The Register¹², Reuters¹³, The Flash Memory Summit¹⁴, Phys Org¹⁵, The Bulletin¹⁶, Chip Estimate¹⁷, KD Nuggets¹⁸.

The main innovation involved taking advantage of the fact that weights and activations in deep learning models can be represented by (quantized into) 7-bit numbers without causing problems.
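As a rough illustration of what such quantization involves, here is a minimal Python sketch. The symmetric uniform scheme, the Gaussian test weights, and the names `quantize` and `dequantize` are assumptions made for illustration; the post does not say which quantization scheme was actually used.

```python
import random

BITS = 7  # bit width quoted in the text


def quantize(x, scale, bits=BITS):
    """Map x to a signed integer code (assumed symmetric uniform scheme)."""
    qmax = 2 ** (bits - 1) - 1            # 63 for 7 bits
    code = round(x / scale * qmax)
    return max(-qmax, min(qmax, code))    # clip to the representable range


def dequantize(code, scale, bits=BITS):
    """Map an integer code back to an approximate real value."""
    qmax = 2 ** (bits - 1) - 1
    return code * scale / qmax


random.seed(0)
# Hypothetical weights clustered around 0, for illustration only.
weights = [random.gauss(0.0, 0.3) for _ in range(1000)]
scale = max(abs(w) for w in weights)      # one shared scale for the whole tensor

codes = [quantize(w, scale) for w in weights]
restored = [dequantize(c, scale) for c in codes]

step = scale / (2 ** (BITS - 1) - 1)      # size of one quantization step
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"step = {step:.5f}, worst-case round-trip error = {max_err:.5f}")
```

Because the scale is set to the largest magnitude in the tensor, no value is clipped, so the round-trip error stays within half a quantization step.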

The noise we observed in our circuits was about 1/128th of our 1.8V power supply, so we could replace an 8-wire digital bus with a single wire carrying an analog current. In practice we used a differential pair (2 wires) to represent our analog value more robustly. Today’s chips often keep an entire tensor of logits within a reasonable numerical scale by having a single “exponent” shared across the whole tensor. We accomplished the same thing by having all of the currents (activations) flowing through our circuit sum into a single tail current that we carefully controlled (and used to compensate for variations in manufacturing process, voltage, and temperature, known as PVT).
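The arithmetic behind these numbers can be sketched in a few lines of Python. The supply voltage and noise fraction come from the text. The proportional split in `share_tail_current` and the 1 µA tail current are assumptions for illustration, not a description of the actual circuit.

```python
import math

V_SUPPLY = 1.8              # volts, from the text
NOISE_FRACTION = 1 / 128    # noise as a fraction of the supply, from the text

noise_volts = V_SUPPLY * NOISE_FRACTION           # about 14 mV
effective_bits = math.log2(1 / NOISE_FRACTION)    # distinguishable levels, in bits


def share_tail_current(activations, tail_current):
    """Split a fixed tail current across branches in proportion to the activations.

    Assumed idealized model: the branch currents always sum to tail_current,
    which plays the role of a shared scale for the whole tensor.
    """
    total = sum(activations)
    if total <= 0:
        raise ValueError("activations must have a positive sum")
    return [tail_current * a / total for a in activations]


# Hypothetical activations and a hypothetical 1 microamp tail current.
currents = share_tail_current([0.5, 1.5, 2.0], tail_current=1e-6)

print(f"noise floor ~ {noise_volts * 1000:.1f} mV, ~{effective_bits:.0f} bits")
print("branch currents (A):", currents)
```

In this toy model, scaling every activation by the same factor leaves the branch currents unchanged, which is the sense in which a controlled tail current behaves like a shared exponent.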

Still, dropping from eight wires to two may not seem like a big enough win to justify the effort of designing analog tensor processors. Why bother?

The real win (about 10x in ops per Joule) came from two things:

1. You get to use fewer transistors in the multiply-and-add. The main kind of math that deep learning processors need to do is multiplication and addition. Instead of the few thousand transistors needed to multiply two 8-bit digital numbers, we could use just 6 transistors to multiply two analog numbers. 500x fewer!
2. Less intense switching means lower power. On average, our analog wires were not switching between 0 and 1; they were varying between intermediate current values.

In the digital version, switching our wires from 1.8V to 0V and back again dissipates the majority of power in our processor. On any given digital wire, this happens about half of the times that the processor’s clock ticks.

By contrast, in the analog version, because of the statistical distribution of weights and activations around 0, on average the currents in our analog wires did not change as widely or as abruptly.
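The statistical argument can be illustrated with a small Python simulation. Everything here is an assumption made for the sketch: independent Gaussian samples with a spread of 0.2 on a full scale of -1 to 1, an 8-bit two's-complement encoding for the digital bus, and a normalized value standing in for the analog current. It compares signal statistics only and is not a power model of either circuit.

```python
import random

random.seed(0)
SIGMA = 0.2     # assumed spread of values around 0 (full scale is -1..1)
N = 10_000      # number of consecutive clock ticks to simulate

# Hypothetical stream of weights/activations clustered around 0.
values = [max(-1.0, min(1.0, random.gauss(0.0, SIGMA))) for _ in range(N)]

# Digital view: 8-bit two's-complement codes carried on an 8-wire bus.
codes = [round(v * 127) & 0xFF for v in values]
toggles = sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))
toggle_rate = toggles / (8 * (N - 1))   # fraction of wire-ticks with a full-rail flip

# Analog view: average step between consecutive values,
# expressed as a fraction of the 2.0-wide full-scale range.
mean_swing = sum(abs(b - a) for a, b in zip(values, values[1:])) / (N - 1) / 2.0

print(f"digital: each wire flips rail-to-rail on {toggle_rate:.0%} of ticks")
print(f"analog: average change is {mean_swing:.0%} of full scale")
```

Under these assumptions, each digital wire makes a full-rail transition on roughly half of the clock ticks, while the analog value typically moves by only a small fraction of its full-scale range.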

It’s quite likely that all of this would still work in a modern 1nm semiconductor chip.

In 2010, a small percentage of the world’s computing workloads involved deep learning. Today deep learning workloads are becoming a driver of global energy consumption. A 10x efficiency win matters even more today than it did then!

Low power logic for statistical inference, Vigoda, Benjamin and Reynolds, David and Bernstein, Jeffrey and Weber, Theophane and Bradley, Bill, Association for Computing Machinery, 2010. ↩︎
Analog Continuous Time Statistical Processing, Vigoda, Benjamin and Gershenfeld, Neil, Massachusetts Institute of Technology, issued December 28, 2010. US Patent 7,860,687 B2. ↩︎
Belief Propagation Processor, Reynolds, David and Vigoda, Benjamin, Mitsubishi Electric Research Laboratories / Analog Devices, Inc., issued August 5, 2014. US Patent 8,799,346 B2. ↩︎
Storage Devices with Soft Processing, Vigoda, Benjamin and Bernstein, Jeffrey and Venuti, Jeffrey and Alexeyev, Alexander and Nestler, Eric and Reynolds, David and Bradley, William and Zlatkovic, Vladimir, Analog Devices, Inc., issued May 19, 2015. US Patent 9,036,420 B2. ↩︎
Programmable Probability Processing, Bernstein, Jeffrey and Vigoda, Benjamin and Nanda, Kartik and Chaturvedi, Rishi and Hossack, David and Peet, William and Schweitzer, Andrew and Caputo, Timothy, Analog Devices, Inc., issued February 7, 2017. US Patent 9,563,851 B2. ↩︎
Apparatus and Method for Reducing Errors in Analog Circuits While Processing Signals, Vigoda, Benjamin, Mitsubishi Electric Research Laboratories, Inc., issued August 31, 2010. US Patent 7,788,312 B2. ↩︎
Signal Mapping, Vigoda, Benjamin and Bernstein, Jeffrey and Alexeyev, Alexander and Venuti, Jeffrey, Lyric Semiconductor, Inc. / Analog Devices, Inc., published November 4, 2010. US Patent Application 20100281089 A1 (granted as US Patent 8,572,144 B2). ↩︎
Mixed Signal Stochastic Belief Propagation, Bernstein, Jeffrey and Vigoda, Benjamin and Reynolds, David and Alexeyev, Alexander and Bradley, William, Analog Devices, Inc., issued July 29, 2014. US Patent 8,792,602 B2. ↩︎
In a factor graph computing belief propagation, we could have, for example, a softAND gate with incident edges A, B, C. Logically, C = AND(A,B), which yields the tensor or “factor” computation p_C(C) = \sum_{A,B} \delta(C-AND(A,B)) p_A(A) p_B(B).
Accelerating Inference: towards a full Language, Compiler and Hardware stack, Hershey, Shawn and Bernstein, Jeffrey and Bradley, Bill and Schweitzer, Andrew and Stein, Noah and Weber, Théophane and Vigoda, Benjamin, NIPS Workshop on Probabilistic Programming, December 12, 2012. (arXiv:1212.2991) ↩︎
Upside, Wired, August 2012. ↩︎
Probabilistic Chip Promises Better Flash Memory, Spam Filtering, Wired, August 2010. ↩︎
DARPA Funds Mr Spock on a Chip, The Register, August 17, 2010. ↩︎
The Odds Are Good That Lyric Semiconductor Will Change Computing, Reuters, August 2010. ↩︎
LDPC Error Correction Using Probability Processing Circuits, Vigoda, Benjamin, Flash Memory Summit, Session 201, August 19, 2010. ↩︎
Computer Chip That Computes Probabilities and Not Logic, Phys.org, August 19, 2010. ↩︎
A Chip That Calculates the Odds, Vance, Ashlee, Bend Bulletin (via New York Times), August 18, 2010. ↩︎
MIT Spin-Out Lyric Semiconductor Launches a New Kind of Computing With Probability Processing Circuits, ChipEstimate, August 2010. ↩︎
A Chip That Digests Data and Calculates the Odds, Vance, Ashlee, New York Times (via KDnuggets), August 17, 2010. ↩︎
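The softAND factor described in the note on factor graphs can be sketched in Python as follows. The function name `soft_and` and the example probabilities are hypothetical; the sketch simply sums p_A(A) p_B(B) over every (A, B) pair into the entry for C = AND(A, B).

```python
from itertools import product


def soft_and(p_a, p_b):
    """Return [P(C=0), P(C=1)] for C = AND(A, B).

    p_a and p_b are [P(x=0), P(x=1)] for independent binary variables A and B.
    """
    p_c = [0.0, 0.0]
    for a, b in product((0, 1), repeat=2):
        # The delta function selects the single C value consistent with (a, b).
        p_c[a & b] += p_a[a] * p_b[b]
    return p_c


# Hypothetical inputs: P(A=1) = 0.75, P(B=1) = 0.5.
p_c = soft_and([0.25, 0.75], [0.5, 0.5])
print(p_c)  # [0.625, 0.375]
```

P(C=1) is just P(A=1) times P(B=1), and the remaining probability mass goes to C=0.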