An active topic of research in hardware engineering right now is how to build more efficient and effective hardware for machine learning. One of the big problems with digital hardware is that multipliers have a huge cost compared to adders,1 yet machine learning relies heavily on multiply-and-accumulate operations.
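To see why multiplier cost matters so much, here is a minimal sketch (not from the notes) of a dot product written as a chain of multiply-and-accumulate (MAC) steps; an n-element dot product needs n multiplies, and matrix multiplies in neural networks are built out of millions of these:

```python
# A dot product is repeated multiply-and-accumulate (MAC):
# each step performs one multiply feeding one add, so an
# n-element dot product costs n multiplies -- the expensive
# operation in digital hardware.
def dot(a, b):
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # one MAC step
    return acc

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```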

Machine learning software has improved massively, but hardware hasn’t kept pace. GPUs are used heavily (which is why Nvidia is so dominant), and there’s also research into using FPGAs. Machine learning is so computationally intensive that hardware needs to improve by rethinking things, not just by throwing more money and more existing hardware at the problem (like doubling data centre sizes).

An idea floating around: we can still get decent accuracy using fewer bits of precision.
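The low-precision idea can be sketched with uniform quantization (an illustrative example, not a method from the notes): map floats onto 8-bit integers and back, and each value survives with an error of at most half a quantization step, which is why low-precision inference can often keep accuracy:

```python
# Minimal sketch of uniform quantization to 8 bits (an
# illustrative assumption, not the notes' method): floats are
# mapped to integers in [0, 255] and recovered with bounded error.
def quantize(xs, bits=8):
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / (2 ** bits - 1)       # size of one step
    q = [round((x - lo) / scale) for x in xs] # integer codes
    return q, scale, lo

def dequantize(q, scale, lo):
    return [v * scale + lo for v in q]

xs = [0.12, -0.5, 0.98, 0.33]
q, scale, lo = quantize(xs)
recovered = dequantize(q, scale, lo)
# Worst-case error is scale / 2, well under 1% of the range here.
print(max(abs(a - b) for a, b in zip(xs, recovered)))
```

With 8 bits the step size here is about 0.006, so every value is recovered to within about 0.003; fewer bits trade more error for cheaper multipliers.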

Prof Anderson mentioned: since ML largely consists of adding and multiplying, people found they could use GPUs to perform these large computations in parallel. FPGAs are also used to speed up machine learning.

Resources

  • ECE5545 — Machine Learning Hardware and Systems, recorded lectures from its delivery at Cornell Tech

Footnotes

  1. “Looks like I’m screwed.” - Prof Anderson, on multiplier circuits