Literature Reports
Report 1
Assignment: Write a one-paragraph summary of a hardware-architecture-based paper you find. Please ensure that the paper is hardware-architecture based, not software-architecture based.
A Configurable Cloud-scale DNN Processor for Real-Time AI
Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, Stephen Heil, Prerak Patel, Adam Sapek, Gabriel Weisz, Lisa Woods, Sitaram Lanka, Steven K. Reinhardt, Adrian M. Caulfield, Eric S. Chung, and Doug Burger. 2018. A configurable cloud-scale DNN processor for real-time AI. In Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA '18). IEEE Press, Piscataway, NJ, USA, 1-14. DOI: https://doi.org/10.1109/ISCA.2018.00012
This paper presents the architecture of a new type of processor, which the authors call an NPU (Neural Processing Unit). The motivation is that the unique requirements and high degree of parallelism of Deep Neural Networks (DNNs) make them costly to run on general-purpose hardware. The proposed design aims to minimize latency: it exploits parallelism through deep pipelines, which raise throughput even though the machine is single-threaded, and it keeps data in local (on-chip) memories to reduce latency further. The authors also detail the data types and the design of the matrix-vector multiplier, which is the primary compute unit, along with several other microarchitectural choices intended to maximize efficiency. In their evaluation, the design showed higher utilization and 10-90x lower latency than a general-purpose GPU. The paper also notes that there is still room for performance growth and that the ideal design balance is not yet known.
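To make the role of the matrix-vector multiplier more concrete, the short Python sketch below splits a matrix-vector product into independent row tiles. It is only an illustration of why the operation parallelizes well in hardware and is not the paper's implementation; the function names (`dot`, `tiled_matvec`) and the `tile_rows` parameter are hypothetical.

```python
def dot(row, vec):
    """Dot product of one matrix row with the input vector."""
    return sum(a * b for a, b in zip(row, vec))


def tiled_matvec(matrix, vec, tile_rows=2):
    """Matrix-vector product computed tile by tile.

    Each tile of rows depends only on the shared input vector, so in
    hardware every tile (and every row within it) could be handled by
    its own dot-product unit at the same time. Here the tiles are
    simply processed one after another. tile_rows is an illustrative
    parameter, not a value taken from the paper.
    """
    result = []
    for start in range(0, len(matrix), tile_rows):
        tile = matrix[start:start + tile_rows]
        result.extend(dot(row, vec) for row in tile)
    return result


if __name__ == "__main__":
    weights = [[1, 2], [3, 4], [5, 6]]
    inputs = [1, 1]
    print(tiled_matvec(weights, inputs))
```

Because no tile needs the result of any other tile, the work can be spread across many multipliers at once, which is the kind of parallelism the summary refers to.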