Paper Title
AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs
Paper Authors
Paper Abstract
Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a growing demand for DNN chips. However, designing DNN chips is non-trivial because: (1) mainstream DNNs have millions of parameters and operations; (2) the design space is large due to the numerous design choices of dataflows, processing elements, memory hierarchy, etc.; and (3) an algorithm/hardware co-design is needed to allow the same DNN functionality to have a different decomposition, which would require different hardware IPs to meet the application specifications. Therefore, DNN chips take a long time to design and require cross-disciplinary experts. To enable fast and effective DNN chip design, we propose AutoDNNchip - a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN chip implementations given DNNs from machine learning frameworks (e.g., PyTorch) for a designated application and dataset. Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.), optimize the chip design via the Chip Predictor, and then generate optimized synthesizable RTL to achieve the target design metrics. Experimental results show that our Chip Predictor's predicted performance differs from real-measured performance by < 10% when validated using 15 DNN models and 4 platforms (edge-FPGA/TPU/GPU and ASIC). Furthermore, accelerators generated by our AutoDNNchip can achieve better performance (up to 3.86X improvement) than that of expert-crafted state-of-the-art accelerators.
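To make the Chip Predictor idea concrete, the following is a minimal, purely illustrative sketch of an analytical predictor over a graph-based layer representation, in the spirit of the abstract's description. It is not the paper's actual model: all class names, the roofline-style latency formula, and the per-operation energy constants are assumptions introduced here for illustration.

```python
# Hypothetical sketch (not AutoDNNchip's actual code): a toy "Chip Predictor"
# that estimates latency and energy for a DNN described as a graph of layer
# nodes. All names and cost constants below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class LayerNode:
    macs: int          # multiply-accumulate operations in this layer
    dram_bytes: int    # bytes moved to/from off-chip memory

@dataclass
class Platform:
    peak_macs_per_s: float   # compute roof of the platform
    dram_bytes_per_s: float  # off-chip bandwidth roof
    pj_per_mac: float        # assumed energy per MAC (pJ)
    pj_per_byte: float       # assumed energy per DRAM byte (pJ)

def predict(graph, plat):
    """Roofline-style estimate of latency (s) and energy (J) for a layer graph:
    each layer is bound by either compute or memory bandwidth."""
    latency = sum(max(n.macs / plat.peak_macs_per_s,
                      n.dram_bytes / plat.dram_bytes_per_s) for n in graph)
    energy = sum(n.macs * plat.pj_per_mac + n.dram_bytes * plat.pj_per_byte
                 for n in graph) * 1e-12  # pJ -> J
    return latency, energy
```

A design-space explorer like the described Chip Builder could then call such a predictor in a loop over candidate hardware configurations and keep the one that best meets the target metrics.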