Paper Title
Tuning of Mixture-of-Experts Mixed-Precision Neural Networks
Paper Authors
Paper Abstract
Deep learning has become a useful data analysis method; however, mainstream adoption in distributed computer software and embedded devices has so far been low. Often, adding deep learning inference to mainstream applications and devices requires new hardware with signal processors suited for convolutional neural networks. This work adds new data types (quantized 16-bit and 8-bit integer, 16-bit floating point) to Caffe in order to save memory and increase inference speed on existing commodity graphics processors with OpenCL, which are common in everyday devices. Existing models can be executed effortlessly in mixed-precision mode. Additionally, we propose a variation of mixture-of-experts to increase inference speed on AlexNet for image classification. We decreased memory usage by up to 3.29x while increasing inference speed by up to 3.01x on certain devices. We demonstrate with five simple examples how the presented techniques can easily be applied to different machine learning problems. The whole pipeline, consisting of models, example Python scripts, and the modified Caffe library, is available as open-source software.
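
To make the quantized data types concrete, the following is a minimal NumPy sketch of per-tensor symmetric quantization to 8-bit integers. It illustrates only the general technique; it is not the paper's Caffe/OpenCL implementation, and all function names and the scaling scheme are assumptions made for illustration.

import numpy as np

def quantize_int8(x):
    # Per-tensor symmetric quantization: one float scale plus an int8 payload.
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Approximate reconstruction of the original float32 values.
    return q.astype(np.float32) * scale

weights = np.random.randn(64, 3, 11, 11).astype(np.float32)  # AlexNet-like conv filters
q, s = quantize_int8(weights)
print(q.nbytes / weights.nbytes)                       # 0.25: 4x less memory for the payload
print(np.abs(dequantize_int8(q, s) - weights).max())   # small quantization error

Storing weights and activations this way is where the memory savings come from; the speedup additionally depends on integer arithmetic support in the OpenCL kernels.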
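The abstract does not specify the details of the proposed mixture-of-experts variation, so the sketch below shows only the generic idea behind mixture-of-experts inference: a cheap gating network picks one expert per input (hard top-1 gating), so only a fraction of the model's weights participate in each forward pass. All shapes, names, and the gating scheme are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_experts, d_in, d_out = 4, 256, 10
gate_w = rng.standard_normal((d_in, n_experts)).astype(np.float32)
experts = [rng.standard_normal((d_in, d_out)).astype(np.float32)
           for _ in range(n_experts)]

def moe_forward(x):
    # Hard top-1 gating: evaluate only the selected expert per input row.
    scores = x @ gate_w                  # (batch, n_experts) gating scores
    chosen = scores.argmax(axis=1)       # index of the winning expert per input
    out = np.empty((x.shape[0], d_out), dtype=np.float32)
    for i, e in enumerate(chosen):
        out[i] = x[i] @ experts[e]       # only one expert's weights are used
    return out

x = rng.standard_normal((8, d_in)).astype(np.float32)
print(moe_forward(x).shape)  # (8, 10)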