Research on Parallel Acceleration of Chebyshev Approximation based on Neural Network
Wang Xiangchun, Chen Zijian, Wang Bingchao
Deploying deep neural networks (DNNs) on embedded GPUs is difficult because these devices have limited resources and computing power. The large data volume and the high computational complexity of the convolution layers make a parallel acceleration design necessary. In this paper, Chebyshev polynomials are used to approximate the convolution kernel, and this optimization is applied to a deep neural network for image reconstruction so that the convolution operations can be processed in parallel with lower computational complexity. A GPU-based parallel acceleration design is then developed for the convolution layers of the optimized network. Finally, the whole network is ported to the NVIDIA AGX Xavier embedded development board to run inference for image reconstruction. The experimental results show that the parallel accelerated network performs reconstruction inference 2.2 times faster than the original network.
Deep neural network, Image reconstruction, Parallel computing, Embedded GPU, Chebyshev approximation
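The abstract does not spell out how the convolution kernel is approximated, so the following is only a minimal sketch of generic Chebyshev approximation, not the authors' scheme. It fits a Chebyshev series to a one-dimensional function on [-1, 1] and evaluates it with the three-term recurrence. The function names `chebyshev_coefficients` and `chebyshev_eval` and the Gaussian-shaped example profile are illustrative assumptions.

```python
import math


def chebyshev_coefficients(f, degree):
    """Return c_0..c_degree of the Chebyshev interpolant of f on [-1, 1].

    Hypothetical helper for illustration; f is sampled at the
    Chebyshev-Gauss nodes x_j = cos(pi * (j + 0.5) / n), n = degree + 1.
    """
    n = degree + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    values = [f(math.cos(t)) for t in thetas]
    coeffs = []
    for k in range(n):
        # Discrete orthogonality: T_k(x_j) = cos(k * theta_j)
        s = sum(v * math.cos(k * t) for v, t in zip(values, thetas))
        coeffs.append((1.0 if k == 0 else 2.0) * s / n)
    return coeffs


def chebyshev_eval(coeffs, x):
    """Evaluate sum_k c_k * T_k(x) using T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x
    total = coeffs[0]
    if len(coeffs) > 1:
        total += coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
        total += c * t_curr
    return total


if __name__ == "__main__":
    # Assumed example: a Gaussian-shaped 1-D kernel profile.
    def profile(x):
        return math.exp(-x * x)

    c = chebyshev_coefficients(profile, degree=10)
    grid = [i / 50.0 - 1.0 for i in range(101)]
    max_err = max(abs(chebyshev_eval(c, x) - profile(x)) for x in grid)
    print("max abs error on [-1, 1]:", max_err)
```

Each coefficient and each evaluation point is computed independently of the others, which is the kind of structure that maps naturally onto parallel GPU threads. How the paper actually partitions the work is not stated in the abstract.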