Web of Proceedings - Francis Academic Press

Optimization of Gradient Vanishing Problem in Deep Neural Networks Based on Attention Mechanism

DOI: 10.25236/icceme.2025.002

Author(s)

Wu Nannan

Corresponding Author

Wu Nannan

Abstract

With the wide application of Deep Neural Networks (DNNs) in many fields, the vanishing gradient problem has become a key obstacle to further performance improvement. This article focuses on optimizing DNNs with an attention mechanism to address vanishing gradients. The study analyzes the theoretical foundations of DNNs, the vanishing gradient problem, and attention mechanisms, and constructs an optimization algorithm model based on the attention mechanism. The model integrates an attention module into the hidden layers, and the architecture design and algorithm flow are described in detail. Experimental results on a self-built image dataset show that, compared with a traditional DNN model, the attention-based optimization model effectively alleviates vanishing gradients and significantly improves performance metrics such as gradient stability, accuracy, recall, and F1-score, while also classifying faster. This indicates that introducing an attention mechanism into a DNN can effectively optimize gradient propagation, providing an effective way to address the vanishing gradient problem.
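The abstract does not specify the model in code, but the general idea it describes can be sketched as an attention gate applied inside a hidden layer, combined with a residual (identity) path so that gradients can bypass saturating activations. The following minimal PyTorch sketch illustrates that pattern; all names here (AttentionHiddenLayer, attn, and so on) are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of an attention-gated hidden layer with a residual path.
# The identity connection keeps a short gradient route around the
# saturating activation, which is the mechanism the abstract credits
# for alleviating vanishing gradients.
import torch
import torch.nn as nn

class AttentionHiddenLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.act = nn.Sigmoid()          # saturating activation prone to vanishing gradients
        self.attn = nn.Linear(dim, dim)  # produces per-feature attention scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.linear(x))
        a = torch.softmax(self.attn(x), dim=-1)  # attention weights over hidden features
        # Residual combination: the identity term contributes a gradient of 1,
        # so the backward signal does not collapse even if self.act saturates.
        return x + a * h

# Smoke test: gradients still reach the input through 20 stacked layers.
if __name__ == "__main__":
    net = nn.Sequential(*[AttentionHiddenLayer(16) for _ in range(20)])
    x = torch.randn(4, 16, requires_grad=True)
    net(x).sum().backward()
    print(x.grad.abs().mean())  # non-negligible gradient at the input

The smoke test stacks twenty such layers and checks that a non-negligible gradient reaches the input; a plain stack of sigmoid layers of the same depth would typically drive the input gradient toward zero.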

Keywords

Deep Neural Network; Vanishing Gradient; Attention Mechanism; Optimization Algorithm