The main difference between an RNN and an LSTM is how long each can retain information in memory. LSTMs have the advantage here: they can hold information over much longer time spans than a standard RNN. The question is what an LSTM does differently that lets it capture long-term temporal dependencies (that is, remember information for long periods of time).
In general, an LSTM uses a set of gates to control the information in its memory: when information enters the memory, how much is kept and for how long, when it starts contributing to the output, and when it begins to decay or be forgotten.
Recurrent Neural Networks (RNNs)
👉 RNNs have feedback loops in the recurrent layer. This lets them maintain information in ‘memory’ over time. However, it can be difficult to train standard RNNs on problems that require learning long-term temporal dependencies.
👉 This is because the gradient of the loss function decays exponentially as it is propagated back through the time steps (the vanishing gradient problem), so early inputs have almost no influence on the weight updates.
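The decay described above can be seen in a small sketch. It assumes a hypothetical one-unit tanh RNN, h_t = tanh(w * h_(t-1) + u * x_t), with made-up values for w, u and x; the function name and parameters are illustrative and do not come from any library. It tracks how strongly the hidden state at step t still depends on the initial state.

```python
import math

def rnn_gradient_norms(w=0.5, u=1.0, x=0.1, steps=50, h0=0.0):
    """Return |d h_t / d h_0| for t = 1..steps in a scalar tanh RNN.

    All weights and inputs are hypothetical values chosen for illustration.
    """
    h = h0
    grad = 1.0  # d h_0 / d h_0
    norms = []
    for _ in range(steps):
        h = math.tanh(w * h + u * x)
        # Chain rule: d h_t / d h_(t-1) = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
        norms.append(abs(grad))
    return norms

norms = rnn_gradient_norms()
print("after  1 step :", norms[0])
print("after 10 steps:", norms[9])
print("after 50 steps:", norms[49])
```

Each step multiplies the gradient by a factor smaller than 1 in magnitude (here at most 0.5), so the product shrinks exponentially with the number of steps. With a recurrent weight larger than 1 in magnitude, the same product can instead grow, which is the exploding gradient case.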
Long Short-Term Memory (LSTM)
👉 LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a ‘memory cell’ that can maintain information in memory for long periods of time. This memory cell lets them learn longer-term dependencies.
👉 LSTMs mitigate the vanishing gradient problem by introducing gates, such as the input and forget gates, which allow better control over the gradient flow and better preservation of “long-range dependencies”. Exploding gradients can still occur and are usually handled separately, for example with gradient clipping.
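𝐍𝐨𝐭𝐞: The repeating module of a standard RNN contains a single layer (typically tanh), while the repeating module of an LSTM contains four interacting layers (the forget, input and output gates plus a candidate layer). This extra structure is what lets the LSTM handle long-range dependencies.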
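To make the gating idea concrete, here is a minimal sketch of one LSTM step for a single unit, using the common formulation with forget, input and output gates. The function names, the parameter dictionary and every weight value are hypothetical and chosen only for illustration; real implementations work on vectors and learn these weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p holds hypothetical scalar weights."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # memory cell: keep part of the old, add some new
    h = o * math.tanh(c)     # hidden state exposed to the rest of the network
    return h, c, f

# Assumed weights: a large forget bias keeps the gate close to 1 (remember),
# a negative input bias keeps the input gate nearly closed (ignore new input).
params = {
    "wf": 0.1, "uf": 0.1, "bf": 5.0,
    "wi": 0.1, "ui": 0.1, "bi": -5.0,
    "wo": 0.1, "uo": 0.1, "bo": 0.0,
    "wg": 0.1, "ug": 0.1, "bg": 0.0,
}

h, c = 0.0, 1.0    # start with the value 1.0 stored in the memory cell
path_grad = 1.0    # product of forget gates along the direct cell-state path
for _ in range(50):
    h, c, f = lstm_step(0.0, h, c, params)
    path_grad *= f

print("cell state after 50 steps:", c)
print("product of forget gates  :", path_grad)
```

Along the direct cell-state path, the derivative of c_t with respect to c_(t-1) is just the forget gate f_t. When the network keeps f_t close to 1, both the stored value and the gradient along that path survive many steps instead of shrinking exponentially. This sketch ignores the indirect paths through h, so it illustrates the mechanism rather than computing a full gradient.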
For an in-depth understanding, you can read the paper “𝐄𝐦𝐩𝐢𝐫𝐢𝐜𝐚𝐥 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐆𝐚𝐭𝐞𝐝 𝐑𝐞𝐜𝐮𝐫𝐫𝐞𝐧𝐭 𝐍𝐞𝐮𝐫𝐚𝐥 𝐍𝐞𝐭𝐰𝐨𝐫𝐤𝐬 𝐨𝐧 𝐒𝐞𝐪𝐮𝐞𝐧𝐜𝐞 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠”.