In my previous post, I introduced the basic ideas of Recurrent Neural Networks. In this second post of the RNN series, we'll focus on the Long Short-Term Memory method.

### LONG SHORT-TERM MEMORY

One of the best-known problems of RNNs is the vanishing gradient: the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections.

The shading of the nodes in the unfolded network indicates their sensitivity to the inputs at time one (the darker the shade, the greater the sensitivity). The sensitivity decays over time as new inputs overwrite the activations of the hidden layer, and the network ‘forgets’ the first inputs.

Long Short-Term Memory deals with this kind of problem: it is basically a recurrent network made of memory blocks. Each block contains one or more self-connected memory cells and three multiplicative units (the input, output and forget gates) that provide continuous analogues of write, read and reset operations for the cells.

The above figure shows what's inside an LSTM block: black arrows represent full matrix multiplications, dashed arrows represent weighted peephole connections (using diagonal matrices), and **f, g, h** are non-linearity functions. The multiplicative gates allow LSTM memory cells to store and access information over long periods of time, thereby mitigating the vanishing gradient problem. For example, as long as the input gate remains closed (has an activation near 0), the activation of the cell will not be overwritten by new inputs arriving in the network, and can therefore be made available to the net much later in the sequence by opening the output gate.

### METHODOLOGY

The structure of my LSTM network is similar to the structure we used in **THE SIMPLE RNN**, except that we're using Jordan-type RNNs this time, which means we use not only the last output for the gradient calculation, but all the former outputs as well.
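One way to write this down (a sketch, assuming a per-step loss $\ell$ such as squared error or cross-entropy, with a target $y^t$ at every step):

$$
L = \sum_{t=1}^{T} \ell(h^t, y^t)
$$

so every output $h^t$ contributes a gradient term, not just the output at the final time step.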

#### 1. Notation

- **W** represents weights between two time slots (horizontal),
- **U** represents weights between hidden layers (vertical),
- **V** represents peephole weights (diagonal),
- **i, f, o, c** represent input, forget, output and cell,
- **a^t** represents the input of a gate at time **t**,
- **i^t, f^t, o^t** represent activations at time **t**,
- **delta** represents derivative,
- **h^t** represents output,
- **epsilon_h** represents output derivative,
- **epsilon_s** represents state derivative of the cell; more details can be found in the following equations.

#### 2. Forward Pass
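The forward-pass equations were originally shown as an image; below is a reconstruction in the notation above, following Graves' peephole-LSTM formulation (note that $h^t$ denotes the cell output, while $h(\cdot)$ is the output non-linearity):

$$
\begin{aligned}
a_i^t &= U_i\,prev^t + W_i\,h^{t-1} + V_i * s^{t-1}, & i^t &= \sigma(a_i^t) \\
a_f^t &= U_f\,prev^t + W_f\,h^{t-1} + V_f * s^{t-1}, & f^t &= \sigma(a_f^t) \\
a_c^t &= U_c\,prev^t + W_c\,h^{t-1}, & s^t &= f^t * s^{t-1} + i^t * g(a_c^t) \\
a_o^t &= U_o\,prev^t + W_o\,h^{t-1} + V_o * s^t, & o^t &= \sigma(a_o^t) \\
h^t &= o^t * h(s^t) &&
\end{aligned}
$$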

in which *prev* means the previous layer output at time *t*, the asterisk represents element-wise multiplication, and *sigma, g, h* represent non-linearity functions.

#### 3. Backward Pass
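The backward-pass equations were likewise shown as an image in the original; a reconstruction following Graves' derivation, with the symbols as defined in the notation section (each *delta* is the derivative with respect to the corresponding gate input *a*):

$$
\begin{aligned}
\delta_o^t &= \sigma'(a_o^t) * \big(h(s^t) * \epsilon_h^t\big) \\
\epsilon_s^t &= o^t * h'(s^t) * \epsilon_h^t + f^{t+1} * \epsilon_s^{t+1}
             + V_i * \delta_i^{t+1} + V_f * \delta_f^{t+1} + V_o * \delta_o^t \\
\delta_c^t &= i^t * g'(a_c^t) * \epsilon_s^t \\
\delta_f^t &= \sigma'(a_f^t) * \big(s^{t-1} * \epsilon_s^t\big) \\
\delta_i^t &= \sigma'(a_i^t) * \big(g(a_c^t) * \epsilon_s^t\big)
\end{aligned}
$$

Since we use all the former outputs (Jordan-type), $\epsilon_h^t$ collects the loss gradient at step *t* itself plus the contributions propagated back from step *t+1*.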

### SOURCE CODE

**https://github.com/xingdi-eric-yuan/recurrent-net-lstm**

It is a C++ implementation of LSTM RNNs, using OpenCV as the linear algebra library.

It works, though it is still buggy: it gets **98.5844%** training accuracy and **95.3615%** test accuracy after 25 epochs of training on the toy dataset. I'll try to fix the bugs ASAP and upgrade this net to a bi-directional version. If you find any bugs, please let me know :p

### REFERENCES

Alex Graves. *Supervised Sequence Labelling with Recurrent Neural Networks*.

## 4 Comments


Dear friends!

Thanks for sharing. I have tried to run your program, but there may be some bugs in it: when I change the value of MOMENTUM, or the number of hidden layers from 2 to 3, or the hidden units from 512 to 128, the program ends with a cost that is too big. I also see some notes in your code, so it seems you have foreseen the bugs. What is the problem? Thanks very much!

This is, mostly, copy and paste from Alex Graves' PhD thesis. See here: http://www.cs.toronto.edu/~graves/phd.pdf, pages 33, 34, 38 and 39.

No reference, and you don't even mention him.

Hi Mr T,

I didn't realize how serious this issue was when I was writing these posts, but yes, as you said, I'll add references to all the posts. Thanks for the comment.

Eric
