Recurrent Neural Networks I


Recurrent neural networks have become very popular recently, and they play roles as important as convolutional neural networks. RNNs can use their internal memory to process arbitrary sequences of inputs, so beyond images, they also work well on speech recognition and natural language processing tasks.

There are several types of RNNs. To begin with, we focus our attention only on Elman-type RNNs in this post (similar to Jordan-type networks; both are among the simplest kinds of RNNs), and I'll introduce and implement more advanced types of RNNs in future parts of this series.




1. Structure

By using the BPTT (Back-Propagation Through Time) method, we can unfold the network into a structure quite similar to a regular feed-forward network. Say we have a recurrent neural network whose time delay is 3 and which has n hidden layers; after applying the BPTT method, we get the following structure.


Each hidden layer has its own weight W (the weight between two time slots) and weight U (the weight between hidden layers), shared across all time slots. With this unfolded version of the network, it is easy to train all these parameters using the back-propagation method.

2. Forward pass

For the first hidden layer:

    s_t^(1) = f(a_t^(1)),   where a_t^(1) = U^(1) x_t + W^(1) s_{t-1}^(1)

in which f represents a non-linear function, such as the sigmoid function, tanh, or the ReLU function. And for the other hidden layers:

    s_t^(i) = f(a_t^(i)),   where a_t^(i) = U^(i) s_t^(i-1) + W^(i) s_{t-1}^(i)

For the output layer:

    o_t = g(U^(n+1) s_t^(n))

in which g represents the output non-linearity, such as softmax.

For the first time slot, we can assume the previous S values are all zeros.
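The forward pass above can be sketched in plain C++. This is a minimal single-hidden-layer sketch (the actual implementation uses OpenCV matrices; here only std::vector is used, with tanh as f, and the helper names are my own):

```cpp
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// y = M * x
Vec matvec(const Mat& M, const Vec& x) {
    Vec y(M.size(), 0.0);
    for (size_t i = 0; i < M.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += M[i][j] * x[j];
    return y;
}

// One forward step of a single-hidden-layer Elman network:
// s_t = f(U * x_t + W * s_{t-1}), with f = tanh.
Vec forward_step(const Mat& U, const Mat& W, const Vec& x_t, const Vec& s_prev) {
    Vec a = matvec(U, x_t);
    Vec r = matvec(W, s_prev);
    Vec s(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        s[i] = std::tanh(a[i] + r[i]);
    return s;
}

// Run the network over a whole input sequence, starting from all-zero
// previous S values (as assumed for the first time slot).
std::vector<Vec> forward_sequence(const Mat& U, const Mat& W,
                                  const std::vector<Vec>& xs) {
    Vec s(W.size(), 0.0);
    std::vector<Vec> states;
    for (const Vec& x : xs) {
        s = forward_step(U, W, x, s);
        states.push_back(s);
    }
    return states;
}
```

Stacking more hidden layers just means feeding each layer's s_t as the x_t of the layer above it.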

3. Backward pass

For the backward pass, let delta_t^(i) = dE/da_t^(i) denote the error at hidden layer i and time slot t, in which a represents the output of each hidden layer before applying the non-linear function in the forward pass (so s_t^(i) = f(a_t^(i))). Errors propagate both down the stack and back through time:

    delta_t^(i) = f'(a_t^(i)) ⊙ ( U^(i+1)^T delta_t^(i+1) + W^(i)^T delta_{t+1}^(i) )

where ⊙ denotes element-wise multiplication; for the top hidden layer, the first term is the error coming from the output layer (with a softmax output and cross-entropy loss it is simply U^(n+1)^T (o_t − y_t)). The gradients are then accumulated over all time slots:

    dE/dU^(i) = Σ_t delta_t^(i) (s_t^(i-1))^T,   dE/dW^(i) = Σ_t delta_t^(i) (s_{t-1}^(i))^T
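As a sanity check on the backward pass, here is a BPTT sketch for a toy scalar Elman unit with a squared loss directly on the hidden state (an illustrative simplification of the full multi-layer network, with hypothetical helper names). It carries dE/da backwards through time, and the analytic gradients can be verified against numerical differences:

```cpp
#include <cmath>
#include <vector>

// Toy scalar Elman unit: s_t = tanh(u*x_t + w*s_{t-1}), s_0 = 0,
// with loss E = 0.5 * sum_t (s_t - y_t)^2.

struct Grads { double du, dw; };

std::vector<double> run(double u, double w, const std::vector<double>& xs) {
    std::vector<double> s;
    double prev = 0.0;
    for (double x : xs) {
        prev = std::tanh(u * x + w * prev);
        s.push_back(prev);
    }
    return s;
}

double loss(double u, double w, const std::vector<double>& xs,
            const std::vector<double>& ys) {
    std::vector<double> s = run(u, w, xs);
    double E = 0.0;
    for (size_t t = 0; t < s.size(); ++t)
        E += 0.5 * (s[t] - ys[t]) * (s[t] - ys[t]);
    return E;
}

// BPTT: walk backwards through time, carrying g_next = dE/da_{t+1}.
Grads bptt(double u, double w, const std::vector<double>& xs,
           const std::vector<double>& ys) {
    std::vector<double> s = run(u, w, xs);
    Grads g{0.0, 0.0};
    double g_next = 0.0;
    for (int t = (int)s.size() - 1; t >= 0; --t) {
        double ds = (s[t] - ys[t]) + w * g_next;   // dE/ds_t: local + future error
        double ga = (1.0 - s[t] * s[t]) * ds;      // dE/da_t, tanh'(a) = 1 - s^2
        double s_prev = (t == 0) ? 0.0 : s[t - 1];
        g.du += ga * xs[t];                        // accumulate over time slots
        g.dw += ga * s_prev;
        g_next = ga;
    }
    return g;
}
```

Comparing bptt() against (E(w+eps) − E(w−eps)) / (2*eps) is a cheap way to catch sign or indexing mistakes in a BPTT implementation.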

Code and Test

I implemented it using C++ and OpenCV.

I tested my RNN on the same toy dataset as the one I used HERE.

Before feeding data into the network, I used the simplest encoding method: I just changed each word into a 1-of-N code (which has 1237 dimensions for the given toy dataset). Using the config on GitHub, it reached a training accuracy of 0.986743 and a test accuracy of 0.951342 within 43 minutes on my 2015 MacBook Air. I'm sure it will work better with more advanced encoding methods.
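The 1-of-N encoding step can be sketched like this (hypothetical helper names; the real implementation builds the 1237-dimensional code from the toy dataset's vocabulary):

```cpp
#include <map>
#include <string>
#include <vector>

// Assign each distinct word an id in first-seen order.
std::map<std::string, int> build_vocab(const std::vector<std::string>& words) {
    std::map<std::string, int> vocab;
    for (const std::string& w : words) {
        if (vocab.find(w) == vocab.end()) {
            int id = (int)vocab.size();  // next free id
            vocab[w] = id;
        }
    }
    return vocab;
}

// 1-of-N code: an all-zero vector of vocabulary size with a single 1
// at the word's id (all zeros for out-of-vocabulary words).
std::vector<double> one_hot(const std::string& word,
                            const std::map<std::string, int>& vocab) {
    std::vector<double> v(vocab.size(), 0.0);
    auto it = vocab.find(word);
    if (it != vocab.end()) v[it->second] = 1.0;
    return v;
}
```

Each input vector is as wide as the vocabulary (1237 here), which is exactly why denser encodings such as learned word embeddings tend to work better.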

This will be a series of posts about RNNs; I'll try to introduce bi-directionality and Long Short-Term Memory into it 🙂


This entry was posted in Machine Learning, NLP.


  1. sebastian schwank
    Posted May 31, 2015 at 5:21 pm | Permalink

    Is it possible to revert the processes in an RNN so that you get the inverted trained process?
    So instead of analyzing words, the computer could write literature?

    • sebastian schwank
      Posted May 31, 2015 at 5:22 pm | Permalink

      … or paint pictures ?

      • Posted August 25, 2015 at 5:46 am | Permalink

        You can look at deep belief networks for that. They can produce the points which trained them. E.g., after training on recognising handwritten digits, you can also build a deep belief network which writes a handwritten digit.

  2. sebastian schwank
    Posted May 31, 2015 at 5:35 pm | Permalink

    Imagine an RNN for spell-checking purposes, an RNN for sense-checking purposes, and an RNN with several “random” parameters like popularity.

    Could we combine the RNNs so that there’s a minimal error with respect to the given parameters for generating specific sentences?

  3. Ahmed Ramzy
    Posted October 16, 2015 at 7:33 pm | Permalink

    How far would an RNN work for word detection in image processing?
    E.g., given 100 names, each with 10 samples, it should then recognize which word is shown in a given picture of a word.
