Convolutional Neural Networks II

Since the last CNN post, I have been working on a new version of the CNN that supports multiple Conv and Pooling layers, and I'd like to share some experience here.


As you can see in the last post, I used a vector of Mat during the convolution steps. That works well when we only have one convolution layer: for each input image, we get 1 * KernelAmount images after the Conv and Pooling layers (the Pooling operation doesn't change the number of images). To retrieve these "conved images" easily, I generated one vector of Mat for each input image.

However, when we have more than one pair of Conv and Pooling layers, using vectors becomes a disaster. Say we have 3 Conv layers whose kernel amounts are 4, 6, and 8; after processing, we get 1 * 4 * 6 * 8 = 192 "conved images" per input. That may seem fine, but when I tried to do backprop through it, it felt like hell.

What about using a vector of vectors? Good point, if you like this:

vector<vector<vector<vector<vector<vector<Mat> > > > > >

What I did in this version was use a hash map (unordered_map in C++), even though I never thought it was the best idea. I used a string as the key and the corresponding Mat as the value. These examples show how I define the keys:

X234C0K2PC1K4 means this matrix is the 234th input image convolved by kernel 2 of the 0th Conv layer, and, after Pooling, convolved by kernel 4 of the 1st Conv layer.

X22C0K0PD: the "X22C0K0P" part means this matrix is the 22nd input image convolved by kernel 0 of the 0th Conv layer and then pooled; the trailing 'D' means it was generated during the backprop process and is the corresponding delta matrix.

X22C0K0PUD is the result of the above matrix after unpooling.


Pros:

  • Easy to debug: you can simply fetch any matrix at any point in the whole process.
  • Easy to see what's going on, especially for someone who doesn't yet fully understand the architecture of a ConvNet.
  • Fast data access (O(1) time on average) and all the other advantages of hash tables.


Cons:

  • String manipulation is tedious; it feels like doing LeetCode problems.
  • Higher memory usage, and all the other disadvantages of hash tables.

As I said, I never thought this was the best data structure for a ConvNet, but I found it a good one for newbies. If you have a better idea for this part, please let me know.



I implemented a version 3.0 recently; check it here.

enjoy it 🙂

This entry was posted in Algorithm, Machine Learning, OpenCV.


  1. Min
    Posted April 14, 2014 at 1:21 am | Permalink

    good job!

  2. Alex
    Posted April 25, 2014 at 4:32 am | Permalink

    nice work!!

  3. Alex
    Posted May 6, 2014 at 12:03 am | Permalink

Hi, if the input is a color image, how do we design the filters and the whole net?

  4. Phuc
    Posted May 30, 2014 at 12:09 am | Permalink

I have an error on
    line 1147: start = clock();
    line 1188: end = clock();
    line 1189: cout << "Totally used time: " << ((double)(end - start)) / CLOCKS_PER_SEC << " second" << endl;
    How do I fix this? Can you help me?

    • Eric
      Posted June 8, 2014 at 3:18 pm | Permalink

      Hey Phuc,
      You can delete the time counter, or, try to include “time.h”. 🙂

  5. Yao
    Posted June 12, 2014 at 11:01 am | Permalink

    Hi, I’m wondering how long did it take for training on MNIST ?

    • Eric
      Posted June 17, 2014 at 9:51 pm | Permalink

      Hey Yao,

It depends on how many layers you choose. I tried 3 Conv layers with pooling plus 2 fully connected layers, and it took hours…

    • Eric
      Posted June 17, 2014 at 9:52 pm | Permalink

Besides, it also depends on how many units per layer, you know…

  6. teddy
    Posted June 27, 2014 at 4:42 am | Permalink

Don't you have any plan to convert this code to Android?
    I think it would be very useful on mobile too. Could you please convert it to Android?

    • Eric
      Posted June 27, 2014 at 2:04 pm | Permalink

Sorry, actually I don't know much about converting code to Android. However: first, I'm not sure whether an Android device is powerful enough to run deep neural networks; second, there is an OpenCV for Android SDK you can check out if you want.

  7. teddy
    Posted June 30, 2014 at 9:46 pm | Permalink

    Thanks for your reply.
I've converted your code to Android anyway. It works fine, even though it runs slowly on the device.
    Now I want to save the trained weights. Is it the same as in version 1, using saveWeight() in this code?
    And how do I load the trained data and classify test images using it?

    • Eric
      Posted June 30, 2014 at 10:13 pm | Permalink

      Hey Teddy,

You already converted it to an Android version? Well done!
      Yes, you can use saveWeight() to save a cv::Mat matrix into a .txt file, though I'm not sure whether you should do it that way on Android.
      Use the proper Android-style saving method: save all the trained matrices (including those of the Convolution layers and the fully connected layers) and all the parameters, such as lambda, the number of units in each hidden layer, and so on. When testing, just load these back in and do what the resultProdict() function does.

      • teddy
        Posted July 1, 2014 at 12:26 am | Permalink

Thanks for the reply!!
        I'll try what you suggested!! Good luck!

  8. teddy
    Posted July 1, 2014 at 4:20 am | Permalink

I have a question about your code.
    In resultProdict() there are for loops like this:

    for(int i = 0; i < 10; i++){
        double maxele = tmp.ATD(0, i);
        int which = 0;
        for(int j = 1; j < tmp.rows; j++){
            if(tmp.ATD(j, i) > maxele){
                maxele = tmp.ATD(j, i);
                which = j;
            }
        }
        result.ATD(0, i) = which;
    }

    Why does j start from 1, not 0?
    tmp is the result over the 10000 test images, right?
    I think it needs to check the first element too; right now your code compares only 9 elements per column.
    Please confirm this issue…


    • Eric
      Posted July 1, 2014 at 8:20 am | Permalink

      Hey Teddy,
Here's what I'm doing: for every ith COLUMN, I want to find the ROW with the max value. So within each ith column (the outer loop), I set the initial max value to the 0th row's value (maxele), and the position of that initial max to 0 (which); then I only need to check the remaining rows of the ith column. If nothing is bigger than the initial max value, the result is the initial position.
      In brief, since I set the 0th value as the initial value, it is unnecessary to compare the initial value with itself.
      Hope this helps.

  9. teddy
    Posted July 2, 2014 at 7:33 am | Permalink

    Thanks for your reply.
Well.. I checked resultProdict() on Android and it works well.
    Now I'd like to use multi-channel (3, RGB) input images, but your code is based on 1 channel.
    How can I adapt your code for this? Any ideas?

    • Eric
      Posted July 2, 2014 at 12:25 pm | Permalink

      You can just combine the intensities from all the color channels for the pixels into one long vector, as if you were working with a grayscale image with 3x the number of pixels as the original image.

      • teddy
        Posted July 2, 2014 at 9:15 pm | Permalink

Oh I see. For example, if I use 32×32 RGB color images, I need to make the Mat size 32*(32+32+32) for each single image, right? Currently the MNIST images are 28×28, so the Mat size is 28×28 when loading images in read_Mnist().

        • Eric
          Posted July 3, 2014 at 2:06 am | Permalink

Yes, exactly. However, I don't think this is the best way, although it's the only way I know: there are connections among the three channels, they are closely related, so we are likely losing some information by simply doing that. Maybe a better way is to use 32 * 32 * 3 and give the whole network one more dimension? But it's hard to figure out and hard to implement, especially using OpenCV. 🙂

  10. teddy
    Posted July 4, 2014 at 3:34 am | Permalink

    Well, I’m trying to change the dataset, so I load some data like below..

    Read trainX successfully, including 784 features and 39426 samples.
    Read trainY successfully, including 39426 samples

    then, when training….
    I got this message…

    Network Learning…………….
    *** glibc detected *** ./cnn: corrupted double-linked list: 0x00000000024c8240 ***
    Segmentation fault (core dumped)

When I track down the point of the error, it occurs randomly: sometimes in pooling(), sometimes in convandpooling()…
    cnn in the log is the binary name. My training images are 28*28 in width/height, the same as MNIST; the difference is the number of training images. The category count is the same as MNIST too.
    Could you give me a tip on where to look in the code?

    • Eric
      Posted July 4, 2014 at 10:16 am | Permalink

Is your trainX binary, I mean in the same format as MNIST?

      • teddy
        Posted July 6, 2014 at 9:03 pm | Permalink

Yes, it's my own training data; it's not the same as MNIST.

      • teddy
        Posted July 6, 2014 at 9:28 pm | Permalink

I think it's not a problem with the number of training images. I made trainX and trainY with 60K images/labels, but got the same error. I'll check the training data once again.

      • teddy
        Posted July 7, 2014 at 2:27 am | Permalink

Sorry, forgot this reply. I missed some operations before training; after adding them, it works fine.

  11. teddy
    Posted July 7, 2014 at 3:04 am | Permalink

When I want 3 convolving layers and 3 pooling layers, what code should be added?
    I push_back KernelSize, KernelAmount, PoolingDim 3 times…

    And I changed the global variables like this:

    int NumHiddenLayers = 3;
    int NumConvLayers = 3;

    However, I got an error somewhere in conv2()…
    Do I have to add some code to use 3 convolving and 3 pooling layers??

    • teddy
      Posted July 7, 2014 at 4:55 am | Permalink

I found the reason for the error: it's the input dimension size of each layer.
      Sorry to bother you 🙂

  12. teddy
    Posted July 9, 2014 at 4:36 am | Permalink

How many trainable parameters are there in this code?
    Can I count the number of parameters like this?

    C1 = 5x5x4+4+4 = 108
    P1 = 0
    C2 = 7x7x8+8+8 = 408
    P2 = 0
    FC1 = 200
    FC2 = 200
    SOFTMAX = 10
    ——————————– total : 926

    Is it right?

    • Eric
      Posted July 9, 2014 at 1:04 pm | Permalink

      Hi Teddy,

For the two fully connected layers, the trainable parameters are actually the weight matrices, so for each layer the number should be (last_layer_outputs * this_layer_neuron_amount).

      • teddy
        Posted July 9, 2014 at 8:36 pm | Permalink

Oh, I see. You mean that FC1's parameters are 8×200 and FC2's are 200×200, right?

  13. teddy
    Posted July 9, 2014 at 8:41 pm | Permalink

Oh, that's wrong. I think this is right: P2-FC1 is 1×200, FC1-FC2 is 200×200, FC2-softmax is 10. Correct?

  14. DengYu
    Posted July 14, 2014 at 1:04 am | Permalink

    Hi, Eric
I want to use my own data to train and test the code. Are there any requirements on the image size or the number of training samples? Thank you very much.

    • Eric
      Posted July 14, 2014 at 5:18 pm | Permalink

There's no constraint on size or length, but there are some things you need to pay attention to:
      1. This code only supports single-channel images so far.
      2. You need to modify the parameters of the Conv and Pooling layers properly, such as the kernel size and poolingDim.
      3. Inside readData(), I divide the whole dataset by 255.0 because I want the range of the data to be (0, 1); if your training data is already in the range (0, 1), disable that part of my code.

      Have fun.

  15.
    Posted July 15, 2014 at 4:29 am | Permalink

    I hope to use your code to train on CIFAR10, but I've learned that this code only supports single-channel images. If I convert CIFAR10 into single-channel images for training, I think the precision would be very low. Is there any way to train on color images?

    • Eric
      Posted July 16, 2014 at 2:45 am | Permalink

      You can just combine the intensities from all the color channels for the pixels into one long vector, as if you were working with a grayscale image with 3x the number of pixels as the original image. Another idea, use 3d matrix instead of 2d.

  16. Arghavan
    Posted July 22, 2014 at 9:34 pm | Permalink

    Hi Eric
I really appreciate your work on CNN. I used your code in Qt with the MNIST dataset; it works well, but all of a sudden it gives me an error: Out of Memory. I don't know how to fix it. Is there any solution?
    Thanks a lot.

    • Eric
      Posted July 23, 2014 at 3:06 pm | Permalink

      Hi Arghavan,

How much memory does your computer have? Large networks do need a lot of memory; moreover, I used a hash table in this code, so it needs even more.

      • Arghavan
        Posted July 23, 2014 at 9:07 pm | Permalink

I have 4 GB of RAM and use the default CNN you built. May I ask about the computer you run the code on?

  17. Yu Deng
    Posted July 24, 2014 at 3:08 am | Permalink

    Hi, Eric
Can I build the code with Visual Studio 2010? Thanks very much for your help.

    • Eric
      Posted July 24, 2014 at 8:34 am | Permalink

      Hi Yu,
      Yes I think so, as long as you have correctly installed OpenCV on it.

  18. zhenghx
    Posted July 26, 2014 at 3:24 am | Permalink

I have been reading your code for several days and just finished. I found that this code is different from your first version "", for example in the softmax regression and the learning rate. Your first version's regression follows UFLDL's "", but this one doesn't.
    This code works wonderfully because convergence is very fast, so I want to read the papers or materials you referred to. Can you give me a list?

    • zhenghx
      Posted July 26, 2014 at 4:27 am | Permalink

v_hl_W[i] = v_hl_W[i] * Momentum + lrate * HiddenLayers[i].Wgrad;
      v_hl_b[i] = v_hl_b[i] * Momentum + lrate * HiddenLayers[i].bgrad;
      HiddenLayers[i].W -= v_hl_W[i];
      HiddenLayers[i].b -= v_hl_b[i];

      Following UFLDL, I changed the code to

      HiddenLayers[i].W -= lrate * HiddenLayers[i].Wgrad;
      HiddenLayers[i].b -= lrate * HiddenLayers[i].bgrad;

      and then the convergence became very slow.

  19. Ashok
    Posted August 22, 2014 at 6:51 am | Permalink

Linux make error: ConvNet.cpp:141:40: error: no matching function for call to 'std::basic_ifstream::basic_ifstream(std::string&, const openmode&)' for ifstream file(filename, ios::binary);

    I just replaced
    ifstream file(filename, ios::binary);
    with
    ifstream file(filename.c_str(), ios::binary);

    and it works fine!!!!

    • Eric
      Posted August 27, 2014 at 10:50 pm | Permalink

      Thanks Ashok 🙂

  20. Lancelod Liu
    Posted September 23, 2014 at 2:47 am | Permalink

Thanks a lot; I was looking for the C++ source code converted from the Python version. There's a problem: when I run your code in VS 2012 (OpenCV 2.4.9), it gives me an assertion whose expression is "vector subscript out of range". And there's no output in the command window, which means that cout<<"Read trainX successfully, including "<<trainX[0].cols * trainX[0].rows<<" features and "<<trainX.size()<<" samples."<<endl; didn't run.
    I tried using the full path of the MNIST files but it failed the same way.
    It would be a great help if you could provide any hint.

    • Lancelod Liu
      Posted September 23, 2014 at 2:56 am | Permalink

I tried adding cout<<"open successfully\n" in the readMnist() function and it didn't show up. I think maybe there's something wrong with the read function.

      • Lancelod Liu
        Posted September 23, 2014 at 3:19 am | Permalink

        I fixed this problem by using
        readData(trainX, trainY, "mnist\\train-images-idx3-ubyte.gz", "mnist\\train-labels-idx1-ubyte.gz", 60000);
        readData(testX, testY, "mnist\\t10k-images-idx3-ubyte.gz", "mnist\\t10k-labels-idx1-ubyte.gz", 10000);

        But another problem showed up as below.

OpenCV Error: One of arguments' values is out of range (The total matrix size does not fit to "size_t" type) in cv::setSize, file C:\buildslave64\win64_amd_4_PackSlave-win32-vc11-shared\opencv\modules\core\src\matrix.cpp, line 126

        • Lancelod Liu
          Posted September 23, 2014 at 3:50 am | Permalink

I fixed it… Apparently I should un-zip the .gz files before reading them… How could I simply read the '*.gz' files and expect the right answer…

  21. Angus
    Posted October 5, 2014 at 9:44 am | Permalink


    • Eric
      Posted October 6, 2014 at 4:06 pm | Permalink


  22. 赵元兴
    Posted November 29, 2014 at 10:59 pm | Permalink

Hello, have you tried local contrast normalization? For the forward pass, if this layer is applied to a single feature map, won't the map shrink by another ring? Does that need special handling? The parameters of this layer all seem to be hand-picked; how should backpropagation be done for it? Are there any related papers you could recommend? Thanks.

    • Eric
      Posted December 2, 2014 at 6:32 pm | Permalink


      • 赵元兴
        Posted December 9, 2014 at 5:49 am | Permalink

Thanks. I saw that in the gradient checking part of your code you take the ratio of the two derivatives (tp / grad.ATD(i, j)), but at what threshold do you consider it a pass?

        • Eric
          Posted December 9, 2014 at 4:13 pm | Permalink


          • 赵元兴
            Posted December 15, 2014 at 2:21 am | Permalink


  23. Muralidhar
    Posted February 12, 2015 at 5:45 am | Permalink

I am trying to run the code, but I get a segmentation fault (core dumped) error after training on the dataset. Please let me know what I need to change in the code, or whether I am doing anything wrong.

    • Muralidhar
      Posted February 12, 2015 at 9:04 am | Permalink

The problem is that it is not able to create the kernel folder. What could be the reason? Because of that, it shows a segmentation fault. Please let me know how to debug this.

      • Muralidhar
        Posted February 12, 2015 at 10:39 am | Permalink

The problem was that I needed to create the kernel folder manually.
        Thank you so much for providing the code.

        • Eric
          Posted March 10, 2015 at 2:32 pm | Permalink

          Hi Muralidhar,

Maybe it's because I'm currently using a Mac and most of the methods in my code are Mac-supported, especially the files/folders related parts.

  24. Wei Chee
    Posted March 26, 2016 at 2:38 pm | Permalink

Hi, I tried to build your code with Microsoft Visual Studio 2013 but I keep getting these errors. Hope you can help me.
    Error 1 error C2065: ‘S_IRWXU’ : undeclared identifier
    Error 2 error C2065: ‘S_IRWXG’ : undeclared identifier
    Error 3 error C2065: ‘S_IROTH’ : undeclared identifier
    Error 4 error C2065: ‘S_IXOTH’ : undeclared identifier

All these errors occurred at line 64 in

  25. Kushal
    Posted June 2, 2016 at 1:14 am | Permalink

Sir, I am doing image-processing face recognition!
    I have tried the Eigen/LBPH face recognition algorithms in OpenCV but got no results. I want to use a CNN for the image recognition part. Your code seems pretty large and vast. Can you please guide me on how to input my image data, how to get the feature vector and train the CNN, and how to use it in prediction?
    Thanks in advance. I am using C++ with OpenCV 3.1.0.
    Hoping to hear from you. 🙂

  26. Justin
    Posted August 14, 2016 at 2:22 pm | Permalink

    Hello Eric,

Thanks for sharing this code. Somehow I have trouble running it. I'm using OpenCV 2.4 in Visual Studio. When I built the code, it said that POOL_STOCHASTIC and CLOCK are undefined. I include , so the error regarding clock is gone, but I still don't know how to make POOL_STOCHASTIC available. Please advise me.
