Convolutional Neural Networks III

Hey, I’ve recently been working on a new version of my CNN. The updates are as follows:

  1. Support for 3-channel images;
  2. Dropout;
  3. In conv layers, you can use either 3-channel or single-channel conv kernels (that is, choose whether to share weights across channels).

I’ve finished most of the work and am now debugging the code; I hope to release it in a few days.

Here’s an early-adopter edition, which is still buggy. I’ll post the final version in a few days.

Newly updated (Aug. 28):

I apologize for the delayed replies to comments.

I was in Pittsburgh for a few days and was busy helping my wife move into a new apartment, so I only had about an hour per day for this…

However, I do think I’m on the right track; here are some of the kernels I got from training:

[Kernel visualizations, one image per group: channel_0 (Y), channel_1 (Cr), channel_2 (Cb), and the 3-channel kernels converted into RGB.]
One problem I’m facing now: the OpenCV svd function is extremely slow when the matrix is large. It used to call the LAPACK SVD routine, but someone changed the implementation and introduced this speed problem. I think I should call LAPACK’s SVD directly instead of going through OpenCV’s svd function. I’ll work on that over the next few days.

Newly newly updated (Sept. 12):

I tested the current network on CIFAR10 using the following configs:


Sept. 29.

I updated the learning-rate calculation: I now use second-order derivatives to compute the learning rate at each learning step. The computation is very similar to the gradient backprop procedure, so while backpropagating through each layer, I also backpropagate something like a (diagonal) Hessian.
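The idea is roughly the diagonal-Hessian learning-rate trick: each weight gets its own rate, shrunk where the backpropagated second derivative says the error surface is sharply curved. A minimal sketch under that assumption (the names `adaptive_lr`, `base_lr`, and `mu`, and their default values, are illustrative, not taken from this project's code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-weight learning rate from an approximate (diagonal) second derivative:
// lr_i = base_lr / (|h_i| + mu). mu guards against division by near-zero
// curvature. The constants are illustrative defaults, not tuned values.
double adaptive_lr(double hessian_diag, double base_lr = 0.05, double mu = 0.02) {
    return base_lr / (std::fabs(hessian_diag) + mu);
}

// One SGD step using the per-weight rates: high curvature -> small step.
void sgd_step(std::vector<double>& w, const std::vector<double>& grad,
              const std::vector<double>& hess) {
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] -= adaptive_lr(hess[i]) * grad[i];
}
```

With zero curvature the step is just `base_lr / mu` times the gradient; as the Hessian entry grows, the step shrinks toward zero.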

I also fixed several bugs. For example:

I always thought the following code would generate a 3-channel matrix in which every single element is 1.0:

Mat a = Mat::ones(height, width, CV_64FC3);

But it actually generates a 3-channel matrix in which every element of the first channel is 1.0 and the elements of the other channels are 0.0. To do what I wanted, the correct code is:

Mat a = cv::Mat(height, width, CV_64FC3, Scalar(1.0, 1.0, 1.0));

Another thing: I found my network suffers from overfitting. The latest result (using 2 conv layers, with 8 kernels in the 1st layer and 20 kernels in the 2nd) shows that after 40,000 iterations of stochastic gradient descent, accuracy on the training set is about 98%, but accuracy on the test set is only about 71%.

So my next plan is to deal with this overfitting problem. First, find a better number of kernels and fully-connected-layer neurons; second, do some input-data augmentation (like Prof. Hinton’s ImageNet experiment).

This entry was posted in Machine Learning.


  1. Mike
    Posted August 10, 2014 at 8:46 am | Permalink

    I am working on a project where I want to input a number alongside an image. For example, I could have a picture of a patient’s leg together with a one-to-ten scale of how much it hurts, to determine what the treatment should be. I was wondering how I could do this?

    • Eric
      Posted August 13, 2014 at 4:46 pm | Permalink

      Hey Mike,

      You’ll need a lot of images of this kind, labeled or unlabeled.

      First case: you have enough labeled images (maybe thousands of them). Use CNNs.

      Second case: you don’t have enough labeled images. Then use a DBN, a denoising auto-encoder, or even ICA (any pre-training method) to do pre-training. Then use your small amount of labeled images to test. (You can search for self-taught learning online.)

      • Mike
        Posted August 23, 2014 at 3:32 pm | Permalink

        What’s your plan for your CNN lib? Will you also add a DBN, denoising auto-encoder, or even ICA?

        • Eric
          Posted August 27, 2014 at 11:50 pm | Permalink

          Hey Mike,

          Actually I’m not going to add anything like DBN/DAE/ICA to this CNN. It is still buggy and I’m working on debugging it; probably not a good idea to add something new until it is bug-free 🙂

  2. Mike
    Posted August 10, 2014 at 10:23 am | Permalink

    another nice article:
    some other teams are implementing a soc:

    Posted November 5, 2014 at 3:34 am | Permalink

    Hi, Eric Yuan, thank you for your help. I am now studying your code, but I am not sure which development tools (and versions) you used, because when I try to build your code I get some errors. I use Visual Studio 2010 and OpenCV 2.3.1.
    The errors are as follows:
    1>e:\cnn-eric-yuan\cnn\cnn\src\ error C2065: 'S_IRWXU': undeclared identifier
    1>e:\cnn-eric-yuan\cnn\cnn\src\ error C2065: 'S_IRWXG': undeclared identifier
    1>e:\cnn-eric-yuan\cnn\cnn\src\ error C2065: 'S_IROTH': undeclared identifier
    1>e:\cnn-eric-yuan\cnn\cnn\src\ error C2065: 'S_IXOTH': undeclared identifier
    1>e:\cnn-eric-yuan\cnn\cnn\src\ error C3861: 'mkdir': identifier not found
    e:\cnn-eric-yuan\cnn\cnn\src\ error C2668: 'std::to_string': ambiguous call to overloaded function
    e:\cnn-eric-yuan\cnn\cnn\src\ error C2668: 'std::to_string': ambiguous call to overloaded function
    e:\cnn-eric-yuan\cnn\cnn\src\ error C2668: 'sqrt': ambiguous call to overloaded function

    • Eric
      Posted November 5, 2014 at 10:32 pm | Permalink

      Hello! I use gcc. If you’re using VS, it looks like you’ll need to find the VS equivalents for the following:
      1. Creating directories: the first five errors you listed all come from the UNIX directory-creation calls.
      2. Replace std::to_string() with a VS-compatible int-to-string conversion function.
      3. For sqrt(), the problem may be that (1) you need to include the corresponding header, or (2) it doesn’t support sqrt of an int? Not sure; give it a try.

      • toshiba
        Posted October 11, 2015 at 1:10 am | Permalink

        I ran into this problem too; VS cannot compile the simple RNN code. Can it only be run on Ubuntu?

  4. Rehman
    Posted November 5, 2014 at 1:35 pm | Permalink

    Sir, I am implementing a CNN in OpenCV Java. I have implemented the classification with 5 layers: input layer, conv layer, subsampling layer, conv layer, subsampling layer, and finally a fully connected layer. I used Vector<Vector>, my code is complete and running, and I get ~90% accuracy. But the issue is that it is not very fast: for my 1,248 labeled test vectors it takes about 17 minutes. After debugging I found that Matlab uses the convn function for N-dimensional convolution, while I used conv2 in a loop, which takes much longer; the same code in C++ does not take nearly as much time. Is there any solution, i.e., how can I do N-dimensional convolution fast? Will be waiting for your reply sir 🙂

    • Eric
      Posted November 5, 2014 at 10:37 pm | Permalink

      Hey Rehman,

      I’m not sure, but maybe you can try using a separable kernel for the convolution? Or use separable versions of the kernels to approximate them? I’ve never tried this, but it may be a way to accelerate the convolution.
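The separable-kernel idea Eric suggests, sketched under the assumption of a rank-1 kernel (an outer product of a column and a row vector): one 2-D pass of cost O(k²) per pixel becomes two 1-D passes of cost O(2k). The helper names are illustrative, not from either codebase, and this computes cross-correlation (no kernel flip), as CNN code usually does:

```cpp
#include <vector>

using Mat2d = std::vector<std::vector<double>>;

// "Valid"-mode cross-correlation of img with the rank-1 kernel col * row^T,
// done as a horizontal 1-D pass followed by a vertical 1-D pass.
Mat2d conv_separable(const Mat2d& img, const std::vector<double>& row,
                     const std::vector<double>& col) {
    int H = (int)img.size(), W = (int)img[0].size();
    int kr = (int)row.size(), kc = (int)col.size();
    // Horizontal pass: correlate each image row with the row filter.
    Mat2d tmp(H, std::vector<double>(W - kr + 1, 0.0));
    for (int i = 0; i < H; ++i)
        for (int j = 0; j + kr <= W; ++j)
            for (int u = 0; u < kr; ++u)
                tmp[i][j] += img[i][j + u] * row[u];
    // Vertical pass: correlate each intermediate column with the col filter.
    Mat2d out(H - kc + 1, std::vector<double>(W - kr + 1, 0.0));
    for (int i = 0; i + kc <= H; ++i)
        for (int j = 0; j < (int)tmp[0].size(); ++j)
            for (int v = 0; v < kc; ++v)
                out[i][j] += tmp[i + v][j] * col[v];
    return out;
}
```

For kernels that are not exactly rank-1, the usual trick is to approximate them by the leading term of their SVD and accept a small accuracy loss for the speedup.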

    Posted November 7, 2014 at 4:59 am | Permalink

    virtual-machine:~/src/conv-net-version-3-master# make
    [ 6%] Building CXX object CMakeFiles/conv.dir/src/
    In file included from /usr/include/c++/4.4/unordered_map:35,
    from /root/src/conv-net-version-3-master/src/convolution.h:3,
    from /root/src/conv-net-version-3-master/src/general_settings.h:6,
    from /root/src/conv-net-version-3-master/src/matrix_maths.h:2,
    from /root/src/conv-net-version-3-master/src/
    /usr/include/c++/4.4/c++0x_warning.h:31: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
    In file included from /root/src/conv-net-version-3-master/src/general_settings.h:6,
    from /root/src/conv-net-version-3-master/src/matrix_maths.h:2,
    from /root/src/conv-net-version-3-master/src/
    /root/src/conv-net-version-3-master/src/convolution.h:21: error: ISO C++ forbids declaration of ‘unordered_map’ with no type
    /root/src/conv-net-version-3-master/src/convolution.h:21: error: expected ‘,’ or ‘…’ before ‘<’ token
    /root/src/conv-net-version-3-master/src/convolution.h:23: error: ISO C++ forbids declaration of ‘unordered_map’ with no type
    /root/src/conv-net-version-3-master/src/convolution.h:23: error: expected ‘,’ or ‘…’ before ‘<’ token
    /root/src/conv-net-version-3-master/src/convolution.h:26: error: ‘unordered_map’ has not been declared
    /root/src/conv-net-version-3-master/src/convolution.h:26: error: expected ‘,’ or ‘…’ before ‘<’ token
    /root/src/conv-net-version-3-master/src/convolution.h:29: error: ‘unordered_map’ has not been declared
    /root/src/conv-net-version-3-master/src/convolution.h:29: error: expected ‘,’ or ‘…’ before ‘<’ token
    make[2]: *** [CMakeFiles/conv.dir/src/] Error 1
    make[1]: *** [CMakeFiles/conv.dir/all] Error 2
    make: *** [all] Error 2
    I installed OpenCV 2.4.9 on Ubuntu 12.04 and followed your code’s Readme.mk, and got the errors above. From the messages, it seems to be a compiler-version problem? The error asks for the -std=c++0x or -std=gnu++0x compile option, but since I don’t know Makefiles and cmake generation well, I don’t know where to add this option, and searching online didn’t turn up an answer. How should I solve this?

    • Eric
      Posted November 7, 2014 at 11:35 pm | Permalink


        Posted November 8, 2014 at 6:53 am | Permalink

        Hi, thanks for the hint, but after upgrading gcc and cmake I still got the same error. The message says to add some compile options; at first I didn’t know where to add them, but later I added SET(CMAKE_CXX_FLAGS "-std=c++11") in MakefileList.txt and it compiled, heh!

        • cmtsai
          Posted December 28, 2015 at 2:54 am | Permalink

          Where is the MakefileList.txt?
          Where should I add SET(CMAKE_CXX_FLAGS "-std=c++11")?

    Posted November 17, 2014 at 3:24 am | Permalink

    Hi, Eric Yuan, in weight_inti.cc, in the code that initializes the fully-connected layer weights, I feel one detail may have been overlooked:
    if(fcConfig.size() > 0)
    Fcl tpntw;
    weightRandomInit(tpntw, hiddenfeatures * 3, fcConfig[0].NumHiddenNeurons);
    for(int i = 1; i < fcConfig.size(); i++)
    Fcl tpntw2;
    weightRandomInit(tpntw2, fcConfig[i - 1].NumHiddenNeurons, fcConfig[i].NumHiddenNeurons);
    }
    In the code above, for weightRandomInit(tpntw, hiddenfeatures * 3, fcConfig[0].NumHiddenNeurons): shouldn’t hiddenfeatures * 3 depend on the mode? If the conv kernels share weights, then convolving a 3-channel image with one kernel yields a single-channel feature map, so the *3 shouldn’t be needed. If the kernels are 3-channel (each kernel channel convolves the corresponding image channel), then hiddenfeatures * 3 is appropriate?

    • Eric
      Posted November 17, 2014 at 2:06 pm | Permalink


        Posted November 23, 2014 at 11:59 pm | Permalink

        Also, your conv kernels are too large, so the learned features may not be numerous enough, hence “accuracy on test dataset is only about 65%”. Make the kernels smaller and it is easy to reach 70%+, though training will of course take longer.

        • Eric
          Posted November 28, 2014 at 6:03 am | Permalink


    Posted November 17, 2014 at 3:28 am | Permalink

    // Init Softmax layer
    if(fcConfig.size() == 0){
    weightRandomInit(smr, softmaxConfig.NumClasses, hiddenfeatures * 3);

  8. zhenghx
    Posted November 18, 2014 at 12:58 am | Permalink

    Hi, Eric Yuan, I want to say thanks to you. Based on your “Convolutional Neural Networks II” code, I used CUDA to accelerate it. I added distortion, rotation, and scaling, and finally got 99.72% on the MNIST data.
    But when I tried to add dropout to my code, it performed worse.
    Did you see the same result when you used dropout to overcome overfitting?

    • Eric
      Posted November 19, 2014 at 9:27 pm | Permalink

      Hi, which layer(s) are you applying dropout to? Geoffrey Hinton’s paper suggests it may not be a good idea to use dropout in convolutional layers, so try using dropout only in the fully-connected layers.
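A sketch of dropout on a fully-connected activation vector, in the common “inverted” formulation (this is the standard technique, not necessarily the exact code in this project): each unit is zeroed with probability p during training, and the survivors are scaled by 1/(1-p) so that no rescaling is needed at test time.

```cpp
#include <random>
#include <vector>

// Inverted dropout on an activation vector. p is the drop probability;
// surviving activations are scaled by 1/(1-p) so the expected activation
// is unchanged and test-time code needs no modification.
std::vector<double> dropout(const std::vector<double>& a, double p,
                            std::mt19937& rng) {
    std::bernoulli_distribution drop(p);
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = drop(rng) ? 0.0 : a[i] / (1.0 - p);
    return out;
}
```

At p = 0 this is the identity; applying it only between fully-connected layers matches the advice above.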

    Posted November 27, 2014 at 4:32 am | Permalink

    Eric Yuan, my computer has no GPU. Is there any way to speed up training? As soon as I increase the number of kernels a bit, it becomes unbearably slow.

    • Eric
      Posted November 28, 2014 at 6:04 am | Permalink

      Then you may need to use some other linear algebra library. I use OpenCV only because I’m used to it; it is not the fastest.

    • zhenghx
      Posted December 1, 2014 at 1:30 am | Permalink


  10. nikx
    Posted November 28, 2014 at 8:38 am | Permalink

    How do I visualize the filters like you have done? How did you get separate per-channel filters? Aren’t the channels merged when we do 3-D convolution?

    • Eric
      Posted December 2, 2014 at 6:28 pm | Permalink

      Hi nikx, my version of the CNN works a little differently from Y. LeCun’s. When using 3-channel mode, I convolve each channel of a 3-channel kernel with the image separately, because I think the three channels (RGB, YCrCb, or HSV) may not contain features in the same fashion, so it may be better to keep the kernel channels separate. To visualize the filters, I simply store the results in a .txt file and use Matlab’s “imagesc” to display them. Thanks.

      • nikx
        Posted December 3, 2014 at 8:23 am | Permalink

        Thanks a lot Eric, your code is helping me understand how convnets work; until now I’ve mostly been into theory. I saw some projects but felt this is really close to what I wanted to start with. Thanks a lot. One question:

        I tried running this code in VS2012 on Win8.1, but it hangs with an “out of memory” exception at the very fix you gave recently, “Mat a = cv::Mat(height, width, CV_64FC3, Scalar(1.0, 1.0, 1.0));”. Is that an issue with OS-dependent memory allocation, or something else?

        • Eric
          Posted December 3, 2014 at 9:36 pm | Permalink

          Hey nikx, how much memory do you have? And are you building for Win32 or Win64? I’m actually pretty sure your system is 64-bit, but maybe you’re using the 32-bit compiler? Try the 64-bit compiler in your VS. Thanks.

          • nikx
            Posted December 4, 2014 at 7:44 am | Permalink

            Hey, thanks. I was using a 64-bit compiler but was referring to the x86 lib files and DLLs of OpenCV, and my project was set to x86 debug mode, hence the error. It works well now, thanks.

            Great work!

    Posted December 7, 2014 at 2:20 am | Permalink

    Mat m = Mat::ones(1, 1, CV_64FC1);

    randu(m, Scalar(0.0), Scalar(1.0));
    m *= (src2.cols - _size - 1);
    int randomNum = int(m.ATD(0, 0));
    for(int i = 0; i push_back(src1[i + randomNum]);
    Rect roi = Rect(randomNum, 0, _size, src2.rows);
    for i = 1 : opts.numepochs
    disp([‘epoch ‘ num2str(i) ‘/’ num2str(opts.numepochs)]);
    kk = randperm(m); // shuffle the sample order
    for l = 1 : numbatches
    batch_x = x(:, :, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize)); // read the shuffled samples in order
    batch_y = y(:, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));

    • Eric
      Posted December 7, 2014 at 6:56 pm | Permalink

      Hi, yes, I thought about this problem too, which is why I deliberately used randu to generate uniformly-distributed random numbers, to give most of the data an equal chance of being picked. The problem with this approach is that the first batchsize samples and the last batchsize samples have a smaller chance of being picked. I haven’t yet figured out how to shuffle all the samples and read them sequentially. Do you have any ideas for an improvement?

    Posted December 7, 2014 at 9:33 pm | Permalink

    In train_networks.cc: // define an integer vector, initialized to 0..(number of samples)
    for(int i = 0; i < x.size(); ++i)

    for(int epo = 1; epo <= training_epochs; epo++)
    int numbatches = x.size()/batch_size;
    for(int kk = 0; kk < numbatches; ++kk)

    //for(; k <= iter_per_epo * epo; k++)
    getSample(x, &batchX, y, &batchY, batch_size, SAMPLE_COLS, kk,randperm);

    for(int i = 0; i push_back(src1[tmp]);

    Mat mtmp = Mat::zeros(1,_size,CV_64FC1);
    for(int i = 0; i < _size; i++)
    int tmp = randperm[kk*_size+i];
    double label = src2.ATD(0,tmp);
    mtmp.ATD(0,i) = label;


    Hmm, today I set LRN to true and ran it, and found that your local response normalization isn’t tuned yet; it throws an error. Heh.

    • Eric
      Posted December 16, 2014 at 6:41 pm | Permalink

      I’ve also fixed the sampling part on GitHub, in both the conv-net-3 and multi-layer versions. The LRN part isn’t ready yet :) Thanks!

    • Eric
      Posted December 16, 2014 at 6:50 pm | Permalink

      The current approach is to maintain an int vector of size nSamples (the dataset size), random_shuffle it on each sampling, take the first batch_size labels, and then fetch the X and Y corresponding to those labels.
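That sampling scheme, in a self-contained sketch (std::random_shuffle is deprecated in modern C++, so this uses std::shuffle; the function name is mine, not from the repository):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Minibatch sampling as described above: keep an index vector [0, n_samples),
// shuffle it, and take the first batch_size entries as the minibatch indices.
// The caller then gathers the X and Y rows at those indices.
std::vector<int> sample_batch(int n_samples, int batch_size, std::mt19937& rng) {
    std::vector<int> idx(n_samples);
    std::iota(idx.begin(), idx.end(), 0);          // 0, 1, ..., n_samples-1
    std::shuffle(idx.begin(), idx.end(), rng);     // uniform random permutation
    idx.resize(batch_size);                        // first batch_size indices
    return idx;
}
```

Unlike the randu-based windowing it replaces, every sample has the same chance of appearing in a given batch, and shuffling a fresh permutation per epoch guarantees each sample is visited exactly once per epoch.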

        Posted December 23, 2014 at 3:17 am | Permalink

        smr.Wd2 = pow((groundTruth - p), 2.0) * pow(hidden[hidden.size() - 1].t(), 2.0);
        smr.Wd2 = smr.Wd2 / nsamples + softmaxConfig.WeightDecay; // I can’t follow this; which formula is it, a simplified version?
        smr.bd2 = reduce(pow((groundTruth - p), 2.0), 1, CV_REDUCE_SUM) / nsamples;

        • Eric
          Posted December 29, 2014 at 9:13 pm | Permalink

          Hi, the variables with the d2 suffix are second derivatives; I use them to set the learning rate. Some places do take approximations, because I felt that updating the exact second derivatives at every step would be too expensive. Happy New Year!!

        • Eric
          Posted December 29, 2014 at 9:16 pm | Permalink

          Also, about overfitting: I’ve seen people handle it like this: for the training dataset, apply some random scaling, rotation, or mirroring to each image, and then add these new images to the training set. That sounds quite reasonable to me, and I plan to try it :)
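Of the augmentations mentioned (scaling, rotation, mirroring), mirroring is the simplest to sketch. A hypothetical helper on a plain 2-D array, standing in for one channel of an image (the name `mirror_h` is mine; a real pipeline would emit the mirrored copy with the same label as the original):

```cpp
#include <algorithm>
#include <vector>

// Horizontal mirror of a single-channel image stored row-major as a
// vector of rows. Scaling and rotation would follow the same pattern:
// produce a new labeled copy of each training image.
std::vector<std::vector<double>> mirror_h(const std::vector<std::vector<double>>& img) {
    auto out = img;
    for (auto& row : out)
        std::reverse(row.begin(), row.end());  // flip each row left-to-right
    return out;
}
```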

  13. Feng
    Posted March 20, 2015 at 11:59 pm | Permalink

    Hi Eric:

    In your multi-layer code, I found that the number (depth) of feature maps output by a conv layer equals the product of the input depth and the layer’s kernel count. Therefore, after all conv layers, the total number of feature maps is kernelAmount[1]*kernelAmount[2]*….* kernelAmount[n].

    Is that correct?

    My understanding was that the output depth of a conv layer equals its kernel count. For example, if the input is (224×224)×64 (where 64 is the depth) and the conv layer has 128 kernels, it uses (5×5×64)×128 filters. Each kernel takes the 224×224×64 input, applies its 5×5×64 filter, and outputs one slice; 128 kernels give 128 slices. So the output of the conv layer is 128 slices, not 64×128 slices.

    Maybe both methods are possible. Which method is better?



    • Eric
      Posted March 23, 2015 at 4:46 am | Permalink

      Hi Feng,

      Yes, you are correct: my network is slightly different from the traditional CNN, which works exactly as you described. I can’t say which is better, because I’ve never implemented a LeNet-like CNN; it is actually on my todo list but I keep delaying it 🙁 my bad. However, I think one advantage of the LeNet-like CNN over mine is that the dimensionality after the conv layers is much lower, so we can use fewer hidden neurons in the fully-connected layers, right? I’ll do this soon and we can discuss it then. Thanks!
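The two depth conventions can be compared with quick arithmetic. Assuming depths multiply per layer as Feng describes, and plugging in the post's own configuration (3 input channels, then 8 and 20 kernels), the per-channel scheme produces far more feature maps than the LeNet-style scheme (helper names are mine, for illustration only):

```cpp
#include <vector>

// Depth after all conv layers when each kernel convolves every input map
// separately, so depths multiply layer by layer (this blog's scheme).
long maps_multiplicative(int in_depth, const std::vector<int>& kernels) {
    long d = in_depth;
    for (int k : kernels) d *= k;
    return d;
}

// Depth after all conv layers in the LeNet-style scheme: each kernel spans
// the full input depth, so the output depth is just the last kernel count.
int maps_lenet(const std::vector<int>& kernels) {
    return kernels.back();
}
```

With 3 input channels and kernel counts {8, 20}, the multiplicative scheme ends with 3 × 8 × 20 = 480 maps versus 20 for the LeNet-style scheme, which is exactly why the fully-connected layers after it need so many more neurons.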

  14. Mohammad
    Posted March 26, 2015 at 5:35 pm | Permalink

    Hi Eric;
    I wanted to ask about the GPU part. Can you tell me which types of GPU it supports, and whether there is an option for using just the CPU?


  15. Gil Levi
    Posted May 19, 2015 at 1:37 pm | Permalink

    Great work!

    Have you thought about integrating your code into OpenCV and contributing it?


  16. Lee Yoongu
    Posted June 11, 2015 at 1:01 pm | Permalink

    Good job. I am using a convolutional neural network for gender classification of human faces, and your code is very useful to me. Could you tell me which of your versions (Convolutional Neural Networks I, II, or III) suits my task? Thanks in advance.

  17. Posted September 28, 2015 at 11:20 pm | Permalink

    I am using a machine with 8 GB of RAM. Just before this line of code is reached, only 4.53 GB of RAM is in use, but when the line is called I get the following error. Any ideas?
    Mat tmp = concat(trainX);

    OpenCV Error: Insufficient memory (Failed to allocate 1228800004 bytes) in cv::O
    utOfMemoryError, file C:\builds\2_4_PackSlave-win32-vc11-shared\opencv\modules\c
    ore\src\alloc.cpp, line 52

  18. poochuan
    Posted November 3, 2015 at 12:10 pm | Permalink


  19. Sophea
    Posted November 9, 2015 at 5:39 am | Permalink


    Many thanks for your code. I can actually compile it without problems, but when I try to run it, I just get the message “Killed”.

    My OpenCV version is 3.0, and I run it on Fedora 23, 32-bit.

    Thanks in advance

  20. sachin
    Posted January 18, 2016 at 5:09 am | Permalink

    Hi Eric,

    I have trained the CNN using your code on the MNIST set, and I have saved cvl.layer using saveWeight.

    Can you please help me use it for training? I don’t want to run the training module again. Is it possible to use these saved .txt files?


  21. sachin
    Posted January 18, 2016 at 5:10 am | Permalink

    Hi Eric,

    I have trained the CNN using your code on the MNIST set, and I have saved cvl.layer using saveWeight.

    Can you please help me use it for testing on different files? Also, can you tell me how it could be used for text recognition from images?


  22. Iftah G.
    Posted February 9, 2016 at 6:02 am | Permalink

    Great work!

    I’m trying to use your CNN with non-square input images (e.g. 16×48), but it fails. I attempted to fix the code to allow this, starting by changing the ‘convert’ function to receive both rows and cols, but it just fails further down the line and I can’t seem to find what’s wrong. Is this achievable with some work, or does it require a complete overhaul?

    Thanks in advance.

  23. Daniel Bell
    Posted April 10, 2016 at 4:23 am | Permalink

    Hi Eric,

    Is there any way to get the loss function from the SoftMax layer?


