Hey, I’ve recently been working on a new version of my CNN; the updates are as follows:

- Support for 3-channel images;
- Dropout;
- In conv layers, one can use either 3-channel or single-channel conv kernels (that is, choose whether to share weights across channels).

I’ve now finished most of the work and am debugging the code; I hope to release it within a few days.

Here’s an early-adopters edition, which is still buggy. I’ll post the formal version in a few days.

**https://github.com/xingdi-eric-yuan/conv-net-version-3**

Newly updated (Aug. 28):

I apologize for the delayed replies to comments.

I was in Pittsburgh for a few days, busy helping my wife move into a new apartment, so I only had about one hour per day for this…

However, I do think I’m on the right track; here are some of the kernels I got from training:

[Figures: learned kernels for channel_0 (Y), channel_1 (Cr), channel_2 (Cb), and the 3-channel kernels converted to RGB]

One problem I’m facing now: OpenCV’s svd function is extremely slow on large matrices. It used to call LAPACK’s SVD routine, but someone rewrote the algorithm and introduced this slowdown. In my opinion, I should call the LAPACK SVD directly instead of going through OpenCV’s svd function. I’ll work on that over the next few days.

Newly newly updated (Sept. 12):

I tested the current network on CIFAR10 using the following configs:

Updated (Sept. 29):

I updated the learning-rate calculation: I now use second-order derivatives to set the learning rate at each learning step. The procedure is very similar to gradient backprop, so while backpropagating through each layer I also backpropagate something like the Hessian.

I also fixed several bugs. For example, I had always thought that the following code generates a 3-channel matrix in which every single element is 1.0:

Mat a = Mat::ones(height, width, CV_64FC3); // only the first channel is set to 1.0

But it actually generates a 3-channel matrix in which every element of the first channel is 1.0 and the elements of the other channels are 0.0. To get what I wanted, the correct code is:

Mat a = cv::Mat(height, width, CV_64FC3, Scalar(1.0, 1.0, 1.0)); // all three channels set to 1.0

Another thing: I found my network suffers from overfitting. The latest results (using 2 conv layers, with 8 kernels in the 1st layer and 20 kernels in the 2nd) show that after 40,000 iterations of stochastic gradient descent training, accuracy on the training set is about 98%, but accuracy on the test set is only about 71%.

So my next plan is to deal with this overfitting problem. First, find a better number of kernels and fc-layer neurons; second, enlarge the input data with augmentation (as in Prof. Hinton’s ImageNet experiment).

## 51 Comments

I am working on a project where I want a number to be input alongside an image. For example, I could have a picture of a patient’s leg plus a one-to-ten scale of how much it hurts them, to determine what the treatment should be. I was wondering how I could do this?

Hey Mike,

Apparently you need a lot of images of this kind, labeled or unlabeled.

First case: you have enough labeled images (maybe thousands of them). Use CNNs.

Second case: you don’t have enough labeled images. Then use a DBN, a denoising auto-encoder, or even ICA, any pre-training method, and do pre-training. Then use your small amount of labeled images for testing (you can search for "self-taught learning" online).

What’s your plan for your CNN lib? Will you also add a DBN, denoising auto-encoder, or ICA?

Hey Mike,

Actually I’m not going to add things like DBN/DAE/ICA to this CNN; it is still buggy and I’m working on debugging it. Probably not a good idea to add something new until it is bug-free 🙂

Another nice article: https://plus.google.com/+YannLeCunPhD/posts/BoUYGNM4pgy

Some other teams are implementing an SoC: http://www.neuflow.org/

Hi Eric Yuan, thank you for your help. I am now studying your code, but I’m not sure which version of the development tools you use, because I get errors when I try to build it. I use Visual Studio 2010 and OpenCV 2.3.1.

Here are the errors:

1>e:\cnn-eric-yuan\cnn\cnn\src\cost_gradient.cc(117): error C2065: "S_IRWXU": undeclared identifier
1>e:\cnn-eric-yuan\cnn\cnn\src\cost_gradient.cc(117): error C2065: "S_IRWXG": undeclared identifier
1>e:\cnn-eric-yuan\cnn\cnn\src\cost_gradient.cc(117): error C2065: "S_IROTH": undeclared identifier
1>e:\cnn-eric-yuan\cnn\cnn\src\cost_gradient.cc(117): error C2065: "S_IXOTH": undeclared identifier
1>e:\cnn-eric-yuan\cnn\cnn\src\cost_gradient.cc(117): error C3861: "mkdir": identifier not found
e:\cnn-eric-yuan\cnn\cnn\src\cost_gradient.cc(166): error C2668: "std::to_string": ambiguous call to overloaded function
e:\cnn-eric-yuan\cnn\cnn\src\cost_gradient.cc(179): error C2668: "std::to_string": ambiguous call to overloaded function
e:\cnn-eric-yuan\cnn\cnn\src\convolution.cc(410): error C2668: "sqrt": ambiguous call to overloaded function

Hello! I use gcc. If you’re on VS, it looks like you need to find the VS equivalents for the following:

1. Creating directories: the first five errors you listed come from UNIX path-creation calls.

2. Replace std::to_string() with a VS-compatible int-to-string function.

3. For sqrt(), maybe (1) you need to include the right header, or (2) it doesn’t support taking the square root of an int? Not sure, give it a try.

🙂

I ran into this problem too; VS cannot compile the simple RNN code. Can it only be run on Ubuntu?

Sir, I am implementing a CNN in OpenCV Java. I have implemented the classification with 5 layers: input layer, conv layer, subsampling layer, conv layer, subsampling layer, and finally a fully-connected layer. I used Vector<Vector>, my code is complete, and it runs with ~90% accuracy. But the issue is that it is not very fast: for my 1,248 labelled test vectors it takes about 17 minutes. After debugging I found that Matlab uses the convn function for N-dimensional convolution, while here I used conv2 in a loop, which takes much more time, whereas the same code in C++ does not. So is there any solution, i.e. how can I do N-dimensional convolution fast? Will be waiting for your reply sir 🙂

Hey Rehman,

I’m not sure, but maybe you can try a separable kernel for the convolution, or approximate your kernels with separable ones? I’ve never tried this, but it may be a way to speed up the convolution.

virtual-machine:~/src/conv-net-version-3-master# make
[ 6%] Building CXX object CMakeFiles/conv.dir/src/matrix_maths.cc.o
In file included from /usr/include/c++/4.4/unordered_map:35,
                 from /root/src/conv-net-version-3-master/src/convolution.h:3,
                 from /root/src/conv-net-version-3-master/src/general_settings.h:6,
                 from /root/src/conv-net-version-3-master/src/matrix_maths.h:2,
                 from /root/src/conv-net-version-3-master/src/matrix_maths.cc:1:
/usr/include/c++/4.4/c++0x_warning.h:31: error: #error This file requires compiler and library support for the upcoming ISO C++ standard, C++0x. This support is currently experimental, and must be enabled with the -std=c++0x or -std=gnu++0x compiler options.
In file included from /root/src/conv-net-version-3-master/src/general_settings.h:6,
                 from /root/src/conv-net-version-3-master/src/matrix_maths.h:2,
                 from /root/src/conv-net-version-3-master/src/matrix_maths.cc:1:
/root/src/conv-net-version-3-master/src/convolution.h:21: error: ISO C++ forbids declaration of ‘unordered_map’ with no type
/root/src/conv-net-version-3-master/src/convolution.h:21: error: expected ‘,’ or ‘…’ before ‘<’ token
/root/src/conv-net-version-3-master/src/convolution.h:23: error: ISO C++ forbids declaration of ‘unordered_map’ with no type
/root/src/conv-net-version-3-master/src/convolution.h:23: error: expected ‘,’ or ‘…’ before ‘<’ token
/root/src/conv-net-version-3-master/src/convolution.h:26: error: ‘unordered_map’ has not been declared
/root/src/conv-net-version-3-master/src/convolution.h:26: error: expected ‘,’ or ‘…’ before ‘<’ token
/root/src/conv-net-version-3-master/src/convolution.h:29: error: ‘unordered_map’ has not been declared
/root/src/conv-net-version-3-master/src/convolution.h:29: error: expected ‘,’ or ‘…’ before ‘<’ token
make[2]: *** [CMakeFiles/conv.dir/src/matrix_maths.cc.o] Error 1
make[1]: *** [CMakeFiles/conv.dir/all] Error 2
make: *** [all] Error 2
root@albert-virtual-machine:~/src/conv-net-version-3-master#

I installed OpenCV 2.4.9 on Ubuntu 12.04 and followed the Readme.mk in your repo, and got the errors above. From the message, it seems to be a compiler-version issue: it says the -std=c++0x or -std=gnu++0x compile option must be added. But since I don’t know much about Makefiles and cmake, I don’t know where to add this option, and searching online didn’t turn up an answer. How should I solve this?

Maybe try updating your gcc and cmake versions?

Hi, thanks for the suggestion, but upgrading gcc and cmake gave the same error. The message says to add some compile options; at first I didn’t know where, but then I added SET(CMAKE_CXX_FLAGS "-std=c++11") in MakefileList.txt and it compiled. Haha!

Where is the MakefileList.txt?

Where to add SET(CMAKE_CXX_FLAGS "-std=c++11")?
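For readers asking where this goes: the project builds with CMake (the log above shows CMakeFiles), so the file the earlier commenter calls "MakefileList.txt" is presumably the CMakeLists.txt in the repo root. A hedged sketch of the fix:

```cmake
# Near the top of the project's CMakeLists.txt, before add_executable()/add_library():
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
```

Alternatively, the flag can be passed at configure time with cmake -DCMAKE_CXX_FLAGS="-std=c++11" and then re-running make.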

Hi Eric Yuan, in the weight_inti.cc source file, in the code that initializes the fully-connected layer weights, I feel there is a detail you may not have considered:

if(fcConfig.size() > 0)
{
    Fcl tpntw;
    weightRandomInit(tpntw, hiddenfeatures * 3, fcConfig[0].NumHiddenNeurons);
    HiddenLayers.push_back(tpntw);
    for(int i = 1; i < fcConfig.size(); i++)
    {
        Fcl tpntw2;
        weightRandomInit(tpntw2, fcConfig[i - 1].NumHiddenNeurons, fcConfig[i].NumHiddenNeurons);
        HiddenLayers.push_back(tpntw2);
    }
}

In the code above, in weightRandomInit(tpntw, hiddenfeatures * 3, fcConfig[0].NumHiddenNeurons), should hiddenfeatures * 3 also be treated case by case? If the conv kernels share weights, then one kernel convolving a three-channel image yields a single-channel feature map, so in that case there should be no multiplication by 3. Only when the kernels are 3-channel (each kernel channel convolving the corresponding image channel) should hiddenfeatures * 3 be used?

Hmm, you read it very carefully. My implementation of convolution in this version differs a bit from LeCun’s LeNet. To keep things simple, for a single-channel kernel I also just set all three channels equal to that kernel and do a 3-channel convolution, so I always multiply by 3 afterwards; the first- and second-order derivative updates also handle this case specially. When I have time I’ll change it to the LeNet way and see how it performs. (One more difference, not sure if you noticed: in LeNet, if I have m input images passed through a conv layer with n kernels, the output should still be m images, whereas here I convolve every input with every kernel, producing m*n outputs.) :)

Right, except that LeNet’s combined outputs require some experience to design, otherwise the results aren’t good. In theory it’s m inputs and m*n outputs (with n kernels).

Also, your conv kernels are too large, so the features learned may not be rich enough, hence "accuracy on test dataset is only about 65%". Make the kernels smaller and it’s easy to reach 70%+, though training will of course take longer.

One more question: how do you display the learned features? I read out the weights and used the imshow function, but couldn’t reproduce your feature maps above.

I usually save the trained features to a txt file, read them with Matlab, and just call imagesc.

// Init Softmax layer
if(fcConfig.size() == 0){
    weightRandomInit(smr, softmaxConfig.NumClasses, hiddenfeatures * 3);

Hi Eric Yuan, I want to say thanks. Based on your "Convolutional Neural Networks II" code, I used CUDA to accelerate it, added distortion, rotation, and scaling, and finally reached 99.72% on the MNIST data.

But when I tried to add dropout to my code, it performed worse.

Did you see the same result when you used dropout to overcome overfitting?

Hi, which layer(s) are you using dropout in? Geoffrey Hinton’s paper says it may not be a good idea to use dropout in convolutional layers, so try using dropout only in the fully-connected layers.

Eric Yuan, my computer has no GPU; is there any way to speed up training? As soon as I increase the number of kernels a bit, it becomes unbearably slow.

Then you probably need some other linear-algebra library. I use OpenCV only because I’m familiar with it; it’s not the fastest.

Hi Eric, I’ve already CUDA-accelerated your first version: http://www.cnblogs.com/zhxfl/p/4134834.html. I’ll keep following your second version and accelerate it so that it supports the 3-channel mode.

Applause!

How do I visualize the filters like you have done? How did you get separate channel filters? Aren’t the channels merged when we do 3D convolution?

Hi nikx, my version of the CNN works a little differently from Y. LeCun’s: when using 3-channel mode, I convolve each channel of a 3-channel kernel with the image separately, because I think the three channels (RGB, YCrCb, or HSV) may not contain features in the same fashion, so it may be better to keep the three kernel channels separate. To visualize the filters, I simply store the results in a .txt file and use Matlab’s imagesc to display them. Thanks.

Thanks a lot Eric, your code is helping me understand how convnets work; until now I’d been mostly into theory. I saw some projects but felt this is really close to what I wanted to start with. Thanks a lot. And one question:

I tried running this code in VS2012 on Win8.1, but it hangs with an "out of memory" exception at the very fix you gave recently, Mat a = cv::Mat(height, width, CV_64FC3, Scalar(1.0, 1.0, 1.0));. Is that an issue with OS-dependent memory allocation, or something else?

Hey nikx, how much memory do you have, and are you building for Win32 or Win64? I’m actually pretty sure your system is the 64-bit version, but maybe you’re using the 32-bit compiler? Try the 64-bit compiler in your VS. Thanks.

Hey thanks, I was using a 64-bit compiler but was referring to the x86 lib files and DLLs of OpenCV, and my project was set to x86 debug mode; hence the error. It works well now, thanks.

Great work!

Eric, I think your program may have a blind spot in the way it reads training samples. It just generates a random number and, using it as a starting index, reads batchsize samples, so with high probability many samples are never read at all during the whole training run. Or am I misreading your program?

Mat m = Mat::ones(1, 1, CV_64FC1);
randu(m, Scalar(0.0), Scalar(1.0));
m *= (src2.cols - _size - 1);
int randomNum = int(m.ATD(0, 0));
for(int i = 0; i < _size; i++){
    dst1->push_back(src1[i + randomNum]);
}
Rect roi = Rect(randomNum, 0, _size, src2.rows);
src2(roi).copyTo(*dst2);

By contrast, the approach in the Matlab code of DeepLearningToolBox is: shuffle the order of all samples, then read the shuffled samples sequentially.

for i = 1 : opts.numepochs
    disp(['epoch ' num2str(i) '/' num2str(opts.numepochs)]);
    tic;
    kk = randperm(m); % shuffle sample order
    for l = 1 : numbatches
        batch_x = x(:, :, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize)); % read shuffled samples sequentially
        batch_y = y(:, kk((l - 1) * opts.batchsize + 1 : l * opts.batchsize));

This is convenient in Matlab, where you can just index directly; in OpenCV it may be more complicated, because once the samples are shuffled, reading the labels becomes a real pain if you can’t index them directly.

Hi, yes, I thought about this problem too, which is why I deliberately used randu to generate uniformly-distributed random numbers, to give most of the data a roughly equal chance of being picked. The drawback of doing it this way is that the first batchsize samples and the last batchsize samples have a smaller chance of being picked. I haven’t figured out how to shuffle all the samples and read them sequentially. Do you have any ideas for an improvement?

My idea: use the values of a randomly shuffled array as indices into the samples, at some memory cost. I modified your code as follows.

In train_networks.cc:

// define an integer vector, initialized with 0 .. number of samples - 1
vector<int> randperm;
for(int i = 0; i < x.size(); ++i)
    randperm.push_back(i);
for(int epo = 1; epo <= training_epochs; epo++)
{
    int numbatches = x.size() / batch_size;
    // shuffle the order; each epoch, use the values of this array as indices to read samples
    random_shuffle(randperm.begin(), randperm.end());
    for(int kk = 0; kk < numbatches; ++kk)
    //for(; k <= iter_per_epo * epo; k++)
    .......
    getSample(x, &batchX, y, &batchY, batch_size, SAMPLE_COLS, kk, randperm);

In get_sample.cc:

for(int i = 0; i < _size; i++){
    int tmp = randperm[kk * _size + i];
    dst1->push_back(src1[tmp]);
}
Mat mtmp = Mat::zeros(1, _size, CV_64FC1);
for(int i = 0; i < _size; i++)
{
    int tmp = randperm[kk * _size + i];
    double label = src2.ATD(0, tmp);
    mtmp.ATD(0, i) = label;
}
mtmp.copyTo(*dst2);
mtmp.release();

The other approach I can think of is to read the samples and labels together (using a custom wrapper type, or a pair?), so that random_shuffle() can be applied to them directly. But that would require fairly large changes to the program.

Hmm, today I set LRN to true and ran it; your local response normalization isn’t ready yet, it throws an error. Heh.

I changed the sample part on GitHub too, in both the conv-net-3 and multi-layer versions. The LRN part isn’t fixed yet :) Thanks~

The approach I use now: maintain an int vector of size nSamples, i.e. the dataset size; every time I sample, random_shuffle it, take the first batch_size indices, and then fetch the X and Y corresponding to those indices.

Yes, this method can also solve that overfitting problem you ran into.

I have one more question: which reference did you use for the formula that computes delta (the error term)? I can’t quite follow it; comparing against the UFLDL notes, yours seems different. Did you make simplifications in the code?

smr.Wd2 = pow((groundTruth - p), 2.0) * pow(hidden[hidden.size() - 1].t(), 2.0);
smr.Wd2 = smr.Wd2 / nsamples + softmaxConfig.WeightDecay; // can't follow this; which formula is it, a simplified version?
smr.bd2 = reduce(pow((groundTruth - p), 2.0), 1, CV_REDUCE_SUM) / nsamples;

Hi, the variables with the d2 suffix are second derivatives, which I use to determine the learning rate. In some places I did take approximations, since computing exact second derivatives at every step would be very expensive. Happy New Year!!

On overfitting: I’ve seen people handle it by applying random scalings, rotations, or mirrorings to each training image and adding the new images to the training set. That sounds reasonable to me; I’m going to try it :)

Hi Eric:

In your multi-layer code, I found that the number (depth) of pictures output by a convolution layer equals the product of the input picture’s channel count (depth) and the layer’s kernel count. Therefore, after all the convolution layers, the total number of feature maps is kernelAmount[1]*kernelAmount[2]*…*kernelAmount[n].

Is that correct?

My understanding was that the output depth of a convolution layer is the same as the layer’s kernel count. For example, if the input is (224x224)x64 (64 being the depth) and the layer has 128 kernels, it uses (5x5x64)x128 filters. For each kernel, the 224x224x64 input goes through a 5x5x64 filter and produces one output slice; 128 kernels give 128 slices. So the layer outputs 128 slices, not 64*128.

Maybe both methods are possible. Which one is better?

Thanks,

Feng

Hi Feng,

Yes, you are correct: my network is slightly different from the traditional CNN, exactly as you described. I can’t say which is better because I’ve never implemented a LeNet-like CNN; it’s actually on my todo list but I keep delaying it 🙁 my bad. However, I think one advantage of the LeNet-like CNN over mine is that the dimensionality after the conv layers is much lower, so fewer hidden neurons are needed in the fully-connected layers, right? I’ll do this soon and then let’s discuss. Thanks!

Eric

Hi Eric;

I wanted to ask about the GPU part. Can you tell me which types of GPU it supports, and whether there is an option for using just the CPU?

Regards

Mohammad

Hi Mohammad,

The current version is a pure CPU version; you can find a CUDA (Nvidia) version implemented by zhxfl here: https://github.com/zhxfl/CUDA-CNN

Thanks,

Great work!

Have you thought about integrating your code into OpenCV and contributing it?

Thanks!

Gil

Good job. I am using a convolutional neural network for gender classification of human faces, and your code is very useful to me. Could you tell me which of your code versions (Convolutional Neural Networks I, II, or III) suits my task? Thanks in advance.

I am using a machine with 8 GB of RAM. When we reach this line of code, only 4.53 GB of RAM is in use, yet when the line is called I get the following error. Any ideas?

Mat tmp = concat(trainX);

OpenCV Error: Insufficient memory (Failed to allocate 1228800004 bytes) in cv::OutOfMemoryError, file C:\builds\2_4_PackSlave-win32-vc11-shared\opencv\modules\core\src\alloc.cpp, line 52

As a fellow Chinese, I’m proud of you. Very few Chinese developers open-source their code, especially for CNNs. If China had more people like you, its software industry would take a big step forward.

Hello,

Great thanks for your code. Actually, I can compile it without problems, but when I try to run it I just get the message "Killed".

My OpenCV version is 3.0, and I run it on Fedora 23, 32-bit.

Thanks in advance

Sophea

Hi Eric,

I have trained the CNN using your code on the MNIST set, and I have saved cvl.layer using saveWeight.

Can you please help me with how to use it for further training? I don’t want to run the training module from scratch again. Is it possible to use these saved .txt files?

Regards,

Sachin

Hi Eric,

I have trained the CNN using your code on the MNIST set, and I have saved cvl.layer using saveWeight.

Can you please help me with how to use it for testing on different files? Also, can you help me with how it can be used for text recognition from images?

Regards,

Sachin

Great work!

I’m trying to use your CNN with non-square input images (e.g. 16x48), but it fails. I attempted to fix the code to allow this, starting by changing the ‘convert’ function to receive both rows and cols, but it just fails further down the line and I can’t find what’s wrong. Is this achievable with some work, or does it require a complete overhaul?

Thanks in advance.

Hi Eric,

Is there any way to get the loss function from the SoftMax layer?

Regards,

Daniel

## 2 Trackbacks

[…] I implemented a version 3.0 recently, check it here. […]

[…] here‘s the version […]