  • Alex

    Hi, if the input is a color image, how should the filters and the whole net be designed?

    • Eric

      Hey Alex,

      You can just combine the intensities from all the color channels into one long vector, as if you were working with a grayscale image with three times as many pixels as the original.

      I believe there are other ways to deal with multi-channel images, but I have not tried them.
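
      A minimal sketch of this flattening in plain C++ (no OpenCV; the channel-planar layout and the function name are my own illustration, not the post's code):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Flatten a multi-channel image (stored here as one plane per channel)
      // into one long vector, as if it were a grayscale image with C times as
      // many pixels as the original.
      std::vector<double> flattenChannels(const std::vector<std::vector<uint8_t>>& planes) {
          std::vector<double> out;
          for (const auto& plane : planes)
              for (uint8_t v : plane)
                  out.push_back(v / 255.0);  // scale intensities to [0, 1]
          return out;
      }

      int main() {
          // A 2x2 RGB image: three planes of 4 pixels each.
          std::vector<std::vector<uint8_t>> img = {
              {0, 255, 0, 255},     // R
              {255, 0, 255, 0},     // G
              {128, 128, 128, 128}  // B
          };
          std::vector<double> v = flattenChannels(img);
          assert(v.size() == 12);  // 3x the pixel count of the 2x2 image
          assert(v[0] == 0.0 && v[1] == 1.0);
          return 0;
      }
      ```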

      🙂

  • Bob

    Hi Eric,

    You wrote "vector layer". Is that wrong? The vector has no argument. Please check it out. Thanks.

    • Eric

      Yeah, you're right. It means the different kernels in the conv layer; it would be better to call it "kernel". Thanks.

  • zhenghx

    Thanks for your free code. It is very clear, but I was puzzled by some of it.

    In the function
    read_Mnist(string filename, vector<Mat> &vec)
    the code is
    tpmat.at<uchar>(r, c) = (int) temp;
    Why do you cast "temp" from (uchar) to (int)? I think "temp" will overflow "tpmat", because "tpmat" is defined as CV_8UC1.
    When I use your code, the training has taken days (2 days so far) and I have only reached 0.30 accuracy.
    How long did you train to get 0.9828 accuracy?

    • Eric

      Hi,
      Thanks for mentioning.
      First, the MNIST dataset guarantees each of its elements is within (0, 255), so it's OK to use int; besides, I'm just accustomed to using "int" with CV_8UCx-format things :p
      Second, it indeed took me days to train this net. You can also try the following net instead:
      https://github.com/xingdi-eric-yuan/multi-layer-convnet
      If you have further problems with this, let me know, and I'll check whether the version I put on GitHub is buggy.
      Thanks.

  • Daniel

    Hi Eric,

    First of all, I'd like to thank you for your code and explanations. They have been a great help for me in mapping theory to practice.

    I have a similar symptom to zhenghx's. First, I changed nothing (NumHiddenNeurons = 200) and it gave me 0.110 accuracy within 6 hours. I then tried NumHiddenNeurons = 500, which ran for about 2 days and yielded:

    learning step: 199998, Cost function value = 0.123866, randomNum = 47623
    learning step: 199999, Cost function value = 0.106693, randomNum = 9355
    correct: 3969, total: 10000, accuracy: 0.3969
    Totally used time: 199141 second

    Could you let me know what settings you used for the 0.98 accuracy? I got this code from GitHub last weekend.

    Thank you very much!

    • Eric

      Hey Daniel,

      First, try using max pooling by setting:
      int Pooling_Methed = POOL_MAX;
      It was months ago that I got that 0.98 accuracy result. I'm not sure how many kernels I used, but I'm sure it was with max pooling. (Sorry.)

      Second, while debugging my newest version of the CNN, I found some bugs that also exist in my older versions (including this one), so once the version I'm working on is bug-free, I'll update these versions of the code as well.

      Thank you.

  • Peter

    Yeah, it is a very useful post for me. I want to ask about stochastic pooling:
    I don't understand your explanation of why you just randomly choose a number from 0 to 8.
    More details, please.
    Thanks!

    • Eric

      Hi Peter,

      In that specific case we have 9 elements in total, so what I meant was simply to choose a random rank in the 3*3 matrix, say 4, then find which interval the element with the 4th-largest probability falls in; the pooling result is the value of that interval. The result is not chosen by a rule such as largest value or largest probability, which is why the method is called "stochastic" pooling; however, with this method, elements with larger probabilities do have a greater chance of being chosen. Have I explained it more clearly this time?

      Thanks,
      Eric

      • Ray L

        Actually, I do not think you choose elements according to their probabilities with your method. You choose a random "rank" among the elements, which is just equivalent to choosing a uniformly random element from the pool.
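
        For reference, stochastic pooling as originally described by Zeiler & Fergus samples an element with probability proportional to its activation, rather than picking a uniformly random rank; a minimal plain-C++ sketch (function name is mine, not the post's code):

        ```cpp
        #include <cassert>
        #include <numeric>
        #include <random>
        #include <vector>

        // Stochastic pooling over one pooling region: treat the non-negative
        // activations as unnormalized probabilities and sample one activation
        // in proportion to its share of the total. Larger activations are more
        // likely to be chosen, but any element can win.
        double stochasticPool(const std::vector<double>& region, std::mt19937& rng) {
            double sum = std::accumulate(region.begin(), region.end(), 0.0);
            if (sum <= 0.0) return 0.0;  // all-zero region
            std::uniform_real_distribution<double> uni(0.0, sum);
            double r = uni(rng), acc = 0.0;
            for (double a : region) {
                acc += a;
                if (r <= acc) return a;  // r falls in this element's interval
            }
            return region.back();
        }

        int main() {
            std::mt19937 rng(42);
            std::vector<double> region = {0.0, 1.0, 3.0};  // a toy 1x3 region
            // Sample many times: 3.0 should be picked about 3/4 of the time.
            int hits = 0, trials = 10000;
            for (int i = 0; i < trials; ++i)
                if (stochasticPool(region, rng) == 3.0) ++hits;
            assert(hits > trials / 2);  // clearly biased toward the larger value
            return 0;
        }
        ```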

  • 赵元兴

    Hello, I'm back with another question. I implemented the UFLDL architecture and trained it on MNIST to a 98% recognition rate, but when I visualize the first-layer filters I don't get Gabor-like templates. What might be the cause? Thanks.

    • Eric

      What do your trained kernels look like? If it's MNIST, not looking like Gabor filters is normal: Gabor-like filters come from training on natural images, while kernels trained on MNIST tend to look like pen strokes.

      • 赵元兴

        Here is my result: http://photo.163.com/junhun-2008/#m=2&aid=195835179&pid=9137011744 It doesn't even look like strokes. These are twenty 9*9 convolution kernels, normalized to 0~255 and magnified 5x.

        • Eric

          Actually, the kernels you trained make sense to me. They look something like Morlet wavelets: black-white, black-white-black, or white-black-white patterns at different positions and orientations, corresponding to edge detectors and line detectors at those positions and orientations. Some parameters may still need tuning, such as the number of kernels or the regularization.

          • 赵元兴

            Thanks a lot! I'll keep tuning the parameters, then. Since I wrote the program myself, I keep worrying that I made a mistake somewhere or that the numerical precision is insufficient. By the way, the kernels you published are quite beautiful~

  • 何浪

    I have 50 training samples, each with 1534 dimensions, and the labels are 50*1. I also have 50 test samples, also 1534-dimensional, with 50*1 labels. How can I use a convolutional neural network on this? Thanks.

    • Eric

      Hi 何浪. In my experience your training set is too small: with 1534-dimensional features, running a convolutional neural network on this data will inevitably overfit. I suggest enlarging the sample count (to at least several thousand or tens of thousands, I'd say), or applying PCA to reduce the dimensionality. I don't know what kind of data yours is, but if it isn't images, a convolutional network may not work particularly well.

  • 何浪

    Thank you for your patient answer. The 50 is the number of samples, and 1534 is the dimensionality of the features extracted from them. I want to use a convolutional neural network for regression, then compare the true values with the predictions to obtain the root-mean-square error. I read a speech-recognition paper by 邓力, but I couldn't follow the details, so I'm still confused.

    • Eric

      I haven't read 邓力's paper, so I'm not sure. I don't know the specific methods used for regression on speech data, but my feeling is that the convolutional network here may not be ideal: given the nature of 2-D convolution, the trained kernels are descriptors of 2-D features (both x and y orientation). For audio you could try a 1-D convolutional network. Is there some relation between such networks and recurrent neural networks? I'm not familiar with that area; you could look into it. Also, I still think 50 samples is too few...

  • 何浪

    Or could you tell me how to put my data into the convolutional neural network for training and testing, and which parameters to change? Thanks!

  • 何浪

    Do you have code for a 1-D convolutional neural network? Or could you outline the idea? Thanks!

    • Eric

      I haven't implemented one myself, but I think it's roughly a matter of replacing the 2-D convolution with 1-D, just like ordinary convolution: convolving a signal of length m with a kernel of length n gives an output of length m-n+1. Forward propagation is easy to understand; the backpropagation formulas may need some thought or a literature check. The non-linearity layer after the convolution layer should still be needed. After that, there is no real difference from the 2-D case: fully connected layers, then the output.
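
      The 1-D "valid" convolution described above can be sketched in a few lines of plain C++ (names are mine, not from the repository):

      ```cpp
      #include <cassert>
      #include <vector>

      // 1-D "valid" convolution: a signal of length m convolved with a kernel
      // of length n yields an output of length m - n + 1. The kernel is
      // flipped, as in true convolution (for learned kernels, correlation
      // works equally well since the flip is absorbed into training).
      std::vector<double> conv1d(const std::vector<double>& x,
                                 const std::vector<double>& k) {
          size_t m = x.size(), n = k.size();
          std::vector<double> y(m - n + 1, 0.0);
          for (size_t i = 0; i + n <= m; ++i)
              for (size_t j = 0; j < n; ++j)
                  y[i] += x[i + j] * k[n - 1 - j];  // flipped kernel
          return y;
      }

      int main() {
          std::vector<double> x = {1, 2, 3, 4, 5};  // m = 5
          std::vector<double> k = {1, 1};           // n = 2
          std::vector<double> y = conv1d(x, k);     // length m - n + 1 = 4
          assert(y.size() == 4);
          assert(y[0] == 3 && y[3] == 9);           // moving pairwise sums
          return 0;
      }
      ```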

  • John Bell

    Hello Eric,

    I want to get to know CNNs as well as I can. Looking through the code, I see it is not yet implemented on the GPU. I am thinking of making a branch in git and contributing. Do you think that would work? Are changes needed first?

    Regards,

    Daniel

    • Eric

      Hi Daniel,

      Here's a CUDA version of my code, implemented by zhxfl; you may want to check it out. Thanks.
      https://github.com/zhxfl/CUDA-CNN

      • Daniel

        Awesome. I’ll check it out and maybe bring in the OpenCL methods.

        Thank you

  • 史剑

    Hello, and first of all thank you for your code.
    I only started learning about CNNs within the last month. I downloaded your code and set up OpenCV in VS2013, but I find the MNIST files cannot be opened. Why might that be?
    Also, I wrote my own CNN: input layer + two convolution layers with (average) pooling + fully connected layer + softmax. The input images are 28*28. With 140 images the error converges quickly, but with 700 it no longer converges. The images are for character recognition, so they differ from each other quite a lot. What might the possible causes be? My longest training run so far has been two days.
    One more question: if I apply an activation function after the pooling layer, especially sigmoid, the gradient vanishes during backpropagation, and even small sample sets fail to converge. Is that normal? Also, using softmax in the last layer gives me worse results than an ordinary output layer.
    Thank you.

    • Eric

      1. Failing to open MNIST may be because my code assumes a Unix system; you may need to change the file-reading paths to the Windows folder format.
      2. How large are your convolution kernels, and how many hidden nodes are in the fully connected layer? Also, I suspect the mean pooling may be causing the non-convergence; I find max pooling works much better than mean pooling.
      3. I still think it's the mean pooling. Also, on small datasets softmax doesn't seem to have much of an advantage, but once the data becomes more complex and more plentiful (with more classes), softmax shows its strength.
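
      To illustrate the difference between the two pooling methods in point 2, here is a small plain-C++ sketch (no OpenCV; the row-major layout and names are my own):

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <vector>

      // Pool an H x W single-channel image with non-overlapping 2x2 windows.
      // H and W are assumed even; 'img' is stored row-major. Max pooling keeps
      // the strongest response in each window; mean pooling averages them.
      std::vector<double> pool2x2(const std::vector<double>& img, int H, int W,
                                  bool useMax) {
          std::vector<double> out;
          for (int r = 0; r < H; r += 2)
              for (int c = 0; c < W; c += 2) {
                  double a = img[r * W + c],       b = img[r * W + c + 1];
                  double d = img[(r + 1) * W + c], e = img[(r + 1) * W + c + 1];
                  out.push_back(useMax ? std::max({a, b, d, e})
                                       : (a + b + d + e) / 4.0);
              }
          return out;
      }

      int main() {
          // A 2x2 image pooled down to a single value.
          std::vector<double> img = {1, 2, 3, 4};
          assert(pool2x2(img, 2, 2, true)[0] == 4.0);   // max pooling
          assert(pool2x2(img, 2, 2, false)[0] == 2.5);  // mean pooling
          return 0;
      }
      ```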

  • Santuk

    Hi Eric,

    I tried to run the code with some changes
    (all changes were made just to compile the original code; the list is:
    Line 37: changed POOL_MAX to POOL_STOCHASTIC
    Line 117 etc.: cast sqrt()'s and pow()'s parameters to (double), to avoid the 'ambiguous call to overloaded function' error).

    But whenever I inspect something like trainX[0].cols while debugging, a 'Debug Assertion Failed!' message box comes up, pointing to microsoft visual studio 10.0\vc\include\vector and saying the vector subscript is out of range.

    Is this a sort of 'ambiguous function call'? Can you help me get your code running?
    Thanks.

    • Eric

      Hi Santuk,

      I'm not very familiar with MSVS, and I don't know whether there is an MS-specific version of vector, so maybe use std::vector wherever a vector is needed?
      Or perhaps you're using another dataset whose image size differs from MNIST's?

  • Santuk

    Hello, Eric!

    I'm just trying to run your code (and thereby learn CNNs), but whenever I run it I get an error:
    debug assertion failed, from vc\include\vector.h, whenever the debugger meets trainX[0].cols or testX[0].cols and so on.

    Can you help me get this code running?
    Thanks!

  • Vidya

    Hi Eric,

    Could you let me know how to change the output layer to turn the CNN into a space displacement neural network (SDNN)?

    Thanks

  • daijie

    Oh, I am so sorry.
    I don't understand why 1.4 was chosen for the randomly generated data; from a probability standpoint it seems quite unreasonable.
    Please explain; thanks. If you can't read Chinese, you can use Baidu Translate. Thanks.

    • Eric

      Hi. In the stochastic pooling part, the 1.4 I mentioned was just an arbitrary example; which value actually gets picked depends, of course, on the result of the random number generator :)

  • suzy

    Hi, Eric
    First, I want to thank you for this tutorial; it really helped me a lot.
    Do you have this code in MATLAB by any chance? If you do, would you please send it to my email?
    I can't understand the mechanism very well from the C++ code you provided.
    Thank you in advance.

  • Ari

    Hi Eric,

    Thank you so much for this wonderful job. I really appreciate that you've posted your code here. I wonder if you have the code in Python as well, since I'm using OpenCV's Python bindings.

    Thank you,
    Ari

  • Ganesh S

    Hi Eric,
    Thank you for sharing your knowledge :). In your code, trainY and testY are 1-by-n Mats; that is, the labels are 1, 2, 3, etc. How can I train your network if my label vectors are, say, [1,0,0], [0,1,0], and [0,0,1] for a 3-class problem? I tried to find where you convert the integer labels into binary vector form but could not. Could you please let me know how to do that? Thanks in advance.
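
    One common way to bridge the two label conventions is to convert integer labels into one-hot vectors before training; a minimal sketch in plain C++ (no OpenCV Mats; names are mine, and if your labels start at 1 rather than 0, subtract 1 first):

    ```cpp
    #include <cassert>
    #include <vector>

    // Convert integer class labels (0, 1, 2, ...) into one-hot row vectors:
    // label k for an n-class problem becomes a vector of n zeros with a 1
    // at position k, e.g. 1 -> [0, 1, 0] for 3 classes.
    std::vector<std::vector<double>> toOneHot(const std::vector<int>& labels,
                                              int numClasses) {
        std::vector<std::vector<double>> out;
        for (int y : labels) {
            std::vector<double> row(numClasses, 0.0);
            row[y] = 1.0;
            out.push_back(row);
        }
        return out;
    }

    int main() {
        std::vector<std::vector<double>> Y = toOneHot({0, 2, 1}, 3);
        assert(Y[0][0] == 1.0 && Y[0][1] == 0.0);  // 0 -> [1,0,0]
        assert(Y[1][2] == 1.0);                    // 2 -> [0,0,1]
        assert(Y[2] == std::vector<double>({0.0, 1.0, 0.0}));
        return 0;
    }
    ```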

  • Nitin

    Mr. Eric

    I cannot understand how to create my own test and training data, say if I want to train a classifier for face or text detection.

    Even from this post: http://yann.lecun.com/exdb/mnist/
    I can't figure out how to build a dataset in the ubyte format.
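
    For what it's worth, the ubyte (IDX) format documented on that page is small: a big-endian magic number, the dimension sizes, then the raw bytes. A sketch of building an images buffer in plain C++ (writing to memory for illustration; for a labels file the magic is 0x00000801 and there is a single count dimension):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Append a 32-bit integer in big-endian order, as the IDX format requires.
    void putBE(std::vector<uint8_t>& buf, uint32_t v) {
        buf.push_back(v >> 24); buf.push_back(v >> 16);
        buf.push_back(v >> 8);  buf.push_back(v);
    }

    // Build an MNIST-style IDX image buffer: magic 0x00000803 (unsigned bytes,
    // 3 dimensions), then image count, rows, cols, then the raw pixel bytes.
    // Writing this buffer to a file gives the train-images-idx3-ubyte layout.
    std::vector<uint8_t> makeIdxImages(const std::vector<std::vector<uint8_t>>& imgs,
                                       uint32_t rows, uint32_t cols) {
        std::vector<uint8_t> buf;
        putBE(buf, 0x00000803);
        putBE(buf, static_cast<uint32_t>(imgs.size()));
        putBE(buf, rows);
        putBE(buf, cols);
        for (const auto& img : imgs)
            buf.insert(buf.end(), img.begin(), img.end());
        return buf;
    }

    int main() {
        // Two 2x2 "images".
        std::vector<std::vector<uint8_t>> imgs = {{0, 1, 2, 3}, {4, 5, 6, 7}};
        std::vector<uint8_t> buf = makeIdxImages(imgs, 2, 2);
        assert(buf.size() == 16 + 8);              // 16-byte header + 8 pixels
        assert(buf[2] == 0x08 && buf[3] == 0x03);  // magic: ubyte type, 3 dims
        return 0;
    }
    ```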

  • Narmada

    POOL_STOCHASTIC gives an 'undeclared identifier' error message; is it declared in some header file?

  • Ahmad

    Hi eric,
    My code gets stuck in the learning-rate function; it just does not proceed. What could be the reason? Thanks in anticipation.

  • xijin luo

    Hi Eric,
    When I input a noise image, say an alphabet letter or a Chinese character, to the trained CNN, the output is still a digit with high confidence. Is there any way to avoid this?