Hey, I’ve recently been working on a new version of my CNN. The updates are as follows:
- Support for 3-channel images;
- Dropout;
- In conv layers, one can use either 3-channel conv kernels or single-channel conv kernels (that is, whether or not to share weights across channels; see the sketch below).
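To make the last point concrete, here is a minimal sketch (my own illustration, not the library’s actual code) of the two kernel options, using cv::filter2D for the per-channel filtering:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

// "3-channel kernel" case: each input channel has its own CV_64FC1
// kernel, and the three responses are summed into one feature map.
cv::Mat convPerChannel(const cv::Mat& img, const std::vector<cv::Mat>& kernels)
{
    std::vector<cv::Mat> ch;
    cv::split(img, ch);                            // separate the 3 channels
    cv::Mat sum = cv::Mat::zeros(img.size(), CV_64FC1);
    for (int c = 0; c < 3; ++c) {
        cv::Mat resp;
        cv::filter2D(ch[c], resp, -1, kernels[c]); // per-channel weights
        sum += resp;
    }
    return sum;
}

// "Single-channel kernel" (shared weights) case: the same kernel is
// reused on every input channel.
cv::Mat convShared(const cv::Mat& img, const cv::Mat& kernel)
{
    return convPerChannel(img, std::vector<cv::Mat>(3, kernel));
}

(Note that filter2D computes correlation rather than a flipped-kernel convolution, which is what CNN implementations typically compute anyway.)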
I’ve now finished most of the work and am debugging the code; I hope to release it in a few days.
Here’s an early-adopter edition, which is still buggy. I’ll post the official version in a few days.
Update (Aug. 28):
I apologize for the delayed replies to comments.
I was in Pittsburgh for a few days, busy helping my wife move into a new apartment, so I only had about an hour per day to work on this…
However, I do think I’m on the right track. Here are some of the kernels I got from training:
3-channel kernels (converted to RGB)
One problem I’m facing now: OpenCV’s SVD function is extremely slow when the matrix is large. It used to use the LAPACK SVD routine, but someone changed the implementation and introduced this speed problem. In my opinion, I should call the LAPACK SVD directly rather than going through OpenCV’s SVD function. I’ll work on it over the next few days.
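For reference, here’s a minimal sketch of what calling LAPACK directly could look like, bypassing cv::SVD entirely. The Fortran symbol name dgesvd_ and the -llapack link flag are assumptions that hold for common LAPACK builds; this version computes singular values only:

#include <opencv2/core/core.hpp>
#include <vector>
#include <algorithm>

// LAPACK's double-precision SVD (Fortran symbol; link with -llapack).
extern "C" void dgesvd_(const char* jobu, const char* jobvt,
                        const int* m, const int* n, double* a, const int* lda,
                        double* s, double* u, const int* ldu,
                        double* vt, const int* ldvt,
                        double* work, const int* lwork, int* info);

// Singular values of a CV_64FC1 matrix via LAPACK. LAPACK wants
// column-major storage, so we pass the transpose: a row-major n-by-m
// transpose has exactly the column-major layout of the m-by-n original.
std::vector<double> lapackSingularValues(const cv::Mat& A)
{
    CV_Assert(A.type() == CV_64FC1);
    cv::Mat buf = A.t();                  // our own copy; dgesvd_ destroys it
    int m = A.rows, n = A.cols, lda = m, info = 0;
    std::vector<double> s(std::min(m, n));
    char jobu = 'N', jobvt = 'N';         // 'N' = don't form U or V^T
    double udummy = 0, vtdummy = 0;       // not referenced when job is 'N'
    int ldu = 1, ldvt = 1;

    // Workspace query: lwork = -1 makes LAPACK report the optimal size.
    double wkopt; int lwork = -1;
    dgesvd_(&jobu, &jobvt, &m, &n, buf.ptr<double>(), &lda, s.data(),
            &udummy, &ldu, &vtdummy, &ldvt, &wkopt, &lwork, &info);
    lwork = (int)wkopt;
    std::vector<double> work(lwork);
    dgesvd_(&jobu, &jobvt, &m, &n, buf.ptr<double>(), &lda, s.data(),
            &udummy, &ldu, &vtdummy, &ldvt, work.data(), &lwork, &info);
    CV_Assert(info == 0);                 // nonzero info means LAPACK failed
    return s;
}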
Another update (Sept. 12):
I tested the current network on CIFAR-10 using the following configs:
I updated the learning-rate calculation: I now use second-order derivatives to compute the learning rate at each learning step. The procedure is very similar to gradient backpropagation, so while backpropagating through each layer I also backpropagate something like a (diagonal approximation of the) Hessian.
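Roughly, the extra quantities look like this. This is a simplified sketch for a fully-connected layer y = W * x, in the spirit of the stochastic diagonal Levenberg-Marquardt method from LeCun’s “Efficient BackProp”; all names are made up for illustration, and it is not the exact code:

#include <opencv2/core/core.hpp>

struct FCBackward {
    cv::Mat dW;   // gradient of the loss w.r.t. W
    cv::Mat hW;   // diagonal Hessian approximation w.r.t. W
    cv::Mat dx;   // gradient w.r.t. the input, passed to the layer below
    cv::Mat hx;   // diagonal Hessian approximation w.r.t. the input
};

FCBackward backwardFC(const cv::Mat& W,   // out x in, CV_64FC1
                      const cv::Mat& x,   // in x 1, this layer's input
                      const cv::Mat& dy,  // out x 1, dL/dy from above
                      const cv::Mat& hy)  // out x 1, diag(d2L/dy2) from above
{
    FCBackward r;
    r.dW = dy * x.t();                    // usual gradient: dL/dW = dy * x^T
    r.dx = W.t() * dy;                    // usual gradient: dL/dx = W^T * dy
    // Gauss-Newton style diagonal approximation: same shape as the
    // gradient pass, but with everything squared.
    r.hW = hy * x.mul(x).t();             // h_W(i,j) = hy(i) * x(j)^2
    r.hx = W.mul(W).t() * hy;             // h_x = (W o W)^T * hy
    return r;
}

// Per-weight learning rate: eta(i,j) = eps / (hW(i,j) + mu), where mu > 0
// keeps the rate bounded when the curvature is tiny (eps and mu are
// hyperparameters). The update is then W -= eta.mul(dW).
cv::Mat perWeightLR(const cv::Mat& hW, double eps, double mu)
{
    cv::Mat eta;
    cv::divide(eps, hW + mu, eta);        // elementwise eps / (hW + mu)
    return eta;
}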
I also fixed several bugs. For example, I always thought the following code would generate a 3-channel matrix in which every single element is 1.0:
Mat a = Mat::ones(height, width, CV_64FC3);
But it actually generates a 3-channel matrix in which every element of the first channel is 1.0 and the elements of the other channels are 0.0. So to do what I wanted, the correct code is:
Mat a = cv::Mat(height, width, CV_64FC3, Scalar(1.0, 1.0, 1.0));
Another thing: I found that my network suffers from overfitting. The latest results (I’m using 2 conv layers, with 8 kernels in the 1st layer and 20 kernels in the 2nd) show that after 40,000 iterations of stochastic gradient descent training, the accuracy on the training set is about 98%, but the accuracy on the test set is only about 71%.
So my next plan is to deal with this overfitting problem. First, find a better number of kernels and fc-layer neurons; second, do some input-data augmentation (like Prof. Hinton’s ImageNet experiment).
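For the augmentation part, the plan is something along the lines of this sketch: random crops plus horizontal flips, as in the ImageNet paper. The 28x28 crop size is just a placeholder I picked for CIFAR-10’s 32x32 images:

#include <opencv2/core/core.hpp>
#include <cstdlib>

// Produce one randomly cropped and possibly flipped view of an image.
// Called repeatedly, it turns each training image into many variants.
cv::Mat augment(const cv::Mat& img, int cropSize = 28)
{
    // Random crop: pick a top-left corner anywhere in the valid range.
    int offX = std::rand() % (img.cols - cropSize + 1);
    int offY = std::rand() % (img.rows - cropSize + 1);
    cv::Mat out = img(cv::Rect(offX, offY, cropSize, cropSize)).clone();

    // Horizontal mirror with probability 0.5.
    if (std::rand() % 2 == 0)
        cv::flip(out, out, 1);            // flip code 1 = around the y-axis
    return out;
}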