A Simple Deep Network

During this spring break, I worked on building a simple deep network, which consists of two parts: a stack of sparse autoencoders and a softmax regression layer. The method is exactly the same as the “Building Deep Networks for Classification” part of the UFLDL tutorial. To understand it better, I re-implemented it in C++ with OpenCV.

GENERAL OUTLINE

  1. Read the dataset (both training and testing data) into cv::Mat.
  2. Pre-process the data (size normalization, random shuffling, zero mean, etc.); this speeds up training.
  3. Implement a function that calculates the sparse autoencoder cost and gradients.
  4. Implement a function that calculates the softmax regression cost and gradients.
  5. Implement a function that calculates the whole network’s cost and gradients.
  6. Use gradient checking to verify that the above functions work correctly.
  7. Train the sparse autoencoders layer by layer. (For example, say we want 3 sparse autoencoder layers: first we train the 1st layer using the training data as both input and target, then compute the hidden-layer activation with the trained weights and biases. That activation serves as both input and target of the 2nd autoencoder, and likewise the activation of the 2nd layer serves as both input and target of the 3rd. This is why I say this part is trained layer by layer.)
  8. Train the softmax regression. Here the input is the last autoencoder’s activation, and the target is the Y part of the training dataset.
  9. Fine-tune the whole network using backpropagation.
  10. Now that we have a trained network, we can test it.

Here are some figures stolen from the UFLDL tutorial; they help clarify the training procedure.

 

Say each sample in the dataset has 6 features, and we are training a network with 2 autoencoder layers: the first layer has 4 neurons and the second has 3. The dataset’s Y can be one of three values.
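To make the shapes concrete, here is a minimal sketch (it is not part of the classifier below, and the variable names are mine) of the matrices this toy configuration implies, assuming samples are stored one per column as in the code later in this post:

// Toy configuration: 6 input features, two autoencoder layers with 4 and 3
// hidden units, and 3 output classes. Samples are stored one per column.
#include "opencv2/core/core.hpp"
using namespace cv;

int main(){
    int nsamples = 100;                            // hypothetical number of samples
    Mat X   = Mat::zeros(6, nsamples, CV_64FC1);   // input: 6 features x nsamples
    Mat W1  = Mat::zeros(4, 6, CV_64FC1);          // encoder 1: 6 inputs -> 4 hidden units
    Mat b1  = Mat::zeros(4, 1, CV_64FC1);
    Mat W2  = Mat::zeros(3, 4, CV_64FC1);          // encoder 2: 4 -> 3 hidden units
    Mat b2  = Mat::zeros(3, 1, CV_64FC1);
    Mat Wsm = Mat::zeros(3, 3, CV_64FC1);          // softmax: 3 classes x 3 features
    // a1 = sigmoid(W1 * X  + b1)  is 4 x nsamples
    // a2 = sigmoid(W2 * a1 + b2)  is 3 x nsamples
    // class scores = Wsm * a2     is 3 x nsamples
    return 0;
}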

So first we train autoencoder layer 1.

[Figure 1: train autoencoder layer 1]

Then train autoencoder layer 2.

[Figure 2: train autoencoder layer 2]

After training all the autoencoder layers, train the softmax regression.

[Figure 3: train the softmax regression]

The last step is to fine-tune the whole network.

[Figure 4: fine-tune the whole network]

SOURCE CODE

// MnistClassify.cpp
//
// Author: Eric Yuan
// Blog: http://eric-yuan.me
// You are FREE to use the following code for ANY purpose.
//
// A deep net handwriting classifier.
// Using sparse autoencoder and softmax regression.
// First train sparse autoencoder layer by layer,
// then train softmax regression, 
// and fine-tune the whole network.
//
// To run this code, you need OpenCV installed on your computer.
// Have fun with it

#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <math.h>
#include <time.h>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace cv;
using namespace std;

#define IS_TEST 0
#define IS_TEST_SA 0
#define IS_TEST_SMR 0
#define IS_TEST_FT 0

#define ATD at<double>
#define elif else if

int SparseAutoencoderLayers = 2;
int nclasses = 10;
int batch;

typedef struct SparseAutoencoder{
    Mat W1;
    Mat W2;
    Mat b1;
    Mat b2;
    Mat W1grad;
    Mat W2grad;
    Mat b1grad;
    Mat b2grad;
    double cost;
}SA;

typedef struct SparseAutoencoderActivation{
    Mat aInput;
    Mat aHidden;
    Mat aOutput;
}SAA;

typedef struct SoftmaxRegession{
    Mat Weight;
    Mat Wgrad;
    double cost;
}SMR;

Mat 
concatenateMat(vector<Mat> &vec){
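    // Flatten each image into one column: convert to double, reshape to a
    // (height*width) x 1 column, copy it into column i of the result, and
    // finally scale all pixel values to [0, 1].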

    int height = vec[0].rows;
    int width = vec[0].cols;
    Mat res = Mat::zeros(height * width, vec.size(), CV_64FC1);
    for(int i=0; i<vec.size(); i++){
        Mat img(height, width, CV_64FC1);

        vec[i].convertTo(img, CV_64FC1);
        // reshape(int cn, int rows=0), cn is num of channels.
        Mat ptmat = img.reshape(0, height * width);
        Rect roi = cv::Rect(i, 0, ptmat.cols, ptmat.rows);
        Mat subView = res(roi);
        ptmat.copyTo(subView);
    }
    divide(res, 255.0, res);
    return res;
}

int 
ReverseInt (int i){
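    // MNIST stores 32-bit integers in big-endian byte order; swap the bytes so
    // the header fields are read correctly on little-endian machines.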
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;
    ch2 = (i >> 8) & 255;
    ch3 = (i >> 16) & 255;
    ch4 = (i >> 24) & 255;
    return((int) ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}

void 
read_Mnist(string filename, vector<Mat> &vec){
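    // Parse an MNIST image file: a header of four 32-bit integers (magic number,
    // image count, rows, cols) followed by the raw 8-bit pixels of every image.
    // Each image is appended to vec as an 8-bit single-channel Mat.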
    ifstream file(filename, ios::binary);
    if (file.is_open()){
        int magic_number = 0;
        int number_of_images = 0;
        int n_rows = 0;
        int n_cols = 0;
        file.read((char*) &magic_number, sizeof(magic_number));
        magic_number = ReverseInt(magic_number);
        file.read((char*) &number_of_images,sizeof(number_of_images));
        number_of_images = ReverseInt(number_of_images);
        file.read((char*) &n_rows, sizeof(n_rows));
        n_rows = ReverseInt(n_rows);
        file.read((char*) &n_cols, sizeof(n_cols));
        n_cols = ReverseInt(n_cols);
        for(int i = 0; i < number_of_images; ++i){
            Mat tpmat = Mat::zeros(n_rows, n_cols, CV_8UC1);
            for(int r = 0; r < n_rows; ++r){
                for(int c = 0; c < n_cols; ++c){
                    unsigned char temp = 0;
                    file.read((char*) &temp, sizeof(temp));
                    tpmat.at<uchar>(r, c) = (int) temp;
                }
            }
            vec.push_back(tpmat);
        }
    }
}

void 
read_Mnist_Label(string filename, Mat &mat)
{
    ifstream file(filename, ios::binary);
    if (file.is_open()){
        int magic_number = 0;
        int number_of_images = 0;
        int n_rows = 0;
        int n_cols = 0;
        file.read((char*) &magic_number, sizeof(magic_number));
        magic_number = ReverseInt(magic_number);
        file.read((char*) &number_of_images,sizeof(number_of_images));
        number_of_images = ReverseInt(number_of_images);
        for(int i = 0; i < number_of_images; ++i){
            unsigned char temp = 0;
            file.read((char*) &temp, sizeof(temp));
            mat.ATD(0, i) = (double)temp;
        }
    }
}

Mat 
sigmoid(Mat &M){
    Mat temp;
    exp(-M, temp);
    return 1.0 / (temp + 1.0);
}

Mat 
dsigmoid(Mat &a){
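    // Derivative of the sigmoid expressed in terms of its output:
    // if a = sigmoid(z), then d(sigmoid)/dz = a .* (1 - a).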
    Mat res = 1.0 - a;
    res = res.mul(a);
    return res;
}

void
weightRandomInit(SA &sa, int inputsize, int hiddensize, int nsamples, double epsilon){

    double *pData;
    sa.W1 = Mat::ones(hiddensize, inputsize, CV_64FC1);
    for(int i=0; i<hiddensize; i++){
        pData = sa.W1.ptr<double>(i);
        for(int j=0; j<inputsize; j++){
            pData[j] = randu<double>();
        }
    }
    sa.W1 = sa.W1 * (2 * epsilon) - epsilon;

    sa.W2 = Mat::ones(inputsize, hiddensize, CV_64FC1);
    for(int i=0; i<inputsize; i++){
        pData = sa.W2.ptr<double>(i);
        for(int j=0; j<hiddensize; j++){
            pData[j] = randu<double>();
        }
    }
    sa.W2 = sa.W2 * (2 * epsilon) - epsilon;

    sa.b1 = Mat::ones(hiddensize, 1, CV_64FC1);
    for(int j=0; j<hiddensize; j++){
        sa.b1.ATD(j, 0) = randu<double>();
    }
    sa.b1 = sa.b1 * (2 * epsilon) - epsilon;

    sa.b2 = Mat::ones(inputsize, 1, CV_64FC1);
    for(int j=0; j<inputsize; j++){
        sa.b2.ATD(j, 0) = randu<double>();
    }
    sa.b2 = sa.b2 * (2 * epsilon) - epsilon;

    sa.W1grad = Mat::zeros(hiddensize, inputsize, CV_64FC1);
    sa.W2grad = Mat::zeros(inputsize, hiddensize, CV_64FC1);
    sa.b1grad = Mat::zeros(hiddensize, 1, CV_64FC1);
    sa.b2grad = Mat::zeros(inputsize, 1, CV_64FC1);
    sa.cost = 0.0;
}

void 
weightRandomInit(SMR &smr, int nclasses, int nfeatures, double epsilon){

    smr.Weight = Mat::ones(nclasses, nfeatures, CV_64FC1);
    double *pData; 
    for(int i = 0; i<smr.Weight.rows; i++){
        pData = smr.Weight.ptr<double>(i);
        for(int j=0; j<smr.Weight.cols; j++){
            pData[j] = randu<double>();        
        }
    }
    smr.Weight = smr.Weight * (2 * epsilon) - epsilon;
    smr.cost = 0.0;
    smr.Wgrad = Mat::zeros(nclasses, nfeatures, CV_64FC1);
}

SAA
getSparseAutoencoderActivation(SA &sa, Mat &data){
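    // Forward pass through one autoencoder:
    //   hidden = sigmoid(W1 * input + b1), output = sigmoid(W2 * hidden + b2).
    // repeat() tiles the bias column across every sample in the batch.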
    SAA acti;
    data.copyTo(acti.aInput);
    acti.aHidden = sa.W1 * acti.aInput + repeat(sa.b1, 1, data.cols);
    acti.aHidden = sigmoid(acti.aHidden);
    acti.aOutput = sa.W2 * acti.aHidden + repeat(sa.b2, 1, data.cols);
    acti.aOutput = sigmoid(acti.aOutput);
    return acti;
}

void
sparseAutoencoderCost(SA &sa, Mat &data, double lambda, double sparsityParam, double beta){
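    // Overall cost = average squared-error reconstruction term
    //              + (lambda / 2) * weight decay
    //              + beta * KL-divergence sparsity penalty.
    // The corresponding gradients are written to sa.W1grad, sa.W2grad, sa.b1grad, sa.b2grad.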

    int nfeatures = data.rows;
    int nsamples = data.cols;
    SAA acti = getSparseAutoencoderActivation(sa, data);

    Mat errtp = acti.aOutput - data;
    pow(errtp, 2.0, errtp);
    errtp /= 2.0;
    double err = sum(errtp)[0] / nsamples;
    // now calculate pj which is the average activation of hidden units
    Mat pj;
    reduce(acti.aHidden, pj, 1, CV_REDUCE_SUM);
    pj /= nsamples;
    // the second part is the weight decay term: (lambda / 2) * (sum of squared weights)
    Mat W1sq, W2sq;
    pow(sa.W1, 2.0, W1sq);
    pow(sa.W2, 2.0, W2sq);
    double err2 = sum(W1sq)[0] + sum(W2sq)[0];
    err2 *= (lambda / 2.0);
    // the third part of overall cost function is the sparsity part
    Mat err3;
    Mat temp;
    temp = sparsityParam / pj;
    log(temp, temp);
    temp *= sparsityParam;
    temp.copyTo(err3);
    temp = (1 - sparsityParam) / (1 - pj);
    log(temp, temp);
    temp *= (1 - sparsityParam);
    err3 += temp;
    sa.cost = err + err2 + sum(err3)[0] * beta;

    // following are for calculating the grad of weights.
    Mat delta3 = -(data - acti.aOutput);
    delta3 = delta3.mul(dsigmoid(acti.aOutput));
    Mat temp2 = -sparsityParam / pj + (1 - sparsityParam) / (1 - pj);
    temp2 *= beta;
    Mat delta2 = sa.W2.t() * delta3 + repeat(temp2, 1, nsamples);
    delta2 = delta2.mul(dsigmoid(acti.aHidden));
    Mat nablaW1 = delta2 * acti.aInput.t();
    Mat nablaW2 = delta3 * acti.aHidden.t();
    Mat nablab1, nablab2; 
    delta3.copyTo(nablab2);
    delta2.copyTo(nablab1);
    sa.W1grad = nablaW1 / nsamples + lambda * sa.W1;
    sa.W2grad = nablaW2 / nsamples + lambda * sa.W2;
    reduce(nablab1, sa.b1grad, 1, CV_REDUCE_SUM);
    reduce(nablab2, sa.b2grad, 1, CV_REDUCE_SUM);
    sa.b1grad /= nsamples;
    sa.b2grad /= nsamples;
}

void 
softmaxRegressionCost(Mat &x, Mat &y, SMR &smr, double lambda){
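    // Softmax (multinomial logistic) regression: cross-entropy cost plus an L2
    // weight-decay term, and the gradient of that cost w.r.t. smr.Weight.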

    int nsamples = x.cols;
    int nfeatures = x.rows;
    //calculate cost function
    Mat theta(smr.Weight);
    Mat M = theta * x;
    Mat temp, temp2;
    // subtract the per-column maximum before exponentiating, for numerical stability
    reduce(M, temp, 0, CV_REDUCE_MAX);
    temp2 = repeat(temp, nclasses, 1);
    M -= temp2;
    exp(M, M);
    temp = Mat::ones(1, M.cols, CV_64FC1);
    reduce(M, temp, 0, CV_REDUCE_SUM);
    temp2 = repeat(temp, nclasses, 1);
    divide(M, temp2, M); 
    Mat groundTruth = Mat::zeros(nclasses, nsamples, CV_64FC1);
    for(int i=0; i<nsamples; i++){
        groundTruth.ATD(y.ATD(0, i), i) = 1.0;
    }
    Mat logM;
    log(M, logM);
    temp = groundTruth.mul(logM);
    smr.cost = - sum(temp)[0] / nsamples;
    Mat theta2;
    pow(theta, 2.0, theta2);
    smr.cost += sum(theta2)[0] * lambda / 2;
    //calculate gradient
    temp = groundTruth - M;   
    temp = temp * x.t();
    smr.Wgrad = - temp / nsamples;
    smr.Wgrad += lambda * theta;
}

void
fineTuneNetworkCost(Mat &x, Mat &y, vector<SA> &hLayers, SMR &smr, double lambda){
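    // Cost and gradients of the whole stacked network: feed x forward through every
    // encoder (the W1/b1 of each autoencoder), apply softmax on top, then backpropagate
    // the error to get gradients for smr.Weight and for each encoder's W1 and b1.
    // The decoder weights (W2/b2) play no part in this cost.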

    int nfeatures = x.rows;
    int nsamples = x.cols;
    vector<Mat> acti;

    acti.push_back(x);
    for(int i=1; i<=SparseAutoencoderLayers; i++){
        Mat tmpacti = hLayers[i - 1].W1 * acti[i - 1] + repeat(hLayers[i - 1].b1, 1, x.cols);
        acti.push_back(sigmoid(tmpacti));
    }
    Mat M = smr.Weight * acti[acti.size() - 1];
    Mat tmp;
    // subtract the per-column maximum before exponentiating, for numerical stability
    reduce(M, tmp, 0, CV_REDUCE_MAX);
    M = M - repeat(tmp, M.rows, 1);
    Mat p;
    exp(M, p);
    reduce(p, tmp, 0, CV_REDUCE_SUM);
    divide(p, repeat(tmp, p.rows, 1), p);

    Mat groundTruth = Mat::zeros(nclasses, nsamples, CV_64FC1);
    for(int i=0; i<nsamples; i++){
        groundTruth.ATD(y.ATD(0, i), i) = 1.0;
    }
    Mat logP;
    log(p, logP);
    logP = logP.mul(groundTruth);
    smr.cost = - sum(logP)[0] / nsamples;
    pow(smr.Weight, 2.0, tmp);
    smr.cost += sum(tmp)[0] * lambda / 2;

    tmp = (groundTruth - p) * acti[acti.size() - 1].t();
    tmp /= -nsamples;
    smr.Wgrad = tmp + lambda * smr.Weight;

    vector<Mat> delta(acti.size());
    delta[delta.size() -1] = -smr.Weight.t() * (groundTruth - p);
    delta[delta.size() -1] = delta[delta.size() -1].mul(dsigmoid(acti[acti.size() - 1]));
    for(int i = delta.size() - 2; i >= 0; i--){
        delta[i] = hLayers[i].W1.t() * delta[i + 1];
        delta[i] = delta[i].mul(dsigmoid(acti[i]));
    }
    for(int i=SparseAutoencoderLayers - 1; i >=0; i--){
        hLayers[i].W1grad = delta[i + 1] * acti[i].t();
        hLayers[i].W1grad /= nsamples;
        reduce(delta[i + 1], tmp, 1, CV_REDUCE_SUM);
        hLayers[i].b1grad = tmp / nsamples;
    }
    acti.clear();
    delta.clear();
}

void
gradientChecking(SA &sa, Mat &data, double lambda, double sparsityParam, double beta){
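    // Numerical check of the analytic W1 gradient: perturb each entry by +/- epsilon,
    // recompute the cost, and compare the centered difference against the backprop value.
    // The last printed column (analytic / numerical) should stay close to 1.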

    //Gradient Checking (remember to disable this part after you're sure the 
    //cost function and dJ function are correct)
    sparseAutoencoderCost(sa, data, lambda, sparsityParam, beta);
    Mat w1g(sa.W1grad);
    cout<<"test sparse autoencoder !!!!"<<endl;
    double epsilon = 1e-4;
    for(int i=0; i<sa.W1.rows; i++){
        for(int j=0; j<sa.W1.cols; j++){
            double memo = sa.W1.ATD(i, j);
            sa.W1.ATD(i, j) = memo + epsilon;
            sparseAutoencoderCost(sa, data, lambda, sparsityParam, beta);
            double value1 = sa.cost;
            sa.W1.ATD(i, j) = memo - epsilon;
            sparseAutoencoderCost(sa, data, lambda, sparsityParam, beta);
            double value2 = sa.cost;
            double tp = (value1 - value2) / (2 * epsilon);
            cout<<i<<", "<<j<<", "<<tp<<", "<<w1g.ATD(i, j)<<", "<<w1g.ATD(i, j) / tp<<endl;
            sa.W1.ATD(i, j) = memo;
        }
    }
}

void
gradientChecking(SMR &smr, Mat &x, Mat &y, double lambda){

    //Gradient Checking (remember to disable this part after you're sure the 
    //cost function and dJ function are correct)
    softmaxRegressionCost(x, y, smr, lambda);
    Mat grad(smr.Wgrad);
    cout<<"test softmax regression !!!!"<<endl;
    double epsilon = 1e-4;
    for(int i=0; i<smr.Weight.rows; i++){
        for(int j=0; j<smr.Weight.cols; j++){
            double memo = smr.Weight.ATD(i, j);
            smr.Weight.ATD(i, j) = memo + epsilon;
            softmaxRegressionCost(x, y, smr, lambda);
            double value1 = smr.cost;
            smr.Weight.ATD(i, j) = memo - epsilon;
            softmaxRegressionCost(x, y, smr, lambda);
            double value2 = smr.cost;
            double tp = (value1 - value2) / (2 * epsilon);
            cout<<i<<", "<<j<<", "<<tp<<", "<<grad.ATD(i, j)<<", "<<grad.ATD(i, j) / tp<<endl;
            smr.Weight.ATD(i, j) = memo;
        }
    }
}

void
gradientChecking(vector<SA> &hLayers, SMR &smr, Mat &x, Mat &y, double lambda){

    //Gradient Checking (remember to disable this part after you're sure the 
    //cost function and dJ function are correct)
    fineTuneNetworkCost(x, y, hLayers, smr, lambda);
    Mat grad(hLayers[0].W1grad);
    cout<<"test fine-tune network !!!!"<<endl;
    double epsilon = 1e-4;
    for(int i=0; i<hLayers[0].W1.rows; i++){
        for(int j=0; j<hLayers[0].W1.cols; j++){
            double memo = hLayers[0].W1.ATD(i, j);
            hLayers[0].W1.ATD(i, j) = memo + epsilon;
            fineTuneNetworkCost(x, y, hLayers, smr, lambda);
            double value1 = smr.cost;
            hLayers[0].W1.ATD(i, j) = memo - epsilon;
            fineTuneNetworkCost(x, y, hLayers, smr, lambda);
            double value2 = smr.cost;
            double tp = (value1 - value2) / (2 * epsilon);
            cout<<i<<", "<<j<<", "<<tp<<", "<<grad.ATD(i, j)<<", "<<grad.ATD(i, j) / tp<<endl;
            hLayers[0].W1.ATD(i, j) = memo;
        }
    }
}

void
trainSparseAutoencoder(SA &sa, Mat &data, int hiddenSize, double lambda, double sparsityParam, double beta, double lrate, int MaxIter){

    int nfeatures = data.rows;
    int nsamples = data.cols;
    weightRandomInit(sa, nfeatures, hiddenSize, nsamples, 0.12);
    if (IS_TEST_SA){
        gradientChecking(sa, data, lambda, sparsityParam, beta);
    }else{
        int converge = 0;
        double lastcost = 0.0;
        cout<<"Sparse Autoencoder Learning: "<<endl;
        while(converge < MaxIter){

            int randomNum = rand() % (data.cols - batch);
            Rect roi = Rect(randomNum, 0, batch, data.rows);
            Mat batchX = data(roi);

            sparseAutoencoderCost(sa, batchX, lambda, sparsityParam, beta);
            cout<<"learning step: "<<converge<<", Cost function value = "<<sa.cost<<", randomNum = "<<randomNum<<endl;
            if(fabs((sa.cost - lastcost) ) <= 5e-5 && converge > 0) break;
            if(sa.cost <= 0.0) break;
            lastcost = sa.cost;
            sa.W1 -= lrate * sa.W1grad;
            sa.W2 -= lrate * sa.W2grad;
            sa.b1 -= lrate * sa.b1grad;
            sa.b2 -= lrate * sa.b2grad;
            ++ converge;
        }
    }
}

void 
trainSoftmaxRegression(SMR& smr, Mat &x, Mat &y, double lambda, double lrate, int MaxIter){
    int nfeatures = x.rows;
    int nsamples = x.cols;
    weightRandomInit(smr, nclasses, nfeatures, 0.12);
    if (IS_TEST_SMR){
        gradientChecking(smr, x, y, lambda);
    }else{
        int converge = 0;
        double lastcost = 0.0;
        cout<<"Softmax Regression Learning: "<<endl;
        while(converge < MaxIter){

            int randomNum = rand() % (x.cols - batch);
            Rect roi = Rect(randomNum, 0, batch, x.rows);
            Mat batchX = x(roi);
            roi = Rect(randomNum, 0, batch, y.rows);
            Mat batchY = y(roi);

            softmaxRegressionCost(batchX, batchY, smr, lambda);
            cout<<"learning step: "<<converge<<", Cost function value = "<<smr.cost<<", randomNum = "<<randomNum<<endl;
            if(fabs((smr.cost - lastcost) ) <= 1e-6 && converge > 0) break;
            if(smr.cost <= 0) break;
            lastcost = smr.cost;
            smr.Weight -= lrate * smr.Wgrad;
            ++ converge;
        }
    }
}

void
trainFineTuneNetwork(Mat &x, Mat &y, vector<SA> &HiddenLayers, SMR &smr, double lambda, double lrate, int MaxIter){

    if (IS_TEST_FT){
        gradientChecking(HiddenLayers, smr, x, y, lambda);
    }else{
        int converge = 0;
        double lastcost = 0.0;
        cout<<"Fine-Tune network Learning: "<<endl;
        while(converge < MaxIter){

            int randomNum = rand() % (x.cols - batch);
            Rect roi = Rect(randomNum, 0, batch, x.rows);
            Mat batchX = x(roi);
            roi = Rect(randomNum, 0, batch, y.rows);
            Mat batchY = y(roi);

            fineTuneNetworkCost(batchX, batchY, HiddenLayers, smr, lambda);
            cout<<"learning step: "<<converge<<", Cost function value = "<<smr.cost<<", randomNum = "<<randomNum<<endl;
            if(fabs((smr.cost - lastcost) / smr.cost) <= 1e-6 && converge > 0) break;
            if(smr.cost <= 0) break;
            lastcost = smr.cost;
            smr.Weight -= lrate * smr.Wgrad;
            for(int i=0; i<HiddenLayers.size(); i++){
                HiddenLayers[i].W1 -= lrate * HiddenLayers[i].W1grad;
                HiddenLayers[i].W2 -= lrate * HiddenLayers[i].W2grad;
                HiddenLayers[i].b1 -= lrate * HiddenLayers[i].b1grad;
                HiddenLayers[i].b2 -= lrate * HiddenLayers[i].b2grad;
            }
            ++ converge;
        }
    }
}

Mat 
resultProdict(Mat &x, vector<SA> &hLayers, SMR &smr){
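    // Predict labels for the samples in x: forward through the trained encoders and the
    // softmax layer, then return the arg-max class index for each column.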

    vector<Mat> acti;
    acti.push_back(x);
    for(int i=1; i<=SparseAutoencoderLayers; i++){
        Mat tmpacti = hLayers[i - 1].W1 * acti[i - 1] + repeat(hLayers[i - 1].b1, 1, x.cols);
        acti.push_back(sigmoid(tmpacti));
    }
    Mat M = smr.Weight * acti[acti.size() - 1];
    Mat tmp;
    // subtract the per-column maximum before exponentiating, for numerical stability
    reduce(M, tmp, 0, CV_REDUCE_MAX);
    M = M - repeat(tmp, M.rows, 1);
    Mat p;
    exp(M, p);
    reduce(p, tmp, 0, CV_REDUCE_SUM);
    divide(p, repeat(tmp, p.rows, 1), p);
    log(p, tmp);
    //cout<<tmp.t()<<endl;
    Mat result = Mat::ones(1, tmp.cols, CV_64FC1);
    for(int i=0; i<tmp.cols; i++){
        double maxele = tmp.ATD(0, i);
        int which = 0;
        for(int j=1; j<tmp.rows; j++){
            if(tmp.ATD(j, i) > maxele){
                maxele = tmp.ATD(j, i);
                which = j;
            }
        }
        result.ATD(0, i) = which;
    }
    acti.clear();
    return result;
}

void
readData(Mat &x, Mat &y, string xpath, string ypath, int number_of_images){

    //read MNIST images into a vector of OpenCV Mat
    int image_size = 28 * 28;
    vector<Mat> vec;
    //vec.resize(number_of_images, cv::Mat(28, 28, CV_8UC1));
    read_Mnist(xpath, vec);
    //read MNIST label into double vector
    y = Mat::zeros(1, number_of_images, CV_64FC1);
    read_Mnist_Label(ypath, y);
    x = concatenateMat(vec);
}

int 
main(int argc, char** argv)
{

    long start, end;
    start = clock();

    Mat trainX, trainY;
    Mat testX, testY;
    readData(trainX, trainY, "mnist/train-images-idx3-ubyte", "mnist/train-labels-idx1-ubyte", 60000);
    readData(testX, testY, "mnist/t10k-images-idx3-ubyte", "mnist/t10k-labels-idx1-ubyte", 10000);

    // Just for testing the algorithm, you can enable the following lines.
    // They use only the first 500 training samples to speed up the calculation;
    // however, such a small training set leads to lower test accuracy.
    // Rect roi = cv::Rect(0, 0, 500, trainX.rows);
    // trainX = trainX(roi);
    // roi = cv::Rect(0, 0, 500, trainY.rows);
    // trainY = trainY(roi);

    cout<<"Read trainX successfully, including "<<trainX.rows<<" features and "<<trainX.cols<<" samples."<<endl;
    cout<<"Read trainY successfully, including "<<trainY.cols<<" samples"<<endl;
    cout<<"Read testX successfully, including "<<testX.rows<<" features and "<<testX.cols<<" samples."<<endl;
    cout<<"Read testY successfully, including "<<testY.cols<<" samples"<<endl;
    batch = trainX.cols / 100;
    // Finished reading data

    // pre-processing data.
    // For some datasets you may want to pre-process the data;
    // however, the MNIST dataset is effectively already pre-processed.
    // Scalar mean, stddev;
    // meanStdDev(trainX, mean, stddev);
    // Mat normX = trainX - mean[0];
    // normX.copyTo(trainX);

    vector<SA> HiddenLayers;
    vector<Mat> Activations;
    for(int i=0; i<SparseAutoencoderLayers; i++){
        Mat tempX;
        if(i == 0) trainX.copyTo(tempX); else Activations[Activations.size() - 1].copyTo(tempX);
        SA tmpsa;
        trainSparseAutoencoder(tmpsa, tempX, 600, 3e-3, 0.1, 3, 2e-2, 80000);
        Mat tmpacti = tmpsa.W1 * tempX + repeat(tmpsa.b1, 1, tempX.cols);
        tmpacti = sigmoid(tmpacti);
        HiddenLayers.push_back(tmpsa);
        Activations.push_back(tmpacti);
    }
    // Finished training Sparse Autoencoder
    // Now train Softmax.
    SMR smr;
    trainSoftmaxRegression(smr, Activations[Activations.size() - 1], trainY, 3e-3, 2e-2, 80000);
    // Fine-tune using backpropagation
    trainFineTuneNetwork(trainX, trainY, HiddenLayers, smr, 1e-4, 2e-2, 80000);
    // Finally check result.
    Mat result = resultProdict(testX, HiddenLayers, smr);

    Mat err(testY);
    err -= result;
    int correct = err.cols;
    for(int i=0; i<err.cols; i++){
        if(err.ATD(0, i) != 0) --correct;
    }
    cout<<"correct: "<<correct<<", total: "<<err.cols<<", accuracy: "<<double(correct) / (double)(err.cols)<<endl;
    end = clock();
    cout<<"Totally used time: "<<((double)(end - start)) / CLOCKS_PER_SEC<<" second"<<endl;

    //waitKey(0);
    return 0;
}

The above version uses the MNIST dataset, which you can download from Yann LeCun’s MNIST page.

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

TEST RESULT

MNIST:

2 hidden layers, with 200 neurons in each hidden layer. Accuracy 0.9446
2 hidden layers, with 400 neurons in each hidden layer. Accuracy 0.968
2 hidden layers, with 600 neurons in each hidden layer. Accuracy 0.9266
2 hidden layers, with 800 neurons in each hidden layer. Accuracy 0.9656

 

MIT CBCL FACE DATABASE (#1)

 

[Figure 5: test result on the MIT CBCL face database]

POSTSCRIPT

You can see that I used stochastic gradient descent in the training process (a stripped-down version of the update step is sketched below); that is because:

  • stochastic learning is usually much faster than batch learning.
  • stochastic learning also often results in better solutions.
  • stochastic learning can be used for tracking changes.

For more details, check Efficient BackProp by Yann LeCun et al.
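Concretely, each training loop performs the same kind of update on a randomly chosen contiguous mini-batch. Stripped of the logging and stopping conditions, one step of the sparse autoencoder training (these are the relevant lines from trainSparseAutoencoder, condensed) looks like this; the softmax and fine-tuning loops follow the same pattern:

// One stochastic gradient descent step: evaluate cost and gradients on a small
// random batch only, then move every parameter against its gradient.
int randomNum = rand() % (data.cols - batch);             // random batch start
Mat batchX = data(Rect(randomNum, 0, batch, data.rows));  // 'batch' consecutive columns
sparseAutoencoderCost(sa, batchX, lambda, sparsityParam, beta); // fills sa.*grad
sa.W1 -= lrate * sa.W1grad;   // theta := theta - learning_rate * gradient
sa.W2 -= lrate * sa.W2grad;
sa.b1 -= lrate * sa.b1grad;
sa.b2 -= lrate * sa.b2grad;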

Enjoy the code, and feel free to let me know if there’s any bug in it.

🙂


37 Comments

  1. Nick
    Posted July 30, 2014 at 12:26 am | Permalink

    Hi
    Thanks for this post.
    Sorry but I used your code as such and it compiles fine with OpenCV. But when I run it, I get the dreaded ‘Segmentation fault’.

    • Eric
      Posted July 30, 2014 at 12:36 am | Permalink

      Hi Nick,
      When does the SegFault happen? I just tried to run it, no fault happened. Can you find specifically in which function it happens?

      • Dov Bai
        Posted November 6, 2014 at 9:21 pm | Permalink

        The code can segfault on lines like this:

        int randomNum = ((long)rand() + (long)rand()) % (data.cols - batch);

        because the sum of two longs may overflow a long and result in a negative randomNum which is not legal in Rect.

        Modifying to rand()%(data.cols - batch) works OK.

        • Eric
          Posted November 6, 2014 at 10:03 pm | Permalink

          Thank you Dov 🙂

  2. Adil
    Posted July 30, 2014 at 11:04 pm | Permalink

    Hi,

    I tried to run your code, but unfortunately I don’t know how can I did that by using the regular C++ or visual C++, could you please help me on that… I really appreciate that

    • Eric
      Posted July 31, 2014 at 2:16 am | Permalink

      Hi Adil,

      I’m not sure what do you mean by “using regular C++ or visual C++”, do you have problem with compiling this code?

  3. Yoann
    Posted August 8, 2014 at 10:42 am | Permalink

    Hi,

    Thanks for this very useful code. I have some questions to understand the capabilities of this code.
    First, I understand that the trainY table is composed of 3 different values (0, 1, 2). Is it possible to train the model using real values from -1 to 1?
    Second, I would like to know if it is possible, using your code, to combine different autoencoders such as in this picture: http://1drv.ms/1oj7vAi. I don’t know if such a combination has some benefits over a more common approach where all the features are combined in the same autoencoder but it would make sense in my work to do so.

    I plan to test your code soon, thanks again.

    • Eric
      Posted August 9, 2014 at 12:37 pm | Permalink

      Hi Yoann,

      First, the Y value in this code can be 0-9, and I don’t think it works if using real values as training labels.
      Second, the picture you showed seems like using some combination of two different network, this is a good idea, however, I think what’s better to do is to use something like Denoising Auto-encoder, that’s a simple and efficient way to implement combination of different networks (not only two networks, but thousands of it).

      🙂

      • Yoann
        Posted August 11, 2014 at 2:47 am | Permalink

        Thanks for your answer!

        I will definitly implement this denoising autoencoder when i’ll find a code as simple and efficient as yours but working with real values. 🙂

  4. Yoann
    Posted August 12, 2014 at 3:32 am | Permalink

    Hi Eric,

    I’ve not found on the web a code as simple as yours to use.
    What I plan to do is to use your code but replace the softmax regression by a logistic regression to output continuous values. In this post: http://eric-yuan.me/logistic-regression/ you say that you implemented the logistic regression but that there are some mistakes in that code. Do you think that the code of the logistic regression you wrote could work with the code in this page? If it could work, I may debug the code if you are interested.

    • Eric
      Posted August 13, 2014 at 4:39 pm | Permalink

      Hey Yoann,

      About that buggy code I mentioned in that post, actually I have fixed it, and it is exactly the code in http://eric-yuan.me/bpnn/ . For the second question, I don’t think logistic regression works, because it is more likely to be a classifier, it uses non-linearity process inside (softmax regression uses the same idea). I’m not sure but my suggestion is, search online about neural network + LINEAR regression. Good luck!

  5. Tagore
    Posted August 12, 2014 at 3:58 am | Permalink

    Hi,

    Thank you for your sharing,I am puzzled by one question.I think I was wrong,but I see you train the two hiddened layers in one “for-loop”.How did you train the second hiddened layer by using the activations of the first hiddened layer?
    I am a beginner,and I appreciate your code so much,waiting for your reply!
    Thank you again!

    • Eric
      Posted August 13, 2014 at 4:30 pm | Permalink

      Hi Tagore,

      Sorry I didn’t quite catch what you said, do you mean in the fine-tune training process?

      • Tagore
        Posted August 14, 2014 at 2:35 am | Permalink

        Thank you for your reply, I made a misunderning of your sparseAutoencoder training part,and I am now catch it!
        Thank you so much!

      • Tagore
        Posted August 29, 2014 at 3:09 am | Permalink

        Hi Eric,
        Now one question puzzeles me. Each AutoEncoder has params w2 and b2,but we haven’t used them, so What is the function of them?
        Haven’t really catch the network, I am really wish for your help!
        Thank you!

        • Eric
          Posted September 1, 2014 at 1:09 pm | Permalink

          Hey Tagore,

          In auto-encoder, what we mainly concerns is the ‘encoder’ part, and the W2/b2 stuff are what we called decoder. If you see Sparse Coding, you may find the usage of decoder 🙂

  6. cheong
    Posted September 14, 2014 at 10:52 pm | Permalink

    How to read image and label from my own database in your program? Thanks

    • Eric
      Posted September 15, 2014 at 6:06 pm | Permalink

      Hey,

      For images, just use opencv imread() function, and for other formats (.txt etc.), use regular C++ file operation functions.

  7. Daniel
    Posted December 3, 2014 at 7:22 am | Permalink

    Hi Eric, nice post. I try to run this on my ubuntu, this is the cmakelist:
    cmake_minimum_required(VERSION 2.8)
    project( MnistClassify )
    find_package( OpenCV REQUIRED )
    add_executable( MnistClassify MnistClassify.cpp )
    target_link_libraries( MnistClassify ${OpenCV_LIBS} )

    but give me some errors:
    /home/daniel/intelCode/MnistClassify.cpp: In function ‘void read_Mnist(std::string, std::vector&)’:
    /home/daniel/intelCode/MnistClassify.cpp:94:41: error: no matching function for call to ‘std::basic_ifstream::basic_ifstream(std::string&, const openmode&)’
    ifstream file (filename, ios::binary);
    ^
    /home/daniel/intelCode/MnistClassify.cpp:94:41: note: candidates are:
    In file included from /home/daniel/intelCode/MnistClassify.cpp:20:0:
    /usr/include/c++/4.8/fstream:467:7: note: std::basic_ifstream::basic_ifstream(const char*, std::ios_base::openmode) [with _CharT = char; _Traits = std::char_traits; std::ios_base::openmode = std::_Ios_Openmode]
    basic_ifstream(const char* __s, ios_base::openmode __mode = ios_base::in)
    ^
    /usr/include/c++/4.8/fstream:467:7: note: no known conversion for argument 1 from ‘std::string {aka std::basic_string}’ to ‘const char*’
    /usr/include/c++/4.8/fstream:453:7: note: std::basic_ifstream::basic_ifstream() [with _CharT = char; _Traits = std::char_traits]
    basic_ifstream() : __istream_type(), _M_filebuf()
    ^
    /usr/include/c++/4.8/fstream:453:7: note: candidate expects 0 arguments, 2 provided
    /usr/include/c++/4.8/fstream:427:11: note: std::basic_ifstream::basic_ifstream(const std::basic_ifstream&)
    class basic_ifstream : public basic_istream
    ^
    /usr/include/c++/4.8/fstream:427:11: note: candidate expects 1 argument, 2 provided
    /home/daniel/intelCode/MnistClassify.cpp: In function ‘void read_Mnist_Label(std::string, cv::Mat&)’:
    /home/daniel/intelCode/MnistClassify.cpp:125:41: error: no matching function for call to ‘std::basic_ifstream::basic_ifstream(std::string&, const openmode&)’
    ifstream file (filename, ios::binary);
    ^
    /home/daniel/intelCode/MnistClassify.cpp:125:41: note: candidates are:
    In file included from /home/daniel/intelCode/MnistClassify.cpp:20:0:
    /usr/include/c++/4.8/fstream:467:7: note: std::basic_ifstream::basic_ifstream(const char*, std::ios_base::openmode) [with _CharT = char; _Traits = std::char_traits; std::ios_base::openmode = std::_Ios_Openmode]
    basic_ifstream(const char* __s, ios_base::openmode __mode = ios_base::in)
    ^
    /usr/include/c++/4.8/fstream:467:7: note: no known conversion for argument 1 from ‘std::string {aka std::basic_string}’ to ‘const char*’
    /usr/include/c++/4.8/fstream:453:7: note: std::basic_ifstream::basic_ifstream() [with _CharT = char; _Traits = std::char_traits]
    basic_ifstream() : __istream_type(), _M_filebuf()
    ^
    /usr/include/c++/4.8/fstream:453:7: note: candidate expects 0 arguments, 2 provided
    /usr/include/c++/4.8/fstream:427:11: note: std::basic_ifstream::basic_ifstream(const std::basic_ifstream&)
    class basic_ifstream : public basic_istream
    ^
    /usr/include/c++/4.8/fstream:427:11: note: candidate expects 1 argument, 2 provided
    make[2]: *** [CMakeFiles/MnistClassify.dir/MnistClassify.cpp.o] Error 1
    make[1]: *** [CMakeFiles/MnistClassify.dir/all] Error 2
    make: *** [all] Error 2

    • Eric
      Posted December 3, 2014 at 10:29 pm | Permalink

      It seems that something wrong with the ifstream part, maybe in Ubuntu, ifstream::open won’t accept string as a filename, so try to change “ifstream file (filename, ios::binary);” into “ifstream file (filename.c_str(), ios::binary);”, thanks.

  8. Kankan Dai
    Posted March 5, 2015 at 4:19 am | Permalink

    Thanks for the code! But I’m confused about line 138: mat.ATD(0, i) = (double)temp; It shows in my openCV, there is no ATD() method for class Mat?

    • Kankan Dai
      Posted March 5, 2015 at 6:01 am | Permalink

      Oh It is a Macro

  9. John W Jones
    Posted March 29, 2015 at 1:15 pm | Permalink

    Hi, I’ve been trying to debug the code above for use (with attribution) for my master’s degree final project. I have it compiled on MSVC++ 2013 Pro with OpenCV 2.4.9.

    I get an MSVCP120D.dll Debug Assertion Fail on a vector where vector subscript out of range is named.

    Have you any idea what may have caused this and, hopefully, how to correct? Thanks in advance, jj.

  10. John W Jones
    Posted March 29, 2015 at 2:09 pm | Permalink

    On line 577 above, vec.resize(number_of_images, cv::Mat(28, 28, CV_8UC1));, is commented out. That will cause the error I wrote of above. Un-comment; runs fine…

  11. Miler
    Posted June 5, 2015 at 2:21 pm | Permalink

    Hi Eric, your code is great, but you can explain me why this lines:
    sa.W1 = sa.W1 * (2 * epsilon) - epsilon;
    sa.W2 = sa.W2 * (2 * epsilon) - epsilon;
    sa.b1 = sa.b1 * (2 * epsilon) - epsilon;
    sa.b2 = sa.b2 * (2 * epsilon) - epsilon;

    • Miler
      Posted June 5, 2015 at 2:39 pm | Permalink

      Forget it, this is the Parameters’s Initialization in ML. Thank you.

      • Miler
        Posted June 6, 2015 at 6:25 pm | Permalink

        I found a mistake:

        trainSparseAutoencoder(tmpsa, tempX, 600, 3e-3, 0.1, 3, 2e-2, 100, tamanoBatch);

        the number 600 is wrong

        in the second autoencoder, you dont do dimensional reduction

        number of units (2 autoencoders + 1 softmax)
        784 -> 600 -> 600 -> 10

  12. ngapweitham
    Posted August 19, 2015 at 9:10 am | Permalink

    I find an equation differ with(weight decay) the tutorial of UFLDL

    In the function “sparseAutoencoderCost”

    // the second part is weight decay part
    double err2 = sum(sa.W1)[0] + sum(sa.W2)[0];
    err2 *= (lambda / 2.0);

    But according to the equation, it should be

    cv::pow(sa.W1, 2.0, sa.W1);
    cv::pow(sa.W2, 2.0, sa.W2);
    double err2 = sum(sa.W1)[0] + sum(sa.W2)[0];
    err2 *= (lambda / 2.0);

    The weights of the equation should be the power of 2, is this a bug of the codes or my misunderstanding?Thank you

    double err2 = (cv::sum(es.w1_)[0] + cv::sum(es.w2_)[0]) *
    (params_.lambda_ / 2.0);

  13. M. Plav
    Posted August 29, 2015 at 12:42 pm | Permalink

    Hi,
    I am a newbie but found it very useful. Can you please tell me a way to write it in Matlab. Moreover, I want to train stacked autoencoders from one type of images say neon faces and predict normal faces how can I do it? I am thinking of using SparseDenoisingAutoencoders with stochastic gradient descent and may be KNN as classifier or MSE for ranking output matching results. make sense? thanks

  14. Việt Anh
    Posted September 15, 2015 at 10:27 pm | Permalink

    One small thing :

    Need to #include <time.h> to use CLOCKS_PER_SEC and clock().

    P/s : Thank you very much !

    • Việt Anh
      Posted September 15, 2015 at 10:29 pm | Permalink

      #include <time.h>

  15. Shelby
    Posted March 31, 2016 at 5:04 am | Permalink

    Hi Eric,

    I built a 3 layer stacked RBM and trained it completely, now how can i test it?
    i used matlab for this.

    thank you in advance.

  16. sakazi
    Posted December 9, 2016 at 8:26 am | Permalink

    Hello Eric,
    could you provide this code using Matlab?

    thanks
