I’m working through Prof. Andrew Ng’s Unsupervised Feature Learning and Deep Learning tutorial. This is the 4th exercise, which uses Softmax regression to build a classifier for MNIST handwritten digits. As in my other UFLDL exercise posts, I won’t go through the details of the material. More about this exercise can be found **HERE**.

I plan to re-implement the Softmax regression algorithm in C++ this weekend (if I can finish my homework), and I will write a post about Softmax.

### CODE

```matlab
% softmaxCost.m
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
% numClasses - the number of classes
% inputSize  - the size N of the input vector
% lambda     - weight decay parameter
% data       - the N x M input matrix, where each column data(:, i)
%              corresponds to a single test set
% labels     - an M x 1 matrix containing the labels for the input data

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);

numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));
cost = 0;
thetagrad = zeros(numClasses, inputSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

[nfeatures, nsamples] = size(data);

M = theta * data;
M = bsxfun(@minus, M, max(M, [], 1));  % subtract each column's max for numerical stability
M = exp(M);
M = bsxfun(@rdivide, M, sum(M));       % column-wise softmax probabilities

cost = -sum(sum(groundTruth .* log(M))) ./ nsamples;
cost = cost + lambda ./ 2 .* sum(sum(theta .^ 2));  % weight decay term

thetagrad = -(groundTruth - M) * data' ./ nsamples;
thetagrad = thetagrad + lambda .* theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = thetagrad(:);

end
```
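For readers more comfortable with Python, here is a NumPy sketch of the same cost and gradient computation (my own translation, not part of the exercise; variable names like `softmax_cost` are mine). It follows the MATLAB code line by line: subtract the per-column maximum before exponentiating, normalize columns to probabilities, then form the cross-entropy cost with weight decay.

```python
import numpy as np

def softmax_cost(theta, num_classes, input_size, lam, data, labels):
    """Softmax regression cost and gradient (NumPy sketch of softmaxCost.m).

    data:   input_size x m array, one example per column
    labels: length-m integer array with classes in 0..num_classes-1
    """
    theta = theta.reshape(num_classes, input_size)
    m = data.shape[1]

    # Indicator matrix: ground_truth[c, i] = 1 iff labels[i] == c
    ground_truth = np.zeros((num_classes, m))
    ground_truth[labels, np.arange(m)] = 1.0

    scores = theta @ data                         # num_classes x m
    scores -= scores.max(axis=0, keepdims=True)   # stability: subtract column max
    probs = np.exp(scores)
    probs /= probs.sum(axis=0, keepdims=True)     # column-wise softmax

    cost = -np.sum(ground_truth * np.log(probs)) / m
    cost += (lam / 2.0) * np.sum(theta ** 2)      # weight decay

    grad = -(ground_truth - probs) @ data.T / m + lam * theta
    return cost, grad.ravel()
```

A quick numerical gradient check (perturb one parameter by a small epsilon and compare the finite difference against the analytic gradient) is a good way to confirm the implementation, just as the UFLDL exercises recommend.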

The probability computation in softmaxPredict.m is exactly the same as in the cost function above; the prediction is then the class with the highest probability in each column.
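Since the exercise's softmaxPredict.m isn't reproduced here, a minimal NumPy sketch of the prediction step (names are mine): because exponentiation and normalization are monotonic, the softmax normalization doesn't change which class scores highest, so the argmax of the raw scores suffices.

```python
import numpy as np

def softmax_predict(theta, num_classes, input_size, data):
    """Predict class indices for each column of data (sketch, not the exercise code)."""
    theta = theta.reshape(num_classes, input_size)
    # Normalizing to probabilities is unnecessary: argmax is unchanged.
    return np.argmax(theta @ data, axis=0)
```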

### RESULTS

image size: 28 * 28

number of images: 10000

iterations: 100

time used: 13.501522 seconds

Accuracy: 92.640%

🙂