Sparse coding is one of the best-known unsupervised learning methods of the past decade. It is a dictionary learning process whose goal is to find a dictionary such that any training input vector can be represented as a linear combination of the vectors in that dictionary. To better capture the structures and patterns inherent in the input vectors, we use an over-complete dictionary; moreover, we want the linear combination to be sparse, meaning that only a small fraction of its values are non-zero. The sparse coding cost function is usually defined as:
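J(D, h) = ||x - Dh||_2^2 + λ||h||_1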
in which D is our desired dictionary and h is the latent representation, i.e. the coefficients of the "linear combination". We want to reconstruct the original input x as well as possible, so the L2-norm term minimizes the reconstruction error; we also want the latent representation to contain many zeros, so the L1-norm term, weighted by λ, serves as a sparsity penalty.
This seems simple, but a problem arises when implementing it: the L1-norm is not differentiable at zero, so how can we use gradient descent to solve sparse coding problems? I'll introduce two solutions here.
1. Iterative Shrinkage-Thresholding Algorithm (a.k.a. ISTA)
Because the L1-norm term depends only on h, the h update works like this: first we take a normal gradient step using only the reconstruction error term; then, if an entry of h changes sign because of the L1-norm gradient, we clamp it to 0. This "if…else…" step is also known as the shrinkage function.
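A minimal Armadillo sketch of one such update, assuming a step size `alpha` and a sparsity weight `lambda` (hypothetical names, not taken from my actual code):

```cpp
#include <armadillo>

// One ISTA step for h with the dictionary D held fixed.
arma::vec ista_step(const arma::mat &D, const arma::vec &x,
                    const arma::vec &h, double alpha, double lambda) {
    // Gradient step on the reconstruction term ||x - D h||_2^2 only.
    arma::vec grad = -2.0 * D.t() * (x - D * h);
    arma::vec h_half = h - alpha * grad;

    // Shrinkage: pull every entry toward zero by alpha * lambda; entries
    // that would cross zero (change sign) are clamped to exactly zero.
    arma::vec shrunk = arma::abs(h_half) - alpha * lambda;
    shrunk.transform([](double v) { return v > 0.0 ? v : 0.0; });
    return arma::sign(h_half) % shrunk;
}
```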
2. Smooth L1-Norm
Here we simply use a smooth version of the L1-norm that is differentiable at zero: replace each |h_i| with sqrt(h_i^2 + ε) for some small constant ε.
This method comes from Prof. Andrew Ng's UFLDL website, and it is the one I use in the code for this post. Of course, there are other ways of dealing with this problem; you can find a whole bunch of methods for improving sparse coding online.
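For reference, a small sketch of this smoothed penalty and its gradient in Armadillo (`epsilon` is an assumed smoothing constant, e.g. 1e-8):

```cpp
#include <armadillo>

// Smooth L1 penalty: sum over sqrt(h_i^2 + epsilon).
double smooth_l1(const arma::vec &h, double epsilon) {
    return arma::accu(arma::sqrt(arma::square(h) + epsilon));
}

// Its gradient: h_i / sqrt(h_i^2 + epsilon), well-defined at zero.
arma::vec smooth_l1_grad(const arma::vec &h, double epsilon) {
    return h / arma::sqrt(arma::square(h) + epsilon);
}
```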
The overall training procedure is the usual alternating optimization (a sketch follows the list):

- Initialize the dictionary D randomly.
- Repeat until convergence:
  - Select a random mini-batch.
  - Initialize h.
  - Find the h that minimizes J(D, h) for the D found in the previous step.
  - Solve for the D that minimizes J(D, h) for the h found in the previous step.
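Here is a minimal sketch of that loop, using plain gradient descent with the smooth L1 penalty; all hyper-parameter names (`n_epochs`, `n_inner`, `alpha`, `lambda`, `epsilon`, `batch_size`) are illustrative assumptions rather than values from my actual code:

```cpp
#include <armadillo>

void sparse_coding(const arma::mat &X, arma::mat &D, int n_epochs,
                   int n_inner, double alpha, double lambda,
                   double epsilon, arma::uword batch_size) {
    for (int epoch = 0; epoch < n_epochs; ++epoch) {
        // Select a random mini-batch of training vectors (columns of X).
        arma::uvec idx = arma::randperm(X.n_cols, batch_size);
        arma::mat Xb = X.cols(idx);

        // Initialize h for this batch.
        arma::mat H = arma::randn<arma::mat>(D.n_cols, Xb.n_cols) * 0.01;

        // Minimize J(D, h) over h with D fixed.
        for (int it = 0; it < n_inner; ++it) {
            arma::mat R = Xb - D * H;
            arma::mat grad_H = -2.0 * D.t() * R
                + lambda * (H / arma::sqrt(arma::square(H) + epsilon));
            H -= alpha * grad_H;
        }

        // Minimize J(D, h) over D with h fixed.
        arma::mat grad_D = -2.0 * (Xb - D * H) * H.t();
        D -= alpha * grad_D;
        D = arma::normalise(D);
    }
}
```

Re-normalising the columns of D after each update keeps the atoms at unit length; without this, D can grow arbitrarily large while h shrinks, defeating the sparsity penalty.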
My implementation of sparse coding uses C++ and Armadillo.
I used this image as the test image.
First I added random noise to this image, with the standard deviation of the noise equal to 1/3 of the standard deviation of the original image; then I extracted all overlapping 8 × 8 sub-images from the noisy image as my training dataset.
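A sketch of these two preprocessing steps in Armadillo (treating the grayscale image as an `arma::mat`; the function names are mine, not from my actual code):

```cpp
#include <armadillo>

// Add Gaussian noise whose std is 1/3 of the image's own std.
arma::mat add_noise(const arma::mat &img) {
    double s = arma::stddev(arma::vectorise(img)) / 3.0;
    return img + s * arma::randn<arma::mat>(img.n_rows, img.n_cols);
}

// Extract all overlapping p x p patches, each flattened to a p*p column.
arma::mat extract_patches(const arma::mat &img, arma::uword p = 8) {
    arma::uword nr = img.n_rows - p + 1;
    arma::uword nc = img.n_cols - p + 1;
    arma::mat patches(p * p, nr * nc);
    arma::uword k = 0;
    for (arma::uword j = 0; j < nc; ++j)
        for (arma::uword i = 0; i < nr; ++i)
            patches.col(k++) =
                arma::vectorise(img.submat(i, j, i + p - 1, j + p - 1));
    return patches;
}
```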
The following figures (GIFs) show the dictionaries learned with 49, 121, 225, and 400 atoms.
I tried to implement denoising using sparse coding (see this paper). After training the dictionary, I averaged the reconstructions of all patches to get a reconstructed image (using the 49-atom result); a sketch of the averaging step follows.
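Here is that averaging step, assuming `H` holds the codes for the patches produced by `extract_patches` above (same column ordering), and `rows`/`cols` are the dimensions of the original image:

```cpp
#include <armadillo>

// Rebuild the image: reconstruct each patch as D * h, add it back at its
// position, and divide every pixel by the number of patches covering it.
// The loop order must match the order used when extracting the patches.
arma::mat reconstruct(const arma::mat &D, const arma::mat &H,
                      arma::uword rows, arma::uword cols, arma::uword p = 8) {
    arma::mat out(rows, cols, arma::fill::zeros);
    arma::mat counts(rows, cols, arma::fill::zeros);
    arma::uword k = 0;
    for (arma::uword j = 0; j + p <= cols; ++j)
        for (arma::uword i = 0; i + p <= rows; ++i) {
            out.submat(i, j, i + p - 1, j + p - 1) +=
                arma::reshape(D * H.col(k++), p, p);
            counts.submat(i, j, i + p - 1, j + p - 1) += 1.0;
        }
    return out / counts;  // element-wise average over overlapping patches
}
```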
Figure 1 is the original noisy image and Figure 2 is the reconstructed image. Honestly, it is hard to tell by eye whether the reconstructed image is denoised, so I used imstats() in Matlab.
It is denoised, somewhat. I believe I can do better with better parameters, such as the sparsity penalty weight and so on. If you have any ideas, please tell me.