### INTRODUCTION

Sparse coding is one of the most popular unsupervised learning methods of the past decade. It is a dictionary learning process whose goal is to find a dictionary such that any training input vector can be represented as a linear combination of the dictionary's vectors. To better capture the structures and patterns inherent in the input vectors, we use an over-complete dictionary; moreover, we want the linear combination to be sparse, meaning that only a very small fraction of its values are non-zero. The sparse coding cost function is defined as follows:
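In standard notation, the cost function combines a reconstruction term and a sparsity term (λ here is the weight on the sparsity penalty):

```latex
J(\mathbf{D}, \mathbf{h}) = \lVert \mathbf{x} - \mathbf{D}\mathbf{h} \rVert_2^2 + \lambda \lVert \mathbf{h} \rVert_1
```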

where **D** is our desired dictionary and **h** is the latent representation, i.e. the "linear combination". We want to reconstruct the original input **x** as well as possible, so we use the L2-norm term to minimize the reconstruction error; we also want the latent representation to have many zeros, so we use the L1-norm term as a sparsity penalty.

### METHODS

This seems simple, but a problem arises when implementing it: the L1-norm is not differentiable at zero, so how can we use gradient descent to solve the sparse coding problem? I'll introduce two solutions here.

#### 1. Iterative Shrinkage and Thresholding Algorithm (a.k.a. ISTA)

Because the L1-norm term depends only on **h**, during the **h** update we do the following: first, update **h** normally using the gradient of the reconstruction error term; then, if a component of **h** changes sign because of the L1-norm gradient, clamp that component to **0**. This "if…else…" step is also known as the shrinkage function.
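The shrinkage step can be sketched in plain C++ (a simplified soft-thresholding version, not the Armadillo code from the repo; function names are illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Soft-thresholding (shrinkage) operator, applied element-wise:
// if the L1 gradient step would make h_i cross zero, clamp it to zero;
// otherwise shrink its magnitude by lambda.
double shrink(double h, double lambda) {
    if (h > lambda)  return h - lambda;   // positive, stays positive
    if (h < -lambda) return h + lambda;   // negative, stays negative
    return 0.0;                           // would flip sign -> clamp to 0
}

// One ISTA update for h: a gradient step on the reconstruction error,
// followed by shrinkage with threshold step_size * lambda.
std::vector<double> ista_step(const std::vector<double>& h,
                              const std::vector<double>& grad_recon,
                              double step_size, double lambda) {
    std::vector<double> out(h.size());
    for (size_t i = 0; i < h.size(); ++i)
        out[i] = shrink(h[i] - step_size * grad_recon[i], step_size * lambda);
    return out;
}
```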

#### 2. Smooth L1-Norm

Here we simply use a smooth version of the L1-norm that is differentiable at zero, e.g. replacing each |h| with sqrt(h^2 + ε) for a small constant ε.
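A minimal sketch of this smooth penalty and its gradient in plain C++ (the ε value here is illustrative, not the one used in the repo):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Smooth approximation of |h|: sqrt(h^2 + eps), differentiable everywhere.
double smooth_abs(double h, double eps = 1e-8) {
    return std::sqrt(h * h + eps);
}

// Its derivative: h / sqrt(h^2 + eps); well-defined (and zero) at h = 0.
double smooth_abs_grad(double h, double eps = 1e-8) {
    return h / std::sqrt(h * h + eps);
}

// Smooth L1 penalty of a whole latent vector h.
double smooth_l1(const std::vector<double>& h, double eps = 1e-8) {
    double s = 0.0;
    for (double v : h) s += smooth_abs(v, eps);
    return s;
}
```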

This method comes from Prof. Andrew Ng's **UFLDL** website, and it is the one I use in the code for this post. Of course, there are other ways to deal with this problem; you can find a whole bunch of methods for improving sparse coding online.

### ALGORITHM

- Initialize the dictionary **D** randomly.
- Repeat until convergence:
  - Select a random mini-batch.
  - Initialize **h**.
  - Find the **h** that minimizes **J(D, h)** for the **D** found in the previous step.
  - Solve for the **D** that minimizes **J(D, h)** for the **h** found in the previous step.
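The alternating loop might be sketched in plain C++ as follows; this is a toy single-sample version using the smooth L1 penalty, and all step sizes, iteration counts, and function names are my own assumptions, not the repo's code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Tiny dense matrix as vector-of-rows; D is n x k, h has length k, x length n.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Residual r = D*h - x.
Vec residual(const Mat& D, const Vec& h, const Vec& x) {
    size_t n = D.size(), k = h.size();
    Vec r(n, 0.0);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < k; ++j) r[i] += D[i][j] * h[j];
        r[i] -= x[i];
    }
    return r;
}

// J(D,h) = ||x - D h||_2^2 + lambda * sum_j sqrt(h_j^2 + eps).
double cost(const Mat& D, const Vec& h, const Vec& x,
            double lambda, double eps) {
    Vec r = residual(D, h, x);
    double J = 0.0;
    for (double v : r) J += v * v;
    for (double v : h) J += lambda * std::sqrt(v * v + eps);
    return J;
}

// Alternating gradient descent on h and D for one sample x
// (a mini-batch of one, purely illustrative).
void sparse_code(Mat& D, Vec& h, const Vec& x,
                 double lambda = 0.1, double eps = 1e-8,
                 double lr = 0.01, int iters = 2000) {
    size_t n = D.size(), k = h.size();
    for (int t = 0; t < iters; ++t) {
        // h-step: dJ/dh_j = 2 * sum_i D_ij * r_i + lambda * h_j / sqrt(h_j^2 + eps)
        Vec r = residual(D, h, x);
        for (size_t j = 0; j < k; ++j) {
            double g = lambda * h[j] / std::sqrt(h[j] * h[j] + eps);
            for (size_t i = 0; i < n; ++i) g += 2.0 * D[i][j] * r[i];
            h[j] -= lr * g;
        }
        // D-step: dJ/dD_ij = 2 * r_i * h_j
        r = residual(D, h, x);
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < k; ++j)
                D[i][j] -= lr * 2.0 * r[i] * h[j];
    }
}
```

Real implementations typically also renormalize the columns of **D** after each update so the dictionary atoms cannot grow without bound while **h** shrinks.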

### CODE

My implementation of sparse coding using C++ and Armadillo.

**https://github.com/xingdi-eric-yuan/sparse-coding**

### TEST

I used the following image as the test image.

First, I added random noise to the image; the standard deviation of the noise equals **1/3** of the standard deviation of the original image. Then I extracted all overlapping **8 × 8** sub-images from the noisy image as my training dataset.
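As an illustration, extracting all overlapping patches might look like this in plain C++ (row-major image layout and the function name are assumptions of mine):

```cpp
#include <cassert>
#include <vector>

// Extract all overlapping p x p patches from an H x W grayscale image
// (row-major), each flattened into a length p*p training vector.
std::vector<std::vector<double>>
extract_patches(const std::vector<double>& img, int H, int W, int p) {
    std::vector<std::vector<double>> patches;
    for (int r = 0; r + p <= H; ++r) {
        for (int c = 0; c + p <= W; ++c) {
            std::vector<double> patch;
            patch.reserve(p * p);
            for (int i = 0; i < p; ++i)
                for (int j = 0; j < p; ++j)
                    patch.push_back(img[(r + i) * W + (c + j)]);
            patches.push_back(patch);
        }
    }
    return patches;
}
```

An H × W image yields (H − p + 1) × (W − p + 1) overlapping patches.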

The following figures (GIFs) show dictionaries learned using 49, 121, 225, and 400 dictionary elements.

49 elements:

121 elements:

225 elements:

400 elements:

I tried to implement denoising using sparse coding (**SEE THIS PAPER**): after training the dictionary, I averaged the reconstructions of the overlapping patches to get a reconstructed image (using the 49-element dictionary).
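The averaging step can be sketched in plain C++ (illustrative names; each pixel gets the mean of all reconstructed patches that cover it):

```cpp
#include <cassert>
#include <vector>

// Reassemble an H x W image from overlapping p x p patch reconstructions
// (in row-major sliding-window order) by per-pixel averaging.
std::vector<double>
average_patches(const std::vector<std::vector<double>>& patches,
                int H, int W, int p) {
    std::vector<double> sum(H * W, 0.0), count(H * W, 0.0);
    int idx = 0;
    for (int r = 0; r + p <= H; ++r)
        for (int c = 0; c + p <= W; ++c, ++idx)
            for (int i = 0; i < p; ++i)
                for (int j = 0; j < p; ++j) {
                    sum[(r + i) * W + (c + j)] += patches[idx][i * p + j];
                    count[(r + i) * W + (c + j)] += 1.0;
                }
    for (int k = 0; k < H * W; ++k) sum[k] /= count[k];  // mean per pixel
    return sum;
}
```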

Figure 1 is the original image with noise, and Figure 2 is the reconstructed image. Actually, it's hard to tell by eye whether the reconstructed image is denoised, so I used **imstats()** in Matlab.

It is denoised to some extent. I believe I could do better with better parameters, such as the sparsity penalty weight. If you have any ideas, please tell me.

🙂