Since the last CNN post, I have been working on a new version of the CNN that supports multiple Conv and Pooling layers, and I’d like to share some experience here.

### VECTOR VS HASH TABLE

As you can see in the last post, I used a vector of Mat in the convolution steps. It works well when we only have one convolution layer, which means that for each input image we get **1 * KernelAmount** images after the Conv and Pooling layer (the Pooling operation doesn’t change the number of images). To retrieve these “conved images” easily, I generated one vector of Mat for each input image.

However, with more than one Conv and Pooling layer, using a vector becomes a disaster. Say we have 3 Conv layers with 4, 6, and 8 kernels; after processing, each input image yields **1 * 4 * 6 * 8** “conved images”. That seems fine in the forward pass, but when I tried to do backprop with it, I felt like I was in hell.
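To make the growth concrete, here is a minimal sketch that computes how many “conved images” one input image produces after each Conv layer. The function name `imagesPerLayer` and its parameter are my own illustration, not something from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of "conved images" produced per input image after each Conv layer.
// kernelAmounts[i] is the kernel count of Conv layer i (hypothetical name).
// Pooling is ignored because it does not change the number of images.
std::vector<std::size_t> imagesPerLayer(const std::vector<std::size_t>& kernelAmounts) {
    std::vector<std::size_t> counts;
    std::size_t running = 1;  // start from a single input image
    for (std::size_t k : kernelAmounts) {
        assert(k > 0 && "each Conv layer needs at least one kernel");
        running *= k;  // every existing image is convolved by every kernel
        counts.push_back(running);
    }
    return counts;
}
```

For the 4, 6, 8 example above, the counts after each layer are 4, 24, and 192, and every one of those matrices needs an addressable slot during backprop.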

What about using a vector of vectors? Good point, if you like this:

**vector<vector<vector<vector<vector<vector<Mat> > > > > >**

What I did in this version was use a hash map (unordered_map in C++), even though I never thought it was the best idea. I use a string as the key and the corresponding Mat as the value. These examples show how I define the key:

**X234C0K2PC1K4** means this is a matrix which is the **234th** input image convolved by kernel **2** in **0th** Conv layer, and after **Pooling**, convolved by kernel **4** in **1st** Conv layer.

**X22C0K0PD**: the “X22C0K0P” part means this is the **22nd** input image convolved by kernel **0** in the **0th** Conv layer and then **pooled**. The trailing ‘D’ means this matrix was generated during backprop: it is the delta matrix corresponding to X22C0K0P.

**X22C0K0PUD**, is the result of the above matrix after doing **unPooling**.
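The key scheme above can be sketched with a few small helpers. This is only an illustration under my own assumptions: the helper names (`inputKey`, `convKey`, `poolKey`, `deltaKey`, `unpoolDeltaKey`) are hypothetical, and `Matrix` is a stand-in for OpenCV's `Mat` so the snippet compiles without OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for cv::Mat so this sketch has no OpenCV dependency (assumption).
struct Matrix { int rows = 0; int cols = 0; };

// "X234": the 234th input image.
std::string inputKey(int imageIdx) {
    assert(imageIdx >= 0);
    return "X" + std::to_string(imageIdx);
}

// Append "C<layer>K<kernel>": convolved by that kernel in that Conv layer.
std::string convKey(const std::string& prev, int layer, int kernel) {
    return prev + "C" + std::to_string(layer) + "K" + std::to_string(kernel);
}

// Append "P": the matrix after Pooling.
std::string poolKey(const std::string& prev) { return prev + "P"; }

// Append "D": the delta matrix of a forward-pass matrix, built during backprop.
std::string deltaKey(const std::string& forward) { return forward + "D"; }

// "...PD" -> "...PUD": the pooled delta after unPooling.
std::string unpoolDeltaKey(const std::string& pooledDelta) {
    const std::size_t n = pooledDelta.size();
    assert(n >= 2 && pooledDelta[n - 2] == 'P' && pooledDelta[n - 1] == 'D');
    return pooledDelta.substr(0, n - 1) + "UD";
}

// The store itself: string key -> matrix.
using MatStore = std::unordered_map<std::string, Matrix>;
```

With these, `convKey(poolKey(convKey(inputKey(234), 0, 2)), 1, 4)` builds the first example key, and a matrix can be stored or fetched with `store[key]` or `store.find(key)`.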

Advantages:

- Easy to debug: you can simply fetch any matrix from anywhere in the whole process.
- Easy to see what’s going on, especially for someone who doesn’t fully understand the architecture of a ConvNet yet.
- Fast data access (average O(1) lookup time), plus all the other advantages of a hash table.
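As a rough sketch of the debugging point, the helper below collects every key that belongs to one input image, so you can inspect all of its intermediate matrices. The name `keysForImage` and the `Matrix` stand-in for `Mat` are assumptions for illustration only. Note that this scan is O(n) over the whole map; only single-key lookups are O(1) on average.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for cv::Mat so this sketch has no OpenCV dependency (assumption).
struct Matrix { int rows = 0; int cols = 0; };

// Return all keys in the store that belong to input image `imageIdx`, sorted.
std::vector<std::string> keysForImage(
        const std::unordered_map<std::string, Matrix>& store, int imageIdx) {
    assert(imageIdx >= 0);
    const std::string prefix = "X" + std::to_string(imageIdx);
    std::vector<std::string> keys;
    for (const auto& entry : store) {
        const std::string& key = entry.first;
        // The key must start with "X<imageIdx>".
        if (key.compare(0, prefix.size(), prefix) != 0) continue;
        // The next character must not be a digit, or "X22" would match "X223...".
        if (key.size() > prefix.size() &&
            std::isdigit(static_cast<unsigned char>(key[prefix.size()]))) continue;
        keys.push_back(key);
    }
    std::sort(keys.begin(), keys.end());  // unordered_map has no stable order
    return keys;
}
```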

Disadvantages:

- String manipulation is tedious; it feels like doing LeetCode problems.
- Memory overhead, plus all the other disadvantages of a hash table.

As I said, I never thought this was the best data structure for a ConvNet, but I found it’s a good one for newbies. If you have a better idea for this part, please let me know.

### CODE

**https://github.com/xingdi-eric-yuan/multi-layer-convnet**

I implemented a version 3.0 recently, check it **here**.

enjoy it 🙂
