Since the last CNN post, I have been working on a new version of the CNN that supports multiple Conv and Pooling layers, and I’d like to share some of that experience here.
VECTOR VS HASH TABLE
As you can see in the last post, I used a vector of Mat in the convolution steps. That works well with only one convolution layer, where each input image yields 1 * KernelAmount images after the Conv and Pooling layer (the Pooling operation doesn’t change the number of images). To retrieve these “conved images” easily, I generated one vector of Mat for each input image.
However, with more than one Conv and Pooling layer, using vectors becomes a disaster. Say we have 3 Conv layers with 4, 6, and 8 kernels: after processing, each input image gives 1 * 4 * 6 * 8 = 192 “conved images”. That seems fine, but when I tried to do backprop over them, it felt like hell.
What about using a vector of vectors? Good point, if you like this:
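To make the growth concrete, here is a minimal sketch that counts the “conved images” per input image after each Conv layer. The function name mapsPerLayer is hypothetical, and it assumes every map from the previous layer is convolved with every kernel of the next layer, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of "conved images" per input image after each
// Conv layer, assuming each existing map is convolved with every kernel of
// the next layer (Pooling does not change the count).
std::vector<std::size_t> mapsPerLayer(const std::vector<std::size_t>& kernelCounts) {
    std::vector<std::size_t> counts;
    std::size_t running = 1;  // start from one input image
    for (std::size_t k : kernelCounts) {
        assert(k > 0);        // a layer needs at least one kernel
        running *= k;
        counts.push_back(running);
    }
    return counts;
}
```

With kernel counts 4, 6, and 8 this gives 4, 24, and 192 maps per input image, and each of them needs a matching delta during backprop.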
vector<vector<vector<vector<vector<vector<Mat> > > > > >
What I did in this version was use a hash map (unordered_map in C++), even though I never thought it was the best idea. I used a string as the key and the corresponding Mat as the value. These examples show how I define the keys:
X234C0K2PC1K4 is the matrix for the 234th input image, convolved by kernel 2 in the 0th Conv layer and then, after Pooling, convolved by kernel 4 in the 1st Conv layer.
X22C0K0PD: the “X22C0K0P” part means this is the matrix for the 22nd input image, convolved by kernel 0 in the 0th Conv layer and then pooled. The trailing ‘D’ means the matrix was generated during backprop: it is the corresponding delta matrix.
X22C0K0PUD is the result of unPooling the above matrix.
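A minimal sketch of how keys in this scheme could be assembled and used with unordered_map. The helper names (convKey, deltaKey, unpooledDeltaKey, storeDelta) are hypothetical, and Mat here is a std::vector<float> stand-in for cv::Mat so the snippet compiles without OpenCV; the actual project code may build its keys differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for cv::Mat (assumption) so the sketch has no OpenCV dependency.
using Mat = std::vector<float>;
using MatTable = std::unordered_map<std::string, Mat>;

// Hypothetical helper: builds forward-pass keys such as "X234C0K2PC1K4".
// kernels[i] is the kernel index used in Conv layer i; pooledLast says
// whether the last listed layer has already been pooled.
std::string convKey(int image, const std::vector<int>& kernels, bool pooledLast) {
    assert(image >= 0 && !kernels.empty());
    std::string key = "X" + std::to_string(image);
    for (std::size_t layer = 0; layer < kernels.size(); ++layer) {
        key += "C" + std::to_string(layer) + "K" + std::to_string(kernels[layer]);
        const bool isLast = (layer + 1 == kernels.size());
        if (!isLast || pooledLast) key += "P";  // earlier layers are always pooled
    }
    return key;
}

// Backprop keys: 'D' marks the delta matrix, "UD" the delta after unPooling.
std::string deltaKey(const std::string& forwardKey) { return forwardKey + "D"; }
std::string unpooledDeltaKey(const std::string& forwardKey) { return forwardKey + "UD"; }

// Stores a delta matrix under the key derived from its forward-pass key.
void storeDelta(MatTable& table, const std::string& forwardKey, const Mat& delta) {
    table[deltaKey(forwardKey)] = delta;
}
```

For example, convKey(234, {2, 4}, false) produces X234C0K2PC1K4, and deltaKey(convKey(22, {0}, true)) produces X22C0K0PD.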
The advantages of this approach:
- Easy to debug: you can simply fetch any matrix from anywhere in the pipeline.
- Easy to see what’s going on, especially for someone who doesn’t fully understand the architecture of a ConvNet.
- Fast data access (O(1) time on average), plus all the other advantages of a hash table.
The disadvantages:
- String manipulation is tedious; it feels like doing LeetCode problems.
- Memory overhead, plus all the other disadvantages of a hash table.
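To give a feel for the string handling involved, here is a rough sketch of decoding a key back into its parts. The ParsedKey struct and the parseKey and readNumber functions are hypothetical and only cover the key shapes shown in the examples above (X…, C…K… groups with optional P, then optional U and D).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical decoded form of a key such as "X22C0K0PUD".
struct ParsedKey {
    int image = 0;
    std::vector<int> kernels;  // kernel index per Conv layer, in layer order
    bool pooledLast = false;   // last listed layer was pooled
    bool unpooled = false;     // 'U' present
    bool delta = false;        // 'D' present
};

// Reads a run of decimal digits starting at pos; false if there is none.
static bool readNumber(const std::string& s, std::size_t& pos, int& value) {
    assert(pos <= s.size());
    const std::size_t start = pos;
    value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    return pos > start;
}

// Returns true and fills out if key follows the naming scheme.
bool parseKey(const std::string& key, ParsedKey& out) {
    out = ParsedKey{};
    std::size_t pos = 0;
    if (key.empty() || key[pos++] != 'X') return false;
    if (!readNumber(key, pos, out.image)) return false;
    while (pos < key.size() && key[pos] == 'C') {
        int layer = 0, kernel = 0;
        ++pos;
        if (!readNumber(key, pos, layer)) return false;
        // Layers must appear as C0, C1, C2, ... with no gaps.
        if (layer != static_cast<int>(out.kernels.size())) return false;
        if (pos >= key.size() || key[pos++] != 'K') return false;
        if (!readNumber(key, pos, kernel)) return false;
        out.kernels.push_back(kernel);
        out.pooledLast = false;
        if (pos < key.size() && key[pos] == 'P') { out.pooledLast = true; ++pos; }
    }
    if (out.kernels.empty()) return false;
    if (pos < key.size() && key[pos] == 'U') { out.unpooled = true; ++pos; }
    if (pos < key.size() && key[pos] == 'D') { out.delta = true; ++pos; }
    return pos == key.size();  // reject trailing characters
}
```

Even this small parser needs a fair amount of index bookkeeping, which is the cost the string-key design trades for easy lookups.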
I said I never thought this was the best data structure for a ConvNet, but I found it’s a good one for newbies. If you have a better idea for this part, please let me know.
I recently implemented version 3.0; check it out here.
enjoy it 🙂