In this week's Motion Capture lecture, the teacher talked about the CamShift algorithm. I first heard about it three or four years ago, but I never tried it, and during that class I decided I had to do it this time.

CamShift (Continuously Adaptive Mean Shift) is a tracking algorithm based on the MeanShift algorithm. What CamShift does is essentially run MeanShift on every single frame of a video and record the result MeanShift produces for each frame.

The CamShift algorithm consists of three parts:

- 1. Back Projection
- 2. MeanShift
- 3. Track

and I will briefly explain each of these steps in this post.

**1. Back Projection.**

Back projection is a method that uses the histogram of an image to express, for each pixel, the probability that the pixel's color appears in the image. Let's see how to get the back projection of an image using OpenCV.

```cpp
// Convert to HSV, isolate the hue channel, build its histogram,
// and back-project that histogram onto the image.
cvtColor(image, hsv, CV_BGR2HSV);
int ch[] = {0, 0};
hue.create(hsv.size(), hsv.depth());
mixChannels(&hsv, 1, &hue, 1, ch, 1);        // copy channel 0 (hue) of hsv into hue
calcHist(&hue, 1, 0, Mat(), hist, 1, &hsize, &phranges);
normalize(hist, hist, 0, 255, CV_MINMAX);    // scale histogram into [0, 255]
calcBackProject(&hue, 1, 0, hist, backproj, &phranges, 1, true);
```

First, we transform the picture into HSV space (or any space that includes an H channel representing the hue of each pixel; note that in OpenCV's 8-bit representation, hue ranges from 0 to 180 — you can find more info on the wiki). Secondly, we split the H channel out as a single grayscale image, compute its histogram, and normalize it. Thirdly, we use the calcBackProject() function to calculate the back projection of the image.

Let me use an example to explain how we get the back projection.

Here is our input image; we can see it is a colorful mosaic picture.

As discussed above, we transform the picture into HSV space, and here is the hue channel.

This is what its histogram looks like.

The calcBackProject() function uses the histogram to compute the weight of each color in the whole picture, and replaces each pixel's value with the weight of that pixel's color. For instance, if one pixel's color is, say, yellow, and yellow's weight in this picture is 20% (that is, 20% of the pixels in the whole picture are this kind of yellow), we change this pixel's value from yellow to 0.2 (or 0.2 * 255 if using integers). Applying this to every pixel gives us the back projection picture.

**2. MeanShift**

What is MeanShift? MeanShift is an algorithm that finds the modes of a set of data samples representing an underlying probability density function (PDF) in R^N. It is a nonparametric clustering technique that requires no prior knowledge of the number of clusters and does not constrain the shape of the clusters. I will spare you the annoying formulae and show how it works with pictures and words.

Imagine we are in a space of dimension d (of course, d may be bigger than 2) containing a lot of points, and our goal is to cluster these points. We place a sphere centered at any one of the points, with some radius h. Because we are in a high-dimensional space, what we just made is a high-dimensional sphere. Now, every point inside the sphere can be seen as a vector from the sphere's center to that point, and the (normalized) sum of these vectors is what we call the mean shift vector. Adding this vector to the sphere's center gives us the current mass center.

In C++, the current mass center can be calculated like this:

```cpp
// Image moments over the probability map: M00 is total mass,
// M10 and M01 are the first moments along x (columns) and y (rows).
double M00 = 0.0, M10 = 0.0, M01 = 0.0;
for (int i = 0; i < probmap.rows; i++) {
    for (int j = 0; j < probmap.cols; j++) {
        double p = probmap.at<uchar>(i, j);
        M00 += p;
        M10 += j * p;   // j is the column index (x)
        M01 += i * p;   // i is the row index (y)
    }
}
double Center_x = M10 / M00;
double Center_y = M01 / M00;
```

Above is the core of meanShift algorithm, so the whole algorithm is:

a. Initialize the sphere, including the center and radius.

b. Calculate the current mass center.

c. Move the sphere’s center to mass center.

d. Repeat steps b and c until convergence, that is, until the newly computed mass center coincides with the center of the sphere.

In OpenCV, meanShift function is something like this:

```cpp
CV_IMPL int cvMeanShift( const void* imgProb, CvRect windowIn,
                         CvTermCriteria criteria, CvConnectedComp* comp );
```

Here, imgProb is the 2D object probability distribution, i.e. the result of the back projection above; windowIn is the CvRect giving the initial search window; criteria is the termination criteria (a maximum number of iterations and/or a minimum shift), which prevents the function from iterating forever in cases where it would never settle; and comp receives the location, height, and width of the converged window.

**3. Track**

The last step is tracking. Given a video, or frames captured by a web camera, all we need to do is apply the MeanShift algorithm to every single frame, using the output window of the previous frame as the initial window of the current one.

By using OpenCV's CamShift() function, we get a RotatedRect, which is defined in OpenCV like this:

```cpp
class RotatedRect
{
public:
    // constructors
    RotatedRect();
    RotatedRect(const Point2f& _center, const Size2f& _size, float _angle);
    RotatedRect(const CvBox2D& box);
    // returns minimal up-right rectangle that contains the rotated rectangle
    Rect boundingRect() const;
    // backward conversion to CvBox2D
    operator CvBox2D() const;
    // mass center of the rectangle
    Point2f center;
    // size
    Size2f size;
    // rotation angle in degrees
    float angle;
};
```

Interestingly, besides the center and size, we also get the angle of the rectangle, which means we can track the orientation of our target; this is a very useful feature.

**Testing**

First, I tried tracking my Kendama balls, and it worked well.

Moreover, look: I'm drawing with my face!

🙂