Color segmentation

In most of the images we take during data acquisition, we only want the part of the image that contains the data we need. There are many ways to do this; it can even be done using template matching. For this activity, we use the color of the region of interest (ROI) to separate it from the rest of the image.

There are two techniques for color segmentation: parametric and non-parametric segmentation. Both techniques use a portion of the ROI.

Each pixel in a colored image has RGB values. In parametric segmentation, we take the fraction that each of the R, G, and B values contributes to each pixel in the portion of the ROI. That is:

I = R + G + B

1 = \frac{R + G + B}{I}, 1 = r + g + b

where r = R/I, g = G/I and b = B/I. However, b can be written as a function of r and g:

1-b = r + g

This means that we only need to obtain the r and g values.
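
As a rough Scilab sketch of this conversion (assuming the SIVP/IPD image toolbox for imread, and a hypothetical filename flower1.jpg):

img = double(imread("flower1.jpg"));      // hypothetical filename for figure 1
R = img(:,:,1); G = img(:,:,2); B = img(:,:,3);
I = R + G + B;
I(find(I == 0)) = 1;                      // avoid division by zero on black pixels
r = R ./ I;                               // normalized chromaticity coordinates
g = G ./ I;                               // b = 1 - r - g, so it is not needed

The same conversion is applied to the cropped ROI patch to get its own r and g values.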

In parametric segmentation, we obtain the r and g values of a portion of the ROI. We then compute their means, μr and μg, and standard deviations, σr and σg. After obtaining the mean and standard deviation, we compute the probability that a pixel belongs to the ROI from the r and g values of that pixel. The probability for r (and likewise for g) is given by:

p(r) = \frac{1}{\sigma_r\sqrt{2\pi}} \exp\left(-\frac{(r-\mu_r)^2}{2\sigma_r^2}\right)

where r is the r-value of the pixel. The same form is used for the g-values. After obtaining p(r) and p(g), we take the product p(r)p(g), which gives the probability that the pixel is part of the ROI. We show in figure 1 the image that we used, and in figure 2 the portion of the ROI from which the mean and standard deviation are obtained. A short code sketch of this step is given after figure 3.


Figure 1. The ROI consists of the flowers.


Figure 2. A portion of the flowers.

Figure 3 shows the resulting probability for each pixel. We can see that the flowers are separated from the rest of the image.

Figure 3. The flowers are separated from the other parts of the image.
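
For reference, a minimal Scilab sketch of the parametric segmentation above, assuming r and g hold the normalized chromaticity values of the whole image and rp and gp those of the cropped ROI patch (all hypothetical variable names):

// statistics of the ROI patch
mu_r = mean(rp);  sig_r = stdev(rp);
mu_g = mean(gp);  sig_g = stdev(gp);
// Gaussian probability of ROI membership, evaluated per pixel and per channel
pr = exp(-((r - mu_r).^2) / (2*sig_r^2)) / (sig_r*sqrt(2*%pi));
pg = exp(-((g - mu_g).^2) / (2*sig_g^2)) / (sig_g*sqrt(2*%pi));
prob = pr .* pg;                          // joint probability shown in figure 3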

For the next technique, we use a histogram. This histogram is created from the r and g values of the portion of the ROI: we convert the r and g values into integers and bin them into a matrix (Soriano, 2013). The quality of the segmentation depends on the number of bins used. In figure 4, we show the results of the non-parametric segmentation for different numbers of bins.

Figure 4. Left to right, top to bottom: number of bins = 2, 4, 8, 16, 32, 64, 128, 256.

From figure 4, it can be seen that when the number of bins is too low, the constraint becomes too loose and even other objects in the image are segmented. When the number of bins is too high, fewer and fewer details are segmented, until even some of the objects that we need are missed. It all boils down to choosing a good number of bins.
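
A sketch of the non-parametric version under the same assumptions (r, g for the whole image, rp, gp for the ROI patch), with the histogram backprojection written out as plain loops:

nbins = 32;                               // number of bins along each axis
// 2D histogram of the ROI patch in r-g space
hist2d = zeros(nbins, nbins);
ri = int(rp*(nbins-1)) + 1;  gi = int(gp*(nbins-1)) + 1;
for k = 1:length(ri)
    hist2d(ri(k), gi(k)) = hist2d(ri(k), gi(k)) + 1;
end
// backprojection: each pixel takes the histogram value of its (r, g) bin
rb = int(r*(nbins-1)) + 1;  gb = int(g*(nbins-1)) + 1;
seg = zeros(r);
for k = 1:length(rb)
    seg(k) = hist2d(rb(k), gb(k));
end

Varying nbins in this sketch reproduces the behavior seen in figure 4.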

In comparison, the parametric segmentation yields better results here: the flowers come out brighter and no other objects are captured. However, non-parametric segmentation is faster, since the histogram acts as a look-up table and no further per-pixel calculation is needed. I show another example in figure 5.

Figure 5. The left image is the original, the center is the result of parametric segmentation, and the right is the result of non-parametric segmentation. The difference in quality between parametric and non-parametric segmentation can be seen: the parametric segmentation captures more of the area of the flowers.

I give myself a grade of 6/10, since this blog is overdue. I thank Ms. Abby Jayin for her help in understanding the concepts.

Sources:

Soriano, M. (2013). Color Segmentation.

Image Compression (PCA)

In the old days when a diskette (ha! hipsters can relate) was the equivalent of today’s USB drive in terms of usefulness, storage was a real problem, and the search for a solution led to what is now known as image compression, which is still in use today (hello, JPEG). In this activity, we show how to compress images using Principal Components Analysis (PCA).

Basically, what PCA does is convert a set of correlated signals into orthogonal functions whose weighted sums can reconstruct any of the signals. (I do not want to go into the details here. :PPP) If we have a set of functions f1, f2, f3, …, fn, then we can obtain a set of orthogonal functions W1, W2, W3, …, Wm such that:

f_i = \sum\limits_{n=1}^{m} a_{n,i} W_n

where a_{n,i} is the coefficient of the function W_n for the signal f_i. The functions W_n are called the principal components. However, each W_n has a different percent contribution to the reconstruction of the image. In Scilab, W_n and a_{n,i} can be obtained using a single line of code. If X is the matrix containing the signals (each row being an individual signal), the principal components and the coefficients can be obtained by:

[lambda, facpr, comprinc] = pca(X);

where facpr is the matrix containing the principal components and comprinc is the matrix containing the coefficients. lambda gives the percent contribution of each principal component to the signals.

In image compression, what we do is cut our image into N x N blocks. Each block is reshaped into a single 1 x N^2 row, and each 1 x N^2 row is treated as one signal. We then stack all the signals into one matrix, which is the input to pca. For this activity, we use N = 10.

There will be N^2 = 100 principal components. What we will then do is reconstruct the image using the principal components. Because each principal component has a different percent contribution, we investigate what happens to the reconstruction if we use only the first n principal components (0 < n ≤ 100).
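
A rough Scilab sketch of this procedure, assuming img is the grayscale image (its dimensions divisible by N) and n is the number of principal components kept; it also assumes pca() centers its input columns, so the column means are added back on reconstruction (the exact rescaling may differ depending on how pca() normalizes the data):

N = 10;  n = 10;                          // block size and number of components kept
[nr, nc] = size(img);
X = [];                                   // each row will be one 1 x N^2 block signal
for i = 1:N:nr
    for j = 1:N:nc
        X = [X; matrix(img(i:i+N-1, j:j+N-1), 1, N*N)];
    end
end
[lambda, facpr, comprinc] = pca(X);
// reconstruct using only the first n principal components,
// adding back the column means (assumed removed by pca's centering)
mu = mean(X, 'r');
Xrec = comprinc(:, 1:n) * facpr(:, 1:n)' + ones(size(X, 1), 1) * mu;
// reassemble the N x N blocks, in the same order, into the reconstructed image
recon = zeros(img);
k = 1;
for i = 1:N:nr
    for j = 1:N:nc
        recon(i:i+N-1, j:j+N-1) = matrix(Xrec(k, :), N, N);
        k = k + 1;
    end
end

Running this with n = 1, 5, 10, 50, and 100 gives the panels shown below.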

We use a picture of a cat (well, it’s a tiger but… still a cat).


A cat.

In the figure below, we compare the original picture (in grayscale) with the reconstructed images.


The top left figure is the grayscale image of the cat. The top center is for n = 1, the top right for n = 5, the bottom left for n = 10, the bottom center for n = 50, and the bottom right for n = 100.

From the figure above, we can see that the more principal components are used, the better the quality. However, the images for n = 50 and n = 100 do not differ much. This is because the later principal components have only a small percent contribution.

In the next figure, we show what the principal components look like.


Each 10 x 10 block of this picture is one principal component.

We show another example. This one comes from the anime Fate/Zero, where the protagonist is a female swordsman… well, you can look it up if you like.


The numbers of principal components used are the same as in the cat figure.

The author gives himself a score of 9/10 for this work. He acknowledges Mr. Tingzon for the help. 🙂

Sources:

Wikipedia

Dr. Maricor Soriano

Image and Music (activity 12)

In this activity, we play the notes on a score sheet using image processing and Scilab. (I know, tones from programming languages are usually pure tones; they do not emulate real-world instruments. Anyway, it’s still cool.) The score sheet that I used is for a common song, “Twinkle Twinkle Little Star”.

Figure 1. Score sheet from mamalisa.com. One of the coolest tunes ever!

We read the image of the score sheet in Scilab and do the usual preprocessing (imread, rgb2gray, etc.). The next thing we do is crop the part of the image shown in figure 2.


Figure 2. In case you’re wondering what this is, it is the oval at the end of a half note.

What we are trying to do here is obtain the coordinates of the notes, or at least one consistently located point on each note. This is because the row of the coordinate will be used to determine the tone, while the column will be used for the duration and sequence of the notes.

So what we do next is edit figure 1 so that the notes are laid out sequentially. To drive this idea home, I show figure 3.


Figure 3. The notes are now sequential. This is to facilitate better sequencing of notes.

We then perform template matching of figure 2 against figure 3. After thresholding the result, we obtain the relative positions of the notes. This works because all the notes are quarter and half notes, so the pattern in figure 2 is found in every note present. A code sketch of this step follows figure 4.

Figure 4. The white dots are the positions at which the template matching has the highest values.
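
A sketch of this template-matching step in Scilab, assuming notes_bw is the binarized image in figure 3 and template_bw is the binarized template of figure 2 zero-padded to the same size (both hypothetical variable names); the correlation is taken through the FFT:

Fn = fft2(notes_bw);
Ft = fft2(template_bw);                   // template padded to the size of the score
corr_img = real(ifft(Fn .* conj(Ft)));    // correlation of template and score
peaks = corr_img > 0.9*max(corr_img);     // threshold: keep only the strongest matches
[rowpos, colpos] = find(peaks);           // one (row, column) pair per detected note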

We obtain the positions of the white dots in figure 4, the row giving the tone and the column giving the sequence and length of each note. From figure 1, the notes are in the key of C, so Do is C, Re is D, Mi is E, and so on. Using the table from http://www.phy.mtu.edu/~suits/notefreqs.html, we can convert each row position into a sinusoid of the corresponding frequency. Using the sound function, we can then play the notes. I currently cannot upload the audio file.
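
As a final sketch, here is how the detected notes could be turned into sound in Scilab; the frequency and duration vectors below are hypothetical, just the first phrase of the song filled in by hand from the frequency table above:

fs = 8192;                                // sampling rate in Hz
freqs = [262 262 392 392 440 440 392];    // C C G G A A G (first phrase)
durs  = [0.5 0.5 0.5 0.5 0.5 0.5 1.0];    // quarter notes, then a half note, in seconds
song = [];
for k = 1:length(freqs)
    t = 0:1/fs:durs(k);
    song = [song, sin(2*%pi*freqs(k)*t)]; // pure-tone sinusoid for each note
end
sound(song, fs);                          // play the assembled melody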

The author gives himself a grade of 9/10 for this activity.