Fourier transform

We finally get to play around with the Fourier transform!

For this activity, we are going to familiarize ourselves with the discrete Fourier transform, specifically the Fast Fourier Transform (FFT) of Cooley and Tukey. Then, using the FFT, we apply convolution between two images. Next is correlation, where we use the complex conjugate of an image's transform. Lastly, using convolution, we do edge detection.

1. Familiarization with FFT.

A theorem in mathematics states that all waveforms can be decomposed into sinusoids; that is, a function can be expressed as a sum of sines and cosines. Though this theorem may sound like gibberish at first, we will explore the wonders it has given us. 🙂

Breaking down a function into sinusoids gives us the “ingredients” for making that function in the frequency domain. This is done using the Fourier transform, given by the following equation:

F(f_x) = \int_{-\infty}^{\infty} f(x)\, e^{i2\pi x f_x}\, dx

where F(f_x) is the FT of the function f(x): it gives the amplitude of the frequencies found in f(x). The equation above is the one-dimensional Fourier transform; graphing F(f_x) will give peaks at the values of f_x (frequency) that the function f(x) contains. Here, however, we are going to deal with two-dimensional Fourier transforms, since we are dealing with images. The 2D Fourier transform is:

F(f_x, f_y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, e^{i2\pi (f_x x + f_y y)}\, dx\, dy

where F(f_x, f_y) is the 2D FT of f(x,y), with f_x the frequencies in the x direction and f_y the frequencies in the y direction. In the real world, however, data is discrete: sensors and sampling instruments capture only some values of a data set, not a continuous record. An example of this is the camera. When we look at the properties of an image captured by a camera, we see something called the pixel. Images are composed of pixels, so when an image is zoomed in, the details become distinct and discontinuous; squares of different colors can be observed. In the real world, by contrast, you cannot see small squares when “zooming in” on objects. (You also cannot determine the position and momentum of particles exactly at the same time, but that topic will not be tackled here. =,=)

Because real data is discrete, we cannot apply the integral above, since integrals are applicable to continuous systems. Cooley and Tukey (1965) published a fast algorithm for computing the discrete Fourier transform, now known as the Fast Fourier Transform (FFT) and still used by many programming languages today. The 2D discrete transform it computes is given by the following equation:

F(f_x, f_y) = \sum\limits_{m=0}^{M-1}\sum\limits_{n=0}^{N-1}f(n,m)e^{i2\pi (\frac{nf_x}{N} + \frac{mf_y}{M})}

where f(n,m) is the value of the image at the n^{th} x position and m^{th} y position, for an image of size N \times M. This algorithm is used in a lot of applications: radar, ultrasound, and almost anything that involves image processing.

The FFT implementation in most programming languages places the zero frequency at the corners of the output, so the diagonal quadrants appear interchanged. Because of this, a function called fftshift is often used to swap the quadrants back and place the zero frequency at the center. In figure 1, I show the result of the FFT of a circle with aperture radius equal to 0.1 of the maximum radius that can be created, with and without the fftshift function:

[Image: circle_fft_fftshift]

Figure 1. The FFT of a circle (left), without (center) and with (right) the fftshift. Analytically, the FT of a circular aperture is an Airy pattern, which can be seen in the image on the right.
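
As an illustration, here is a minimal Scilab sketch of how figure 1 could be generated. The grid size of 256 and the construction of the aperture are my assumptions; fft and fftshift are core Scilab functions (note that Scilab's fft performs a 2-D transform when given a matrix):

n = 256;
x = linspace(-1, 1, n);
[X, Y] = meshgrid(x, x); // coordinate grid spanning the image
r = sqrt(X.^2 + Y.^2); // distance of each pixel from the center
circle = zeros(n, n);
circle(find(r <= 0.1)) = 1; // aperture radius = 0.1 of the maximum
F = fft(circle); // Scilab's fft on a matrix gives the 2-D FFT
raw = abs(F); // center panel: quadrants interchanged
shifted = abs(fftshift(F)); // right panel: zero frequency at the center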

We can see that the fftshift function is essential for obtaining the correctly arranged FFT of an image. Figure 2 shows the FFT of the letter A, along with what happens if the FFT is applied twice to the same image and if the IFFT is applied after the first FFT.

[Image: A_images]

Figure 2. The top left image shows the letter A. The top right image shows the FFT of the letter A (already fftshift-ed). The bottom left image shows what happens if you apply the FFT twice to the letter A. The bottom right shows what happens if you apply the inverse FFT after the first FFT.
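
For reference, a small sketch of the operations behind figure 2, assuming the letter-A image has already been loaded into A_img as a grayscale double matrix (the variable name is mine):

FA = fft(A_img); // FFT of A; fftshift(abs(FA)) is what the top right panel displays
twiceA = real(fft(FA)); // FFT applied twice: an inverted (and rescaled) copy of A
backA = real(ifft(FA)); // inverse FFT: the upright image again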

It can be seen in figure 2 that if the FFT is applied twice, the image becomes inverted, while if the IFFT, or inverse FFT, is applied after the first FFT, the image stays upright. This is because applying the forward transform twice yields a rescaled copy of f(-x, -y), a spatially inverted version of the original. The next image shows the FFT of circles of different sizes, ranging from 0.1 to 1 times the maximum radius.

[Image: circles_fft]

Figure 3. Circles of different radii, shown with their FFTs. The circles have radii equal to (from left to right) 0.1, 0.2, 0.4, 0.6, 0.8 and 1 times the maximum radius. The results agree with theory, which says that the smaller the circle, the more spread out its Airy pattern: the size of the diffraction pattern is inversely proportional to the aperture size. The theory will not be discussed here.

2. Convolution

Convolution is a linear operation that basically means smearing one function with the other (Soriano, 2013) such that the resulting function has properties of both functions. The convolution is represented by an *. So if two functions are convolved with each other, they are represented as (in integral form):

f(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x',y')\, g(x-x', y-y')\, dx'\, dy'

or in short hand form:

f = g*h

By the convolution theorem, the FFT of a convolution is simply the product of the FFTs of the two functions being convolved. Basically:

f = g*h \;\Rightarrow\; F = GH

where F, G and H are the FFTs of f, g and h, respectively. Convolution is used to model sensors and instruments that sample data sets, since a sensor also affects the values of the data it records, whether it is a linear, second-order, or higher-order system. An example of a sensor that affects how data is captured is the camera, where the resulting image (f) is a convolution of the object itself (g) and the aperture of the camera (h).

Evaluating the convolution integral directly is oftentimes harder than just applying FFTs. So in this part of the activity, we apply the FFT to both functions g and h, multiply them, then apply the IFFT to obtain the convolved function f. With g as the VIP image and h as the camera aperture, we apply the convolution operation. We show the results in figure 4.

[Image: VIP_conv]

Figure 4. The top left figure is the VIP target. The following images (from left to right, top to bottom) are the results of the VIP image being convolved with apertures of different radii (1.0, 0.8, 0.4, 0.2, 0.1, 0.05, 0.02 of the maximum).
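
A sketch of the procedure described above, assuming vip and aperture are grayscale double matrices of the same size (variable names are mine); the aperture can be generated as in the circle sketch earlier:

Fg = fft(vip); // FFT of the image g
Fh = fft(aperture); // FFT of the aperture h
f = abs(ifft(Fg .* Fh)); // convolution theorem: f = IFFT(GH)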

It can be observed from figure 4 that the smaller the aperture, the clearer the resulting image. This is because the convolution smears g over the extent of the kernel h, so the smearing becomes less and less as the aperture becomes smaller. Here, then, the smaller the aperture used as the convolution kernel, the higher the image quality.

3. Correlation

The second concept that we will discuss here is correlation. Correlation measures the degree of similarity between two data sets, or in this case, between two images. It is often used in template matching because it answers the question “how similar are these two images?”. The equation of correlation is given by:

f(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x', y')\, h(x + x', y + y')\, dx'\, dy'

or in shorthand notation:

f = g \odot h

We can see that the integral equation of correlation looks like that of convolution. Indeed, the two are related: for a real-valued g, correlation can be written as a convolution,

f(x,y) = g(-x, -y) * h(x, y)

Like convolution, correlation is a linear operation, so applying the FFT will lessen our load (sparing us the integral shown above). Correlation has the following property:

f = g \odot h \;\Rightarrow\; F = \bar{G} H

where \bar{G} is the complex conjugate of G. We use correlation for template matching, that is, finding out how similar two images are at each position (x_pix, y_pix). We take an image of the sentence THE RAIN IN SPAIN STAYS MAINLY IN THE PLAIN and correlate it with an image of the letter A using the FFT relationship above. Then, by thresholding, we obtain the positions at which the two images have a high correlation. We show the result in figure 5.

[Image: Template_matching]

Figure 5. Template matching using the sentence THE RAIN IN SPAIN STAYS MAINLY IN THE PLAIN and the letter A. This basically means we find the places in the second image where the first image is most similar. The white dots in the third image on the right show those locations; they correspond to the positions of the A's in the second image, because those are the places where the letter A matches best.

We can see that after thresholding, the places with the highest correlation values are the places in the second image where the letter A is found. This is what template matching means.
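
A sketch of this template matching, assuming text_img (the sentence) and a_img (the letter A) are grayscale double matrices of the same size (names mine), with the 0.9 threshold chosen arbitrarily:

Fcorr = conj(fft(a_img)) .* fft(text_img); // F = conj(G) H
corr_map = abs(fftshift(ifft(Fcorr))); // correlation map; fftshift recenters the peaks (whether it is needed depends on how the template is positioned)
peaks = bool2s(corr_map > 0.9 * max(corr_map)); // thresholding: 1 where correlation is high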

4. Edge Detection

Edge detection is basically convolving an image with an edge pattern. Edge patterns are small patterns whose elements sum to zero. In this exercise, the edge patterns are 3 \times 3 matrices padded with zeroes to reach the size of the image they are convolved with. We show an example below, together with its Fourier transform, to show why it acts as an edge detector.

[Image: array]

Figure 6. The edge pattern on the left, and its FFT. Multiplying this FFT with the FFT of any image and applying the IFFT to the product will give us the edges of the image. This is because the edges of the objects in any image are its high-frequency parts, which this pattern's FFT passes while suppressing the low frequencies.

There are many different edge patterns. The edge pattern in figure 6 has the following matrix:

\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}

We call this the spot pattern. We also have the horizontal, the vertical, and the diagonal patterns, shown below from left to right:

\begin{bmatrix} -1 & -1 & -1 \\ 2 & 2 & 2 \\ -1 & -1 & -1 \end{bmatrix} \quad \begin{bmatrix} -1 & 2 & -1 \\ -1 & 2 & -1 \\ -1 & 2 & -1 \end{bmatrix} \quad \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}

We can also manipulate the spot pattern so that the number 8 is off-center, giving the following matrices:

\begin{bmatrix} 8 & -1 & -1 \\ -1 & -1 & -1 \\ -1 & -1 & -1 \end{bmatrix} \quad \begin{bmatrix} -1 & 8 & -1 \\ -1 & -1 & -1 \\ -1 & -1 & -1 \end{bmatrix} \quad \begin{bmatrix} -1 & -1 & 8 \\ -1 & -1 & -1 \\ -1 & -1 & -1 \end{bmatrix} \quad \begin{bmatrix} -1 & -1 & -1 \\ 8 & -1 & -1 \\ -1 & -1 & -1 \end{bmatrix} \quad \begin{bmatrix} -1 & -1 & -1 \\ -1 & -1 & 8 \\ -1 & -1 & -1 \end{bmatrix}

and the following:

\begin{bmatrix} -1 & -1 & -1 \\ -1 & -1 & -1 \\ 8 & -1 & -1 \end{bmatrix} \quad \begin{bmatrix} -1 & -1 & -1 \\ -1 & -1 & -1 \\ -1 & 8 & -1 \end{bmatrix} \quad \begin{bmatrix} -1 & -1 & -1 \\ -1 & -1 & -1 \\ -1 & -1 & 8 \end{bmatrix}

When the number 8 is off-center, we name the matrix after the position of the 8. So the above matrices are called, from left to right, top to bottom: northwest, north, northeast, west, east, southwest, south, and southeast. Using the above matrices as edge patterns, we convolve each edge pattern with the VIP image. For the horizontal, vertical, and diagonal edge patterns, the results are shown in figure 7.

[Image: horizontal_vertical_diagonal]

Figure 7. Edge detection with horizontal, vertical and diagonal edge patterns (from left to right).
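
A sketch of one such edge detection, using the horizontal pattern and assuming vip is a grayscale double matrix (variable names mine):

pat = [-1 -1 -1; 2 2 2; -1 -1 -1]; // horizontal edge pattern; elements sum to zero
[N, M] = size(vip);
kernel = zeros(N, M);
kernel(1:3, 1:3) = pat; // zero-pad the 3 x 3 pattern to the image size
edges = abs(ifft(fft(vip) .* fft(kernel))); // convolve via the FFT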

We can see in figure 7 that the most prominent edges are those that follow the orientation of the edge pattern: for the horizontal edge pattern, the horizontal edges are the most prominent, and likewise for the vertical and diagonal edge patterns. Figure 8 shows the edge detection using the 9 different spot edge patterns.

[Image: edge_spot]

Figure 8. Edge detection using the spot edge patterns. The left panel shows the results of the edge detection; the placement of each image corresponds to the placement of the number 8 in the 3 x 3 matrix used. The right panel shows the FFTs of the spot edge patterns, with the placement of each image again corresponding to the placement of the number 8 in the 3 x 3 matrix.

From figure 8, we can see that the southwest and northeast edge patterns give the same result. The same goes for the northwest and southeast, the north and south, and the east and west patterns. These edges can be used for determining the area of a shape in an image via Green's theorem. Images have different prominent lines, so the optimal edge pattern differs from image to image.

I have shown the tip of the iceberg of the uses of the FFT. There are a lot of uses of FFT in many different fields, particularly in sensing and signal processing.

The author gives himself a grade of 10/10 for completing the minimum requirements needed for this activity.

Acknowledgements go to these sites:

http://fourier.eng.hmc.edu/e180/e101.1/e101/lectures/Image_Processing/node6.html
https://en.wikipedia.org/wiki/Fourier_transform
http://www.mathsisfun.com/data/correlation.html
http://en.wikibooks.org/wiki/Signals_and_Systems/Time_Domain_Analysis

I will not post the full code that I used here.

Image enhancement

We were tasked to enhance an image by changing its cumulative distribution function (CDF). The CDF at a gray level K is the number of pixels with gray values from 0 up to K, normalized by the total number of pixels in the image. It is obtained by taking the histogram of the grayscale image, normalizing it to get the probability density function (PDF) of the image, and then taking the cumulative sum. The image we enhance in this activity is shown in figure 1, along with its CDF.

[Image: image_cdf]

Figure 1. The image to be enhanced is on the left and its cumulative distribution function is on the right. The CDF rises steeply at low gray levels, which means that the picture is mostly dark.
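
A sketch of the CDF computation just described, using core Scilab only (the code at the end of this post uses the IPD toolbox's CreateHistogram instead); im_gray is assumed to be the grayscale image:

img = double(im_gray); // work in doubles
counts = zeros(1, 256);
for k = 0:255
    counts(k + 1) = length(find(img == k)); // number of pixels at gray level k
end
pdf = counts / sum(counts); // normalized histogram: the PDF
cdf = cumsum(pdf); // cumulative sum of the PDF: the CDF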

LINEAR

First, to enhance the image in figure 1, we impose a linear CDF, that is, a uniform PDF. This is quite easy to do: since the goal is a linear CDF, we just need to multiply the CDF of the original image by 255. The general process is shown in figure 2.

[Image: steps_cdf]

Figure 2. The steps for altering the gray scale distribution: (1) take the gray level of a pixel and find its value in the original CDF; (2) project that CDF value onto the desired CDF; (3) the gray level at that point of the desired CDF becomes the pixel's new gray level. [1]

For the first part, we obtain the enhanced image with a linear desired CDF. Figure 3 shows the enhanced image, along with the CDF of the resultant image, and the desired CDF.

[Image: linear_cdf_name]

Figure 3. The enhanced image, with the resulting CDF (center) and the desired CDF (right). It can be seen that though the resulting CDF roughly follows the desired CDF, it shows a ladder pattern.

From figure 3, it can be seen that the resulting CDF follows the desired CDF but has a ladder pattern. This is because gray levels are discrete: the transformation can only move whole histogram bins to new gray levels, not split them, so the previous CDF of the image still acts as a constraint that prevents the resulting CDF from completely following the desired one.
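
A sketch of this linear-CDF backprojection (essentially histogram equalization), reusing img and cdf from the sketch above:

new_gray = 255 * cdf(img + 1); // look up each pixel's CDF value (+1: Scilab indices start at 1)
new_gray = matrix(new_gray, size(img, 1), size(img, 2)); // indexing flattens the image, so reshape it back to 2-D
eq_img = uint8(round(new_gray)); // back to 8-bit gray levels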

SIGMOID

A sigmoid function is a function that looks like the letter S. The function is given by:

S(t) = \frac{1}{1 + e^{-t}} \qquad (1)

Adding scaling factors lets us change the translation and the width of the sigmoid. With these factors, (1) becomes:

S(t) = \frac{1}{1 + e^{-(t - T)/d}} \qquad (2)

where T is the translation and d is the width of the sigmoid. Conversely, we can find t given S(t). In our case, S(t) is the desired CDF and t is the gray level corresponding to that CDF value. Inverting (2), we arrive at:

t = -d \ln\left(\frac{1}{S} - 1\right) + T \qquad (3)

First we need to find the optimum T. Our code (see below) has a line with the function uint8; this keeps the values between 0 and 255 by taking each value modulo 256. Fixing d = 32 while searching for the optimum T, we generate the following images for different values of T.

[Images: 0to120, 150-270]

Figure 4. Images computed with different translations T. The upper left has T = 0, increasing in increments of 30 thereafter. We know that the image was created with a black pen on a white background. From the figure above, we picked the lower leftmost image (T = 150), because its contrast is neither too high nor too low.

Based on figure 4, we use T = 150. Next, we vary the width factor d over 1, 2, 4, 8, 16, 32, 64, and 128. Figure 5 shows the desired CDFs for d from 1 to 128 (in powers of 2). (I call d the width since it plays a role similar to the Gaussian standard deviation, where the beam width is taken at 1/e.)

[Image: theo_cdf]

Figure 5. The desired CDFs with different widths (innermost d = 1, outermost d = 128, in powers of 2).
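
A sketch of how one of these sigmoid backprojections might be computed via equation (3), reusing img and cdf from the earlier sketches; the clipping constants are my assumption, to keep the logarithm finite:

T = 150; d = 32; // translation and width chosen as above
s = cdf(img + 1); // CDF value of every pixel (flattened)
s = min(max(s, 0.001), 0.999); // keep s strictly between 0 and 1
t = -d * log(ones(s) ./ s - 1) + T; // equation (3); ones(s) avoids the 1./s parsing ambiguity
sig_img = uint8(round(matrix(t, size(img, 1), size(img, 2)))); // reshape and convert to 8-bit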

Using the CDFs above as the desired CDFs, we generate the following images (with the corresponding resulting CDFs):

[Image: some]

Figure 6. The enhanced images with different CDF widths. The left column has width = 1, 2, 4, 8 (top to bottom), while the right column has width = 16, 32, 64, 128 (top to bottom).

What seems peculiar in figure 6 is that when the CDF width is 128, no letter can be discerned in the image. Looking at the images, we would say the most enhanced one is the image with width equal to 32 and translation equal to 150. It can be seen that the resulting CDFs follow the desired CDFs, except for the lower 3 images in the right column.

Other software also has this kind of manipulation, where the histogram can be altered by dragging a curve that starts as a diagonal line; however, I am not going to discuss that here.

I give myself a grade of 11/10, since I investigated the effects of translating the sigmoid and manipulating its width.

[1] Laboratory manual by Dr. Maricor Soriano

CODE for Sigmoid

im = imread('C:\Users\ledy_rheill\Google Drive\Documents\Academics\13-14_1stsem\AP186\activity 6 27 Jun 2013\image.png');
im_gray = rgb2gray(im); // convert to grayscale
his_plot = CreateHistogram(im_gray, 256); // obtain the histogram (CreateHistogram is from the IPD toolbox)
size_xy = size(im_gray); // obtain the sizes for normalization
narea = size_xy(1,2) * size_xy(1,1); // take the area for normalization
summed_his = cumsum(his_plot)/narea; // normalized cumulative sum: the CDF of the original image
new_summed = summed_his(double(im_gray) + 1); // convert each pixel's gray value to its CDF value; double() avoids uint8 wrap-around at 255, and the +1 is because Scilab arrays start at 1
im_gray2 = -1*log(2 ./(new_summed + 1.000002) - 1); // backproject through an inverse sigmoid; the small offset keeps the logarithm finite
im_gray_new2 = matrix(real(im_gray2), size_xy(1,1), size_xy(1,2)); // the indexing flattened the image to 1-D, so reshape it back to 2-D
im2 = uint8(im_gray_new2); // convert to an 8-bit image (uint8 wraps values modulo 256)
imshow(im2);
his_plot2 = CreateHistogram(im2); // histogram of the enhanced image
summed_his2 = cumsum(his_plot2)/narea; // its CDF, for comparison with the desired one
imwrite(im2, 'C:\Users\ledy_rheill\Google Drive\Documents\Academics\13-14_1stsem\AP186\activity 6 27 Jun 2013\sigmoid_cdf_128width_flower.png')