Title | Discrete fast Fourier transform | |

Author | May Boggess, StataCorp | |

Date | October 2005 |

For many of us, the last time we saw a Fourier series was in math class, where the goal was to decompose a function on [0,2π] into its frequency components by expressing it as a linear combination of {sin(nx), cos(nx), n=0... infinity}. The coefficients of this linear combination are called the Fourier coefficients.

In real life, we do not have the entire function at our disposal; we have just a sample, that is, a finite set of values of the function at particular points. If we sample 8 points, we cannot hope to determine an infinite number of coefficients. With 8 points, we will only be able to calculate 8 complex coefficients. Thus we usually try to sample as many points as we can. Fortunately, the fast Fourier transform is an algorithm for computing the coefficients that is, well, very fast (Monahan 2001, sec. 14.5).

You should to be aware that the FFT algorithm requires the number of sampled
points to be a power of 2. If you do not, you can make more points by
padding with zeros, but that introduces a minor complication (that we will
discuss later), so it is easier to stick with 2^{m} points.

The FFT algorithm for calculating the coefficients is well documented (Bloomfield 2000, sec. 5.7; Olver and Shakiban 2006). For 8 sampled points, we obtain 8 complex coefficients, that is, 8 pairs of real numbers. Using these coefficients, we can write down a function that is a sum of sines and cosines that passes through all the 8 sampled points

where (x_{j}, y_{j}) are the sampled points and
a_{j}, b_{j} are the Fourier coefficients. The FFT
algorithm requires that the x-values be equally spaced, so, for 8 points on
[0,2π], we would have x_{j} = 2π j/8, j=0...7. Because

we can write the above sum using lower frequency components:

For our first example, let’s use y=sin(x) on [0,2π] with 8 sampled points (we are using a trivial example so that it will have nice Fourier coefficients):

clear set obs 8 gen x=(_n-1)*(2*_pi)/8 gen y=sin(x) mata: h=st_data(.,"y") mata: fft(h) # delimit; twoway scatter y x || function y=1/8*(4*sin(1*x)-4*sin( 7*x)), range(x) || function y=1/8*(4*sin(1*x)-4*sin(-1*x)), range(x) ||, legend(cols(1) label(1 "sampled values") label(2 "sum of high frequency components") label(3 "sum of low frequency components")) title("Reconstructed signal for y=sin(x) sampled at 8 points"); # delimit cr

The low-frequency components are often preferred. As the above example
showed, the order of the coefficients Mata put into the matrix H is 0
though 7. To make it easy to use the low-frequency components, the Mata
function
**ftunwrap()** orders
the coefficients in the low-frequency way: −3, −2, −1, 0,
1, 2, 3, 4. Let’s see this with another example:

clear set obs 8 gen x=(_n-1)*(2*_pi)/8 gen y=sin(x)+cos(2*x) mata: h=st_data(.,"y") H=fft(h) re=Re(H);im=Im(H) re=ftunwrap(re);im=ftunwrap(im) st_matrix("H",(re, im)) end matrix list H set obs 101 gen xhat=(_n-1)*2*_pi/100 # delimit; gen yhat=(1/8)*( H[1,1]*cos(-3*xhat)+H[1,2]*sin(-3*xhat) +H[2,1]*cos(-2*xhat)+H[2,2]*sin(-2*xhat) +H[3,1]*cos(-1*xhat)+H[3,2]*sin(-1*xhat) +H[4,1]*cos(0*xhat) +H[4,2]*sin(0*xhat) +H[5,1]*cos(1*xhat) +H[5,2]*sin(1*xhat) +H[6,1]*cos(2*xhat) +H[6,2]*sin(1*xhat) +H[7,1]*cos(3*xhat) +H[7,2]*sin(2*xhat) +H[8,1]*cos(4*xhat) +H[8,2]*sin(3*xhat)); twoway scatter y x || line yhat xhat, legend(label(1 "sampled points") label(2 "reconstructed signal")) title("Reconstructed signal for y=sin(x)+cos(2x)"); # delimit cr

Let’s go back to the minor complication we mentioned earlier: what
happens if we do not have 2^{m} sampled points? Suppose we
have 100 points. In this case, we add 28 zeros on the end of the function
using the Mata function
**ftpad()** so that 128
points are “sampled”. That means Mata will have the wrong
impression about what the x-values are for the sampled points: it will
think we have 128 equally spaced points and the last 28 of them
have zero y-value:

clear set obs 100 gen x=(_n-1)*(2*_pi)/100 gen y=x^2-2*x+1 mata: h=st_data(.,"y") H=fft(ftpad(h)) re=Re(H);im=Im(H) re=ftunwrap(re);im=ftunwrap(im) st_matrix("H",(re, im)) end local j=0 gen yhat=0 forvalues i=-63(1)64{ local j=`j'+1 quietly replace yhat=yhat+H[`j',1]*cos(`i'*x) /* */ +H[`j',2]*sin(`i'*x) } replace yhat=(1/128)*yhat #delimit; twoway scatter y x, msize(vsmall) || line yhat x, legend(label(1 "sampled points") label(2 "incorrect reconstructed signal")) title("Incorrect reconstructed signal for y=x^2-2x+1") subtitle("sampled at 100 points"); #delimit cr

Well, we certainly have not got that right. When we get the coefficients back from Mata, we should rescale and cut off the last piece of the function to get the original function back again:

clear set obs 100 gen x=(_n-1)*(2*_pi)/100 gen y=x^2-2*x+1 mata: h=st_data(.,"y") H=fft(ftpad(h)) re=Re(H);im=Im(H) re=ftunwrap(re);im=ftunwrap(im) st_matrix("H",(re, im)) end gen realx=(_n-1)*(2*_pi)/128 in 1/100 local j=0 gen yhat=0 quietly forvalues i=-63(1)64{ local j=`j'+1 replace yhat=yhat +H[`j',1]*cos(`i'*realx) /* */ +H[`j',2]*sin(`i'*realx) } replace yhat=(1/128)*yhat #delimit; twoway scatter y x, msize(small) ms(Oh)|| line yhat x, legend(label(1 "sampled points") label(2 "correct reconstructed signal")) title("Correct reconstructed signal for y=x^2-2x+1") subtitle("sampled at 100 points"); #delimit cr

If we have sampled many points, we do not need to reconstruct the function
using sin and cos (because we can get a good graph using just the points we
have). In this case, we can use the Mata function
**invfft()** to transform
back from the coefficients to the y-values of the sampled function:

clear set obs 128 gen x=(_n-1)*(2*_pi)/128 gen y=sin(x)+cos(2*x) gen yhat=. mata: h_in=st_data(.,"y") H=fft(h_in) h_out=invfft(H) h_out st_store((1,128),"yhat",h_out) end twoway scatter y x || line yhat x, /* */ legend(label(1 "sampled points") /* */ label(2 "reconstructed signal")) /* */ title("Using FFT and invFFT on y=sin(x)+cos(2x)")

There are two common applications of Fourier series. The first is to filter out high-frequency noise in a sound signal (Olver and Shakiban 2006, 285; Bloomfield 2000, sec. 5.7). We can reduce the noise in a signal by taking its Fourier decomposition and discarding the high frequency components, that is, setting the high frequency coefficients to zero (note if we had sampled 8 points, the highest frequencies would be 3 and 4, not 6 and 7) (Boggess and Narcowich 2001, 139):

clear set obs 256 gen t=(_n-1)*(2*_pi)/256 gen y=exp(-(cos(t))^2)*(sin(2*t)+2*cos(4*t)+0.4*sin(t)*sin(50*t)) gen yhat=. gen coef=. mata: h_in=st_data(.,"y") H=fft(h_in) H[(6::249),.]=J(244,1,0) h_out=invfft(H) h_out=Re(h_out) st_store((1,256),"yhat",h_out) end twoway line yhat y t, /* */ legend(order(2 1) label(2 "sampled points") /* */ label(1 "filtered signal")) /* */ title("Filtered y=exp(-cos^2(t))(sin(2t)+2cos(4t)+0.4sin(t)sin(50t))")

A second common application is data compression (Boggess and Narcowich 2001, 139). In this case, we discard the coefficients with least absolute value:

clear set obs 256 gen t=(_n-1)*(2*_pi)/256 gen y=exp(-t^2/10)*(sin(2*t)+2*cos(4*t)+0.4*sin(t)*sin(10*t)) gen yhat=. mata: h_in=st_data(.,"y") H=fft(h_in) abs=abs(H) sorted=uniqrows(abs) sorted[205,1] sorted[206,1] for (i=1; i<=256; i++){ if (abs(H[i,1])<sorted[205,1]){ H[i,1] = 0 } } H h_out=invfft(H); h_out=Re(h_out) st_store((1,256),"yhat",h_out) end twoway line yhat y t, /* */ legend(order(2 1) label(2 "sampled points") /* */ label(1 "80% compressed signal")) /* */ title("80% compression") /* */ subtitle("y=exp((-t^2/10))(sin(2t)+2cos(4t)+0.4sin(t)sin(10t))")

- Bloomfield, P. 2000.
*Fourier Analysis of Time Series*. New York: Wiley.

- Boggess, A., and F. Narcowich. 2001.
*A First Course in Wavelets with Fourier Analysis*. Upper Saddle River, NJ: Prentice Hall.

- Monahan, J. 2001.
*Numerical Methods of Statistics*. Cambridge: Cambridge University Press.

- Olver, P. and C. Shakiban. 2006.
*Applied Linear Algebra*. Upper Saddle River, NJ: Prentice Hall.