This manual documents version 2.0.1 of FFTW, the Fastest Fourier Transform in the West. FFTW is a comprehensive collection of fast C routines for computing the discrete Fourier transform (DFT) in one or more dimensions, of both real and complex data, and of arbitrary input size. FFTW also includes parallel transforms, documented elsewhere. We assume herein that the reader is already familiar with the properties and uses of the DFT that are relevant to her application. Otherwise, see e.g. The Fast Fourier Transform by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974). Our web page also has links to FFT-related information online.
FFTW is usually faster (and sometimes much faster) than all other freely-available Fourier transform programs found on the Net. For transforms whose size is a power of two, it compares favorably with the FFT codes in Sun's Performance Library and IBM's ESSL library, which are targeted at specific machines. Moreover, FFTW's performance is portable. Indeed, FFTW is unique in that automatically adapts itself to your machine, your cache, the size of your memory, the number of registers, and all the other factors that normally make it impossible to optimize a program for more than one machine. An extensive comparison of FFTW's performance with that of other Fourier transform codes has been made. The results are available on the Web at the benchFFT home page.
In order to use FFTW effectively, you need to understand one basic concept of FFTW's internal structure. FFTW does not used a fixed algorithm for computing the transform, but it can adapt the DFT algorithm to details of the underlying hardware in order to achieve best performance. Hence, the computation of the transform is split into two phases. First, FFTW's planner is called, which "learns" the fastest way to compute the transform on your machine. The planner produces a data structure called a plan that contains this information. Subsequently, the plan is passed to FFTW's executor, along with an array of input data. The executor computes the actual transform, as dictated by the plan. The plan can be reused as many times as needed. In typical high-performance applications, many transforms of the same size are computed, and consequently a relatively-expensive initialization of this sort is acceptable. FFTW also provides fast planners based on heuristics or on stored plans for cases where the one-time cost of the planner is significant.
This pattern of planning/execution applies to all four operation modes of FFTW, that is, I) one-dimensional complex transforms (FFTW), II) multi-dimensional complex transforms (FFTWND), III) one-dimensional transforms of real data (RFFTW), IV) multi-dimensional transforms of real data (RFFTWND). Each mode comes with its own planner and executor.
Besides the automatic performance adaptation performed by the planner, it is also possible for advanced users to customize FFTW for their special needs. As distributed, FFTW works most efficiently for arrays whose size can be factored into small primes (2, 3, 5, and 7), and uses a slower general-purpose routine for other factors. FFTW, however, comes with a code generator that can produce fast C programs for any particular array size you may care about. For example, if you need transforms of size 513 = 19*33, you can customize FFTW to support the factor 19 efficiently.
For more information regarding FFTW, see the paper, "The Fastest Fourier Transform in the West," by M. Frigo and S. G. Johnson, which is the technical report MIT-LCS-TR-728 (Sep. '97). See also, "FFTW: An Adaptive Software Architecture for the FFT," by M. Frigo and S. G. Johnson, which appeared in the 23rd International Conference on Acoustics, Speech, and Signal Processing (Proc. ICASSP 1998 3, p. 1381). These papers, along with the latest version of FFTW, the FAQ, benchmarks, and other links, are available at the FFTW home page. The current version of FFTW incorporates many good ideas from the past thirty years of FFT literature. In one way or another, FFTW uses the Cooley-Tukey algorithm, the Prime Factor algorithm, Rader's algorithm for prime sizes, and the split-radix algorithm (with a variation due to Dan Bernstein). Our code generator also produces new algorithms that we do not yet completely understand. The reader is referred to the cited papers for the appropriate references.
The rest of this manual is organized as follows. Section Tutorial describes the basic features of FFTW, and will let you use FFTW quickly. Various issues are explored in more detail by Section Words of Wisdom, Section Multi-dimensional Array Format, and Section Parallel Computing and FFTW. Section FFTW Reference provides comprehensive documentation of all FFTW's features. Section Installation and Customization explains how to install FFTW in your computer system and how to adapt FFTW to your needs. License and copyright information is given in Section License and Copyright. Finally, we thank all the people who helped us in Section Acknowledgments.
Go to the first, previous, next, last section, table of contents.