FAST COMPUTATION OF
GENERAL FOURIER TRANSFORMS ON GPUS
D. Brandon Lloyd
Chas Boyd Naga Govindaraju
Microsoft
Corporation
Èñòî÷íèê http://research.microsoft.com/pubs/70576/tr-2008-62.pdf
ABSTRACT
We
present an implementation of general FFTs for
graphics processing units (GPUs). Unlike most
existing GPU FFT implementations, we handle both complex and real data of any
size that can fit in a texture. The basic building block for our algorithms is
a radix-2 Stockham formulation of the FFT for
power-of-two data sizes that avoids expensive bit reversals and exploits the
high GPU memory bandwidth efficiently. We implemented our algorithms using the
DirectX 9 API, which enables our routines to be used on many of the existing GPUs today. We have performed comparisons against optimized
CPU-based and GPU-based FFT libraries (Intel Math Kernel Library and NVIDIA
CUFFT, respectively). Our results on an NVIDIA GeForce
8800 GTX GPU indicate a significant performance improvement over the existing
libraries for many input cases.
Index
Terms— graphics hardware, FFT, GPGPU
1.
INTRODUCTION
Though
commodity graphics processors (GPUs) have been traditionally
used for real-time 3D rendering in visualization applications and video games,
their raw computational power and relatively low cost has made them
increasingly attractive for more general purpose, data-parallel computations.
The performance of GPUs comes from their large number
of cores and high memory bandwidth. For example, the GeForce
8800 GTX GPU has 128 scalar processors and 86 GB/s peak memory bandwidth. GPUs are well-suited for a number of multimedia
applications including signal processing for audio, images, and video. An
important component of these applications is the Fast Fourier Transform (FFT).
In this paper we discuss how the GPU can be used for high performance
computation of general FFTs.
A
number of FFT implementations for the GPU already exist, but these are either
limited to specific hardware or they are limited in functionality. Probably the
most general FFT implementation for GPUs available
today is the CUFFT library. CUFFT handles FFTs of
varying sizes on both real and complex data. However, CUFFT is written in CUDA,
a programming interface that is specific to only the most recent NVIDIA GPUs. A recent survey of over 800,000 gaming enthusiasts
revealed that only about 15% of those surveyed had a GPU capable of running CUDA.
Those numbers are probably much lower for a more general user base. To support
multiple generations of GPUs from different vendors,
some FFT libraries are written in the high-level shading languages found in
standard graphics APIs such as OpenGL or DirectX. However, all of these
implementations share the same limitation – they are restricted to sizes that
are a power of two.
In this
paper we describe our FFT library which is both general in terms of the GPUs it supports and its functionality. Our current FFT
library is written using DirectX 9. We handle 1D and 2D FFTs
for power-of-two and non-power-of-two sizes on both real and complex data,
using a simple API that chooses the appropriate algorithm for a given input.
Our focus has been on general algorithms that ensure the availability of
functionality over a wide range of GPUs more than on
attaining maximum performance for any specific GPU. Nevertheless, even our
general implementation on newer GPUs typically
outperforms the same computation on the CPU, while achieving comparable
performance to vendor-specific implementations such as CUFFT.
The
rest of this paper is organized as follows. In Section 2 we present some
background information on the GPU programming model and the Fourier Transform.
We briefly discuss previous work in Section