FAST COMPUTATION OF GENERAL FOURIER TRANSFORMS ON GPUS

D. Brandon Lloyd Chas Boyd Naga Govindaraju

Microsoft Corporation

Источник http://research.microsoft.com/pubs/70576/tr-2008-62.pdf

ABSTRACT

We present an implementation of general FFTs for graphics processing units (GPUs). Unlike most existing GPU FFT implementations, we handle both complex and real data of any size that can fit in a texture. The basic building block for our algorithms is a radix-2 Stockham formulation of the FFT for power-of-two data sizes that avoids expensive bit reversals and exploits the high GPU memory bandwidth efficiently. We implemented our algorithms using the DirectX 9 API, which enables our routines to be used on many of the existing GPUs today. We have performed comparisons against optimized CPU-based and GPU-based FFT libraries (Intel Math Kernel Library and NVIDIA CUFFT, respectively). Our results on an NVIDIA GeForce 8800 GTX GPU indicate a significant performance improvement over the existing libraries for many input cases.

Index Terms— graphics hardware, FFT, GPGPU

1. INTRODUCTION

Though commodity graphics processors (GPUs) have been traditionally used for real-time 3D rendering in visualization applications and video games, their raw computational power and relatively low cost has made them increasingly attractive for more general purpose, data-parallel computations. The performance of GPUs comes from their large number of cores and high memory bandwidth. For example, the GeForce 8800 GTX GPU has 128 scalar processors and 86 GB/s peak memory bandwidth. GPUs are well-suited for a number of multimedia applications including signal processing for audio, images, and video. An important component of these applications is the Fast Fourier Transform (FFT). In this paper we discuss how the GPU can be used for high performance computation of general FFTs.

A number of FFT implementations for the GPU already exist, but these are either limited to specific hardware or they are limited in functionality. Probably the most general FFT implementation for GPUs available today is the CUFFT library. CUFFT handles FFTs of varying sizes on both real and complex data. However, CUFFT is written in CUDA, a programming interface that is specific to only the most recent NVIDIA GPUs. A recent survey of over 800,000 gaming enthusiasts revealed that only about 15% of those surveyed had a GPU capable of running CUDA. Those numbers are probably much lower for a more general user base. To support multiple generations of GPUs from different vendors, some FFT libraries are written in the high-level shading languages found in standard graphics APIs such as OpenGL or DirectX. However, all of these implementations share the same limitation – they are restricted to sizes that are a power of two.

In this paper we describe our FFT library which is both general in terms of the GPUs it supports and its functionality. Our current FFT library is written using DirectX 9. We handle 1D and 2D FFTs for power-of-two and non-power-of-two sizes on both real and complex data, using a simple API that chooses the appropriate algorithm for a given input. Our focus has been on general algorithms that ensure the availability of functionality over a wide range of GPUs more than on attaining maximum performance for any specific GPU. Nevertheless, even our general implementation on newer GPUs typically outperforms the same computation on the CPU, while achieving comparable performance to vendor-specific implementations such as CUFFT.

The rest of this paper is organized as follows. In Section 2 we present some background information on the GPU programming model and the Fourier Transform. We briefly discuss previous work in Section 3. In Section 4 we present the details of our FFT routines. We show some results using our library in Section 5 and conclude with some ideas for future work.