Introduction
Fast Fourier transformation is applied in a lot of branches of science: radiolocation, compression of video and images, geology. A lot of such problems require computing the transformation in real time.
It is possible to make parallel solving of the problem to decrease the time, necessary for computing of the transformation.
The target of this work is researching the algorithms of fast Fourier transformation for decimal signals. And further development of the program, working on parallel computing system, computing the result in real time.
The main goals are:
- researching different algorithms of fast Fourier transformation;
- researching the parallel computing systems;
- of the program, computing the fast Fourier transformation in real time.
Originality
Originality of this work is in computing two dimensional fast Fourier transformation for big images in real time. That mean’s that the received results would be able to be further effectively applied, without noticeable time delay.
Results
It is planned to receive a program, that allows to compute two dimensional fast Fourier transformation for big images, such as photos from the satellite.
Related work
are a lot of related researches in the world. They are using different computer architecture. The main architectures are:
- claster;
- graphic processor units(GPU);
- Cell processors.
Among huge amount of researches it is especially noticable the researches of Microsoft employee’s Naga K. Govindaraju, Brandon Lloyd, Yuri Dotsenko, Burton Smith, John Manferdelli and also Vasily Volkov. In their article “” they analyzed the existing work on this topic:
“Sorensen and Burrus compiled a database of over 3400 entries on efficient algorithms for the FFT [2]. We refer the reader to the book by Van Loan [3] which provides a matrix framework for understanding many of the algorithmic variations of the FFT. The book also touches on many important implementation issues.
The research most related to our work involves accelerating FFT computation by using commodity hardware such as GPUs or Cell processors. Most implementations of the FFTs on the GPU use graphics APIs such as current versions of OpenGL or DirectX [4], [5], [6], [7], [8], [9]. However, these APIs do not directly support scatters, access to shared memory, or fine-grained synchronization available on modern GPUs. Access to these features is currently provided only by vendorspecific APIs. NVIDIA’s FFT library, CUFFT [10], uses the CUDA API [11] to achieve higher performance than is possible with graphics APIs. Concurrent work by Volkov and Kazian [12] discusses the implementation of FFT with CUDA.”[1]
Also Vasily Volkov has developed an improved the algorithm of fast Fourier transformation, used in standard library of NVIDIA CUFFT.
In Ukraine related researches were made by:
- Trojanskij A.V. in the dissertation "Line processors of fast transformation of Fourier with reconstructed architecture on odnotype SBIS" [18];
- Gjurdzhjan M.K. in the dissertation on a theme "Some questions of the organisation of parallel calculations in multiprocessing computing systems of claster type" [19];
- Potopalsky V.V. in work "The Estimation of quality of transformation of Fourier in radio engineering and television systems" [20];
- Kerekesha P.V. in the dissertation on a theme "The Combined method of transformation of Fourier and interface of analytical functions in problems of the theory of elasticity" [21].
Among teachers and students of Donetsk National Technical University related work is researched by Smolyanaya Darya in her master’s work “Researching The Effictivity Of Parallel Realization Of Fast Fourier Transformation For Digital Images”
The Fourier transformation
Fourier transformation is an operation, that matches the function of substantial variable another function of substantial variable. The coefficients of new function describes the amplitudes and phases for decomposition of old function on elementary parts – harmonic wavering with different frequencies.
Discrete Fourier transformation
Discrete Fourier transformation is called the (DFT) transformation on the input sequence of numbers
{x-0, x-1, …, x-n-1}
with power of n є N, n>1, where xi є Z, i = (0, n-1). The result of transformation is the sequence
Picture 1 - Formula for computing DFT
where k = (0, n-1), W = e-j*?/n, where j is virtual one.[22]
Fast Fourier Transformation
The algorithm of fast Fourier transformation(FFT) is the optimized upon the speed of calculation DFT. The main ideas are the next:
- to divide the sum (picture 1) of N addendums into two sums of N/2 addendums, and calculate them separately. To calculate each sum they are also splitted into two sums, and so on.;
- use few times already calculated addendums.
The main formulas of FFT:
Picture 2 - The main formulas for FFT algorithm
Let’s see an example of FFT where N=16 (picture 3).
Picture 3 – Mixing the elements of the sequence x
Total it is carried out (log2(N))-1 iterations.
We Will consider binary representation of numbers of elements and places occupied by them. The element with number 0 (binary 0000) after all shifts takes of a position 0 (0000), an element 8 (1000) - a position 1 (0001), an element 4 (0100) - a position 2 (0010), an element 12 (1100) - a position 3 (0011). And so on. It is easy to notice communication between binary representation of a position before shifts and after all shifts: they are mirror symmetric. Binary representation of a final position turns out from binary representation of an initial position by shift of bits upside-down. And on the contrary [3].
This fact is not accident for concrete N=16, and is law. On the first step even elements with number n have moved to a position n/2, and odd of a position in position N/2 + (n-1)/2. Where n=0,1, …, N-1.
In picture 4 is illustrated the second stage of calculation DFT.
Picture 4 – The Second step FFT
(5 frames, 276х204 pixels, 10 repeats, delay 1s)
The usage of elements for calculating the values of new elements is shown by lines from top to bottom. Very conveniently that two elements on certain positions in the array give two elements on the same places. But they already will belong to other, longer arrays placed on a former place of more short arrays. It allows to manage in one array the initial data, result and storage of intermediate results for all T iterations. [2]
Two dimention discrete Fourier's transformation
Two dimention discrete Fourier's transformation – is discrete Fourier's transformation upon matrix, that is calculated upon the formula:
Picture 5 - Two dimensional DFT
where k, i = (0, n-1), l, j = (0, m-1). That it is consecutive one-dimensional DFT at first over lines, and then over columns.[22]
NVIDIA CUDA technology
CUDA (Compute Unified Device Architecture) - is a technology developed by company NVidia, designed for developing the applications on parallel computing devices.
Technology CUDA works on video cards NVIDIA starting from 8400GS series and higher. Different types of video cards have different opportunities. In general, for example, if video card has 128 SP (Streaming Processor) that means therу are 8 SIMD MP (multiprocessor), each of one computes 16 operations simultaneously.
Few types of memory are available for programming: registers, local, global, shared, constant and texture memory. Each of this memory types has it’s own destination, that is specified by it’s technical parameters.
The main advantages of CUDA technology are it’s free (SDK for all of the platforms is available on developer.nvidia.com), simple programming (programs are writed on extended C) and flexibility.[23]
Also it is possible to improve the productivity by joining several video cards. [24]
NVIDIA CUDA and FFT
Special library CUFFT exists in CUDA technology for computing FFT. It’s disadvantage is the limit on the power of transformation. Also the calculation of double precision numbers is supported only in the latest versions of this library.
Results
At the present moment they were explored the algorithms of computing two dimentional discrete fast Fourier transformation. Also there were reviewed existing researches on this topic, studied the most effective and useful of them for the future development applying. Especially there were studied the algorithms for parallel computation.
A program calculating decimal fast Fourier's transformation for images was written. The time of computing on personal computer for different powers of transformation is listed in table 1.
Table 1 - Time of computing of decimal fast Fourier transformation
Height | Width | Time |
4 | 4 | 0,0468 s |
8 | 8 | 0,0625 s |
16 | 16 | 0,0937 s |
64 | 64 | 0,3437 s |
512 | 512 | 5,5781 s |
1024 | 1024 | 26,8125 s |
2048 | 2048 | 1 min 54,859 s |
New technology of computing on graphic processors NVIDIA CUDA was studied. It’s main advantages and disadvantages were found. Also it was compiled and successfully executed text program on GPU NVIDIA GeForce 8600 GT.
In a month it is planned to write a program, computing two dimensional FFT using technology NVIDIA Cuda.
Conclusion
The architecture and the variability of different FFT algorithms are giving the opportunities of computation much more exceeding the power of ordinary personal computer. The compute capabilities of parallel computing systems allow to reach planning results of solving discrete two dimensional FFT in the real time for big images.
During writing this referat, the master’s work is not finished. It will be finished in December 2009. The full version of work will be available after previously named date. For more information contact the author or her master’s leader.
References
1. Говидараю Н.К., Ллойд Б., Доценко Ю., Смит Б., Манферделли Д. Высокопроизводительные дискретные преобразования Фурье на графических процессорах/ компания Microsoft, - ../library/translate.htm
2. Sorensen H.V., Burrus C. S. Fast Fourier Transform Database.PWS Publishing, 1995.
3. Loan C.V. Computational Frameworks for the Fast Fourier Transform / Society for Industrial Mathematics, 1992.
4. Moreland K., Angel E. The FFT on a GPU / Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 2003.
5. Spitzer J. Implementing a GPU-efficient FFT / SIGGRAPH Course on Interactive Geometric and Scientific Computations with Graphics Hardware, 2003.
6. Mitchell J.L., Ansari M.Y., Hart E. Advanced image processing with DirectX 9 pixel shaders / ShaderX2: Shader Programming Tips and Tricks with DirectX 9.0, W. Engel, Ed. Wordware Publishing, Inc.,2003.
7. Jansen T., B. Rymon-Lipinski B., Hanssen N., Keeve E. Fourier volume rendering on the GPU using a split-stream-FFT / Proceedings of the Vision, Modeling, and Visualization Conference 2004.
8. Sumanaweera T., Liu D. Medical image reconstruction with the FFT / GPU Gems 2, Pharr M., Ed. Addison-Wesley, 2005.
9. Govindaraju N.K, Larsen S., Gray J., Manocha D. A memory model for scientific algorithms on graphics processors / Supercomputing 2006.
10. CUDA CUFFT Library/ Официальный сайт разработчика, - http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/docs/CUFFT_Library_2.1.pdf
11. NVIDIA CUDA: Compute Unified Device Architecture / NVIDIA Corp., 2007.
12. Volkov V., Kazian B. Fitting FFT onto the G80 architecture / компания Microsoft, - http:www.cs.berkeley.edu/kubitron/courses/cs258-S08/projects/reports/project6 report.pdf
13. Chow A.C., Fossum G.C, Brokenshire D.A. A programming example: Large FFT on the cell broadband engine / Whitepaper, 2005.
14. Cico L., Cooper R., Greene J. Performance and programmability of the IBM/Sony/Toshiba Cell broadband engine processor / Whitepaper, 2006.
15. Williams S., Shalf J., Oliker L., Kamil S., Husbands P., Yelick K. The potential of the cell processor for scientific computing / CF 06:Proceedings of the 3rd Conference on Computing Frontiers, 2006, pp.9–20.
16. Bader D.A., Agarwal V. FFTC: fastest Fourier transform for the IBM Cell broadband engine / 14th IEEE International Conference on High Performance Computing (HiPC), pp. 172–184, 2007.
17. Frigo M., Johnson S. G. FFTW on the cell processor / компания Microsoft, - http://www.fftw.org/cell/index.html
18. Троянский А.В. Поточные процессоры быстрого преобразования Фурье с перестраиваемой архитектурой на однотипных СБИС: Дис. канд. техн. наук: 05.12.21 / Одесский гос. политехнический ун-т. - О., 1997. - 185с. -
http://www.dissua.com/contua/pk-15757.html
19. Гюрджян М. К. Некоторые вопросы организации параллельных вычислений в многопроцессорных вычислительных системах кластерного типа: дис. канд. техн. наук: 05.13.05 / Институт проблем информатики и автоматизации НАН РА. - Ереван, 2004. -
http://www.dissua.com/contua/pk-20301.html
20. Потопальский В. В. Оценка качества преобразования Фурье в радиотехнических и телевизионных системах : дис.канд.техн.наук: 05.12.17 / АН Украины. — Львов, 1993. — 189с. -
http://www.dissua.com/contua/pk-260506.html
21. Керекеша П.В. Комбинированный метод преобразования Фурье и сопряжения аналитических функций в задачах теории упругости: Дис. д-ра физ.-мат. наук: 01.02.04 / Одесский гос. ун-т им. И.И.Мечникова. - О., 1999. - 314с. -
http://www.dissua.com/contua/pk-11181.html
22. Блейхут Р. Быстрые алгоритмы цифровой обработки сигналов. Москва, Мир – 1989. 248 с.
23. NVIdiA Cuda Programming Guide / сайт NVIDIA, - http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.2.pdf
24. Schivea H., Chiena C., Wonga S.,Tsaia Y., Chiueha T. Graphic-Card Cluster for Astrophysics (GraCCA) - Performance Tests / Сборник статей по математике, физике, программированию и др., - http://xxx.lanl.gov/ftp/arxiv/papers/0707/0707.2991.pdf