Gennadiy Voitov Digital signal processing in FPGA

Gennadiy Voitov

Faculty: Computer Science
Speciality: Computer Systems and Networks

Theme of master's work:

Digital signal processing in FPGA

Scientific adviser: Zinchenko Yuriy

Summary of research and developments

"Review of existing digital signal processing systems"

Content

Introduction
Comparison of different types of digital signal processing devices
The hybrid architecture of FPGA / DSP
Using the FPGA as a coprocessor
Using a combination of DSP / FPGA in programmable radio
Conclusion
Literature

Introduction

           Today, digital signal processing (DSP) technology has become an integral part of everyday life. DSP devices are used in a wide variety of systems: mobile phones, computer modems, digital televisions, MP3 players and DVD systems, voice traffic over IP networks, medical equipment and car navigation systems. There are currently three main classes of DSP devices: universal processors (UP), signal processors, also called digital signal processors (DSP), and digital signal processing devices based on field-programmable gate arrays (FPGA). The last of these are one variety of application-specific integrated circuits (ASIC). In the late 1980s and early 1990s, DSP chips designed specifically for digital signal processing significantly outperformed traditional UPs. Recently, however, the difference between these two classes of processors has almost disappeared, and UPs now handle many serious digital signal processing tasks. Increasingly, preference over specialized ASIC chips is given to promising FPGA-based DSP devices, which offer a flexible architecture, a high level of parallelism and high performance, especially when developing systems produced in small and medium series.

Comparison of different types of digital signal processing devices

           Each type of DSP device can be divided into low-end and high-end models. Let us consider the DSP devices on the market today from this point of view.
           The most common low-end UPs are embedded microcontrollers (4-, 8-, 16- and 32-bit) designed for systems with strict cost requirements. They include 32-bit microcontrollers based on the ARM7 core from ARM. High-end UPs are the central processors used in PCs, workstations and network servers, such as the Pentium family from Intel and the PowerPC from IBM. UPs currently account for the largest share of processor sales on the market.
           Low-end DSPs, with their relatively simple architecture, resemble the first microprocessors of this class, developed in the 1980s. They are effective for specific digital signal processing tasks that do not require high performance (for example, decoding MP3 data), and offer low power consumption at an acceptable cost. These processors account for the bulk of DSP sales. An example of a low-end DSP is the TMS320C54x chip from Texas Instruments [1, 2].
           However, there are applications for which high speed is essential. Instead of the 50·10^6 multiply-accumulate operations per second (50 MMACS) needed to decode MP3 data, a number of applications require processors that perform several billion or more such operations per second (GMACS). High-end DSPs are built on architectures that use very long instruction words (VLIW) and the single instruction, multiple data (SIMD) principle. An example is the TigerSHARC processor family from Analog Devices. These processors have a static superscalar VLIW architecture with two SIMD elements, which allows a single instruction to control two groups of execution units. As a result, the TigerSHARC processor, like the PowerPC 74xx universal processor, which is also built on a SIMD architecture, can perform eight 16-bit multiplications per cycle [2]. High-end signal processors are used in medical imaging equipment, fixed wireless communication stations, electronic intelligence, radar and sonar stations, satellite video transmitters, and industrial equipment for inspecting wafers and masks in semiconductor manufacturing.
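
The MMACS figures quoted above can be related to a concrete filter workload. The sketch below is illustrative only: it assumes a direct-form FIR filter that needs one multiply-accumulate per tap per output sample. The function name `required_mmacs` and the example tap counts and sample rates are hypothetical and are not taken from the cited sources.

```python
def required_mmacs(num_taps: int, sample_rate_hz: float, channels: int = 1) -> float:
    """Millions of multiply-accumulate operations per second for a direct-form FIR.

    Assumption: one MAC per tap per output sample per channel.
    """
    macs_per_second = num_taps * sample_rate_hz * channels
    return macs_per_second / 1e6


# Hypothetical workloads, chosen only to show the scale of the numbers.
audio = required_mmacs(num_taps=64, sample_rate_hz=48_000, channels=2)
radio = required_mmacs(num_taps=128, sample_rate_hz=100e6)

print(f"stereo audio filter: {audio:.3f} MMACS")
print(f"wideband radio filter: {radio:.0f} MMACS")
```

Under these assumptions the audio filter sits far below the low-end range, while the wideband filter alone already falls in the multi-GMACS range.
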
           A survey of developers' speed requirements for 16-bit DSP devices, conducted by Forward Concepts, showed the following distribution:

Performance, MMACS    Share of respondents, %
100                   19.3
600                   24.3
1000                  11.5
6000                  21.6
10000                 9.2
> 10000               14.2

Thus, there is a significant market for low-end DSP devices (with a speed of 100 MMACS). At the same time, a large share of respondents need high-end DSP devices (with a speed above 10000 MMACS) [1].
           The design and technology choices for high-end DSP devices, including those for military use, have recently been the subject of lively debate. It should be noted that digital signal processing is not a single application task but a set of interrelated tasks, which in general terms can be described as pre-processing and post-processing of the signal. The developer's skill lies in correctly selecting the DSP device for a specific application on the basis of traditional criteria. Not only performance must be taken into account, but also many other parameters that characterize the processor's operation.
           For example, FPGAs are poorly suited to floating-point operations, which matters when accuracy is a key requirement of the task. Signals intelligence and electronic intelligence systems (SIGINT and ELINT, respectively) require a large number of FFTs, which are usually computed in floating point because fixed-point operations limit the dynamic range of the output [3]. In other words, DSPs should be used when building such devices. In addition, matrix inversion (or division) is also better performed on a DSP or UP.
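
The dynamic-range limit of fixed-point arithmetic mentioned above can be made concrete with a few lines of arithmetic. This is a minimal sketch, assuming a signed 16-bit Q15 format; the helper names `dynamic_range_db` and `to_q15` and the sample values are illustrative and do not come from the cited source.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio of the largest to the smallest non-zero magnitude of a signed
    fixed-point word with the given number of bits, in decibels."""
    return 20 * math.log10(2 ** (bits - 1))


def to_q15(x: float) -> int:
    """Quantize a value in [-1, 1) to 16-bit Q15 fixed point with saturation."""
    scaled = round(x * 32768)
    return max(-32768, min(32767, scaled))


print(f"16-bit fixed point: {dynamic_range_db(16):.1f} dB")
print(f"32-bit fixed point: {dynamic_range_db(32):.1f} dB")

# A weak spectral component next to a strong one: the weak value falls below
# one quantization step and is lost, whereas a floating-point format keeps it.
strong, weak = 0.9, 1e-6
print(to_q15(strong), to_q15(weak))
```
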
           Other important differences between DSP, UP and FPGA should also be taken into account. A DSP is fast, but it can perform only a few operations at a time, whereas an FPGA can perform a virtually unlimited number of operations simultaneously, providing a high degree of parallelism. However, the speed of an FPGA tends to be lower than that of a DSP or UP. Thus, DSPs and UPs are better suited to complex floating-point algorithms, while FPGAs are useful in systems that work with fixed-point data and require a high level of parallelism. Consequently, only the particular application determines whether one type of DSP device has an advantage over another. For example, to form the beam pattern of a conventional or digital antenna system, the signals from multiple antennas are multiplied by their weights. While a high-speed FPGA is an ideal solution for the multiplication, the dynamic calculation of the weights requires matrix inversion, which is handled much better by a UP or a specialized DSP. The most acceptable solution to the beamforming problem is evidently to implement the input coprocessor of the antenna system in an FPGA and the output data-collection unit in a DSP or UP. This solution simplifies the system architecture, facilitates its further development, and also helps in upgrading the functionality of existing systems [3].
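
The division of labour in the beamforming example can be sketched as two functions: a slow weight calculation and a fast weighted sum. This is a simplified illustration, assuming a uniform linear array and fixed steering weights; an adaptive system would also invert a covariance matrix in the weight step, which is omitted here. All function names and array parameters are hypothetical.

```python
import cmath
import math


def steering_weights(num_elements: int, spacing_wavelengths: float, angle_deg: float):
    """Conjugate steering-vector weights for a uniform linear array.

    This is the infrequent, 'slow' calculation that the text assigns to a
    DSP or UP (an adaptive version would add a matrix inversion).
    """
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(-1j * n * phase_step) / num_elements for n in range(num_elements)]


def beamform(samples, weights):
    """Weighted sum across antennas: one complex MAC per element.

    This is the per-sample, 'fast' part that maps onto parallel FPGA multipliers.
    """
    return sum(w * x for w, x in zip(weights, samples))


def plane_wave(num_elements: int, spacing_wavelengths: float, angle_deg: float):
    """One snapshot of a unit-amplitude plane wave arriving from angle_deg."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * n * phase_step) for n in range(num_elements)]


weights = steering_weights(8, 0.5, 20.0)
on_target = abs(beamform(plane_wave(8, 0.5, 20.0), weights))
off_target = abs(beamform(plane_wave(8, 0.5, -40.0), weights))
print(f"gain toward 20 deg: {on_target:.3f}, toward -40 deg: {off_target:.3f}")
```

The weights change only when the steering direction changes, while `beamform` runs for every sample, which is why the two parts suit different devices.
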
           The cost/performance ratio is of great importance when selecting the type of DSP device for an application. Texas Instruments, comparing this figure for its own specialized DSPs and for FPGAs from Altera, a major supplier of such chips, found that for applications requiring no more than 300 MMACS the optimal solution is a DSP. For applications requiring 300-1000 MMACS, a specialized DSP with the resources needed to perform the required functions is preferable (Table 1, 2).

The hybrid architecture of FPGA / DSP

           If performance above 1000 MMACS is required, it is advisable to use a hybrid device based on DSP / FPGA. This "hybrid" architecture, in which DSP and FPGA computing elements are placed on one board, is typically used to implement embedded digital signal processors.
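
The performance thresholds given in the text can be summarized as a simple lookup. This is only a sketch: treating 300 and 1000 MMACS as hard boundaries is a simplification, and the function name `suggest_device` is hypothetical.

```python
def suggest_device(required_mmacs: float) -> str:
    """Map a required MAC rate to the device class suggested in the text.

    The 300 and 1000 MMACS thresholds are the ones quoted above; real
    selection also weighs power, cost and the other criteria discussed.
    """
    if required_mmacs <= 300:
        return "DSP"
    if required_mmacs <= 1000:
        return "specialized DSP"
    return "hybrid DSP/FPGA"


for load in (50, 600, 6000):
    print(load, "MMACS ->", suggest_device(load))
```
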

Fig.1. Criteria for selecting a signal processor

           However, this involves many trade-offs concerning the input/output interface devices (I/O devices), interprocessor communication, memory configuration, the host interface and control, and the FPGA software and hardware. All of these decisions must be carefully considered and supported by a software model [5].
           Interface. Many industrial boards now on the market already use FPGA computing elements to make the I/O devices reconfigurable. Placing these elements next to the I/O devices makes it possible to support any data transmission standard, including such different ones as PCI, PCI Express, USB, GigE and Serial RapidIO. This is particularly advantageous with the new board formats (VITA 41, VITA 46 and AMC), which support high-speed serializer/deserializer (SerDes) links capable of operating with several different protocols.
           Interprocessor communication. Besides the type of computing element chosen (DSP or FPGA), system performance depends on the quality of the communication between the elements. Obviously, its throughput must be sufficiently high. In addition, the link must be deterministic and have low latency. Less obvious is the fact that the data transfer rate should, on average, be higher than the bandwidth of the board's I/O devices. The main aim of a hybrid architecture is for the right type of computing element, located at the right place in the system, to be used at the right time. To achieve this, data generally have to be transferred between different elements, often more than once (Fig. 2).

Fig.2. Example interprocessor communication

           Data usually enter the system on the board through the I/O interface of the FPGA, which performs the pre-processing. Sometimes, as with a digital down-converter or a pulse compression algorithm, pre-processing reduces the data rate. However, common pre-processing algorithms, including filters, decoders and FFTs, do not affect the data rate and sometimes even increase it. Therefore, when data are passed to a DSP element for further processing and then returned to the FPGA for final processing and output, the minimum required bandwidth of the interprocessor link is equal to the data rate of the I/O devices. In the more general case, however, when FPGA resources are shared with DSP processing, or when the computing elements are built on multiple FPGAs and/or DSPs, the interprocessor communication rate often has to exceed the data rate of the I/O devices.
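
The bandwidth reasoning above amounts to simple arithmetic. The sketch below assumes a single shared link per direction and models pre-processing as one rate factor; the function name `required_link_rate` and the 400 MB/s I/O rate are hypothetical values chosen for illustration.

```python
def required_link_rate(io_rate_mb_s: float, rate_factor: float = 1.0, crossings: int = 1) -> float:
    """Rate the interprocessor link must sustain, in the same units as io_rate_mb_s.

    rate_factor: output/input data-rate ratio of the FPGA pre-processing
                 (below 1 for a decimating down-converter, 1 for a
                 rate-preserving filter or FFT).
    crossings:   how many times each block of data travels over the link in
                 the same direction (assumes one shared link per direction).
    """
    return io_rate_mb_s * rate_factor * crossings


io_rate = 400.0  # MB/s, hypothetical
ddc = required_link_rate(io_rate, rate_factor=1 / 8)   # decimate-by-8 down-converter
fft = required_link_rate(io_rate, rate_factor=1.0)     # rate-preserving pre-processing
shared = required_link_rate(io_rate, crossings=2)      # data pass over the link twice
print(ddc, fft, shared)
```

The third case corresponds to the more general situation in the text, where the link rate has to exceed the I/O rate.
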
           Memory configuration. Not all DSP devices require a large amount of memory. When such memory is needed, its type, configuration and even location depend on the requirements of the specific application. Pairing a large memory module with the FPGA makes it possible to create different memory types and configurations by changing the module and reprogramming the gate array. For example, a 64-bit data bus can be used to support one memory bank with 64-bit words, or it can be reconfigured to support two independent memory banks with 32-bit words each. The memory banks may even be of different types.
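
The two views of the 64-bit bus can be illustrated with plain bit operations. This is only a sketch of the idea: the assignment of the low half to bank 0 and the high half to bank 1 is an assumption, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_bus_word(word64: int):
    """View one 64-bit bus word as two independent 32-bit bank words
    (low half to bank 0, high half to bank 1; this mapping is assumed)."""
    return word64 & MASK32, (word64 >> 32) & MASK32


def join_bank_words(bank0: int, bank1: int) -> int:
    """Inverse view: two 32-bit bank words seen as one 64-bit bus word."""
    return ((bank1 & MASK32) << 32) | (bank0 & MASK32)


word = 0x1122334455667788
low, high = split_bus_word(word)
print(hex(low), hex(high))
```
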
           Host interface and control. The most appropriate way to connect the host of an industrial board to a hybrid DSP device is to route the standard interfaces through a bridge to a separate command and control bus that reaches every DSP and FPGA computing element. Because this bus runs perpendicular to the data paths, the host interface is often called the control matrix (as opposed to the data matrix). An independent control matrix allows the host to sample data directly and manage all resources without affecting the bandwidth of the data bus. The command and control bus gives the host direct access to memory. It can also serve as an additional means of interprocessor communication. A general view of an industrial board with a hybrid DSP device that addresses the problems discussed is shown in Fig.3.

Fig.3. Hybrid architecture of a communications signal processor

           FPGA software and hardware. For all the great capabilities of FPGAs, programming the gate array can be daunting. If a high-speed FPGA-based DSP device has to be built on an industrial board with fixed pin assignments and external interfaces, the task becomes even more complicated. To make it easier, the board should be supplied with IP blocks (including I/O devices, memory, interprocessor communication, and the command and control bus). Ideally, the FPGA structure should support programmable connections between the IP modules, together with documented interface specifications, so that the user can insert signal processing modules into the data flow. In that case the data flows shown in Fig.2 are easily obtained by placing the required pre-processing, co-processing and post-processing blocks (shown in yellow in the figure) into the programmable data streams. During processing, the command bus can configure and control the data flow required by the application between the I/O interfaces, IP modules, on-board DSPs and other resources.
           Software. The more complex the hardware, the greater the need for low-level support of the host interface, debugging tools, command and control, and program execution. For a hybrid device, which is more complex than a homogeneous one, unified software is especially important. Debugging may require special resources for the various technical solutions, but configuration and data management for all computing elements must be combined in a single host driver interface library. If the FPGA software and hardware described above are implemented, software is needed to set up and control the data flows. In addition to handling data transfer, interrupts, coordination and synchronization, the software must support interaction between the DSP and FPGA elements. This can be achieved through code libraries, a message-passing interface, or an OS [5].

Using the FPGA as a coprocessor

           In high-speed DSP platforms, traditionally implemented on universal DSPs that run algorithms written in C, FPGAs are increasingly used for data processing or coprocessor functions. This is due to the flexibility of the FPGA, which supports highly parallel execution of operations such as FIR filtering, FFT, digital down-conversion and forward error correction. A hardware system containing a DSP and an FPGA coprocessor can distribute the operations of an algorithm between the DSP, the configurable logic blocks of the FPGA and the processor embedded in the FPGA. The challenge is to distribute the system's DSP operations effectively across the available hardware resources. How best to use the processors embedded in the FPGA is not always obvious, yet this hardware resource can contribute greatly to reducing the total cost of the system. An FPGA allows all non-critical operations to be moved into software running on its embedded processors, reducing the total amount of hardware the system requires [6].
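
FIR filtering, named above as a typical coprocessor task, shows why parallel hardware helps. The following is a minimal reference sketch of a direct-form FIR filter in plain Python, not an FPGA implementation; the function name and the example taps are illustrative.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k].

    The inner loop is a chain of multiply-accumulate operations. A DSP runs
    a few of them per cycle; an FPGA can instantiate one multiplier per tap
    and evaluate the whole sum for each sample in parallel.
    """
    history = [0.0] * len(taps)  # delay line, newest sample first
    output = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0.0
        for tap, past in zip(taps, history):
            acc += tap * past  # one MAC
        output.append(acc)
    return output


# 4-tap moving average applied to a unit step.
print(fir_filter([1.0] * 6, [0.25] * 4))
```
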
           The use of DSP chips with FPGA coprocessors to implement fast video coding in the H.264/AVC standard has shown that a codec built on them has much broader functionality than a codec on a DSP alone (Fig.4) [7].

Fig.4. Comparison of the capabilities of a codec on a DSP with an FPGA coprocessor and on a conventional DSP in terms of coding standards, types of operations, number of channels and resolution

Fig.5. The use of an FPGA as a coprocessor in the Raven-D four-channel video system

           The Raven-D four-channel video system was developed on the basis of the low-cost Cyclone II FPGA from Altera and the DM642 DSP from Texas Instruments (Fig. 5). The DSP operates at a clock frequency of up to 1 GHz. Its number of I/O interfaces, command/clock signals and multipliers is limited, and its word length is fixed. In addition, it communicates with other DSPs over a PCI bus at relatively low speed. The FPGA can work with a large number of command/clock signals and with words of variable length, and contains up to two orders of magnitude more multipliers than the DSP. For example, a Cyclone II FPGA has 150 18×18 multiplier/accumulators, each operating at up to 250 MHz, and about 70 thousand standard logic elements. The FPGA, like the DSP, has access to various types of modern DRAM. Interprocessor communication is provided by an LVDS bus with a speed of 1 Gb/s or a SerDes bus with a speed of 1 Gb/s. The disadvantages of an FPGA as a DSP processor are a long development time and a significantly lower clock frequency than that of a DSP.
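
The multiplier figures quoted above give an upper bound on the FPGA's multiply-accumulate throughput. The calculation below is a back-of-the-envelope sketch that assumes every multiplier is busy on every clock cycle, which real designs rarely achieve; the function name is hypothetical.

```python
def peak_mmacs(num_multipliers: int, clock_mhz: float) -> float:
    """Upper bound on MAC throughput: every multiplier busy on every cycle."""
    return num_multipliers * clock_mhz


# Figures from the text: 150 multiplier/accumulators at up to 250 MHz each.
cyclone_ii_peak = peak_mmacs(num_multipliers=150, clock_mhz=250)
print(f"{cyclone_ii_peak:.0f} MMACS = {cyclone_ii_peak / 1000:.1f} GMACS")
```
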
           Thus, DSP and FPGA complement each other. A DSP executes new and complex algorithms at high speed, performing two to four calculations simultaneously, while an FPGA handles both vector and matrix math operations. In addition, an FPGA is well suited to linking multiple processor nodes, to collecting data and distributing them between digital signal processing devices, and to merging additional computations into a single output stream. In video systems an FPGA can be used as a coprocessor for video pre-processing (stabilization, filtering and motion detection) as well as for video compression [8].
           A DSP in conjunction with an FPGA also works successfully in the JPEG2000 codec from BroadMotion (Fig.6). Adding a low-cost Altera Cyclone II or Xilinx Spartan 3 FPGA enabled the company to expand the functionality of the encoder and improve image quality while still meeting the requirement for a sufficiently low cost. For full-color video with a resolution of 720×480 pixels, the BroadMotion codec encodes more than 50 frames per second at a throughput of 25 MB/s. Using such a coprocessor makes the encoder more than an order of magnitude more efficient than a device based on a DSP alone [9].

Fig.6. The use of FPGA as a coprocessor in the BroadMotion JPEG2000 coder

           Good results can also be obtained by building a system on all three types of DSP device. An example of the combined use of UP / DSP / FPGA is the IP camera controller board based on the three-core signal microcontroller MCam02, developed by ELVEES on the Multicore platform (Fig.7). In addition to the controller, the board carries an inexpensive Xilinx Spartan 3 FPGA (used for the control interface of a variable focal length lens, the removable flash memory interface, the I2C interface for high-quality audio input and output, and an additional interface for the buttons and the flash). As a result, the microcontroller processes and transmits the video signal more efficiently [10].

Using a combination of DSP / FPGA in programmable radio

           To balance cost, power, speed, flexibility and reliability, developers of programmable (software-defined) radio (SDR) architectures, including those for military use, also combine different processor elements. SDR systems operate with signals of various waveforms, and the flexible logic of a programmable gate array is used for their digital down-conversion.
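
Digital down-conversion, the task assigned to the FPGA above, consists of mixing the signal to baseband, low-pass filtering and decimating. The sketch below is a simplified software model under stated assumptions: the block average stands in for the CIC or FIR filter stages of a real down-converter, and the function name, sample rate and carrier frequency are hypothetical.

```python
import cmath
import math


def down_convert(samples, carrier_hz, sample_rate_hz, decimation):
    """Mix to baseband, average over each block, keep one value per block.

    The block average is a crude low-pass filter used only for illustration.
    """
    mixed = [
        x * cmath.exp(-2j * math.pi * carrier_hz * n / sample_rate_hz)
        for n, x in enumerate(samples)
    ]
    return [
        sum(mixed[i:i + decimation]) / decimation
        for i in range(0, len(mixed) - decimation + 1, decimation)
    ]


fs, fc = 1_000_000.0, 250_000.0
tone = [cmath.exp(2j * math.pi * fc * n / fs) for n in range(64)]
baseband = down_convert(tone, fc, fs, decimation=8)
print(len(baseband), abs(baseband[0]))
```

A tone exactly at the carrier frequency becomes a constant at baseband, and the output rate is the input rate divided by the decimation factor.
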

Fig.7. Structure of the controller board IP-Camera (MCam02-IP)

           After the FPGA has demodulated the signal, dynamic real-time processing can be performed by the DSP. In the transmit channel the process is repeated in reverse order. Modems in SDR systems, which must support modulation, demodulation, resampling to higher and lower rates, and error correction, need to combine a high-speed DSP and an FPGA. Depending on the algorithms used, the error control or forward error correction of a wireless modem can be implemented either in a DSP or in logic gates. For example, Reed-Solomon encoding and decoding, as well as encoding with convolutional and turbo codes, are easier and better performed by signal processors because of their better cost/performance balance. However, the more sophisticated decoding algorithms for error correction with convolutional or turbo codes are better implemented in logic gates, an embedded processor or an FPGA. In this respect, using an FPGA as a coprocessor in SDR systems has many advantages, because it provides the flexibility to support multiple protocols.
           Media access control, which covers encoding and decoding of two-bit data packets, transferring them to and from the network interface, and flow and collision control in the channel, requires a high-performance real-time operating system and involves a large number of operations. UPs are considered the best DSP devices for media access functions. Thus, implementing multiband, multimode SDR systems requires a combination of UP, DSP and FPGA [11].
           The combination of a signal processor and an FPGA provides a very flexible system solution when working with the military standards AFDX, ARINC and MIL-STD-853. Data analysis and processing are performed by the DSP, while the FPGA provides data input/output. Most FPGAs require an external non-volatile memory device to store their configuration data. After power-on, the data are loaded into the FPGA. The loading is often serial and takes hundreds of milliseconds. The load time can be reduced by using a microcontroller whose flash memory is programmed with the data over a standard RS-232 interface. The board can then be upgraded with new software without changing any of the installed hardware. Other advantages of this method include the ability to program any variant of the device and to load the FPGA configuration in parallel mode over the processor bus, which takes much less time than serial loading. As a result, there are fewer components, the mean time between failures increases, the physical size shrinks and the cost of the board drops significantly.
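
The difference between serial and parallel configuration loading can be estimated with simple arithmetic. The sketch below assumes one bus word is transferred per clock cycle with no protocol overhead; the bitstream size and clock frequency are hypothetical values, not data for any particular device.

```python
def config_load_time_ms(bitstream_bits: int, clock_hz: float, bus_width_bits: int = 1) -> float:
    """Time to transfer a configuration bitstream into an FPGA, in milliseconds.

    Assumes one bus word per clock cycle and no protocol overhead.
    """
    cycles = bitstream_bits / bus_width_bits
    return cycles / clock_hz * 1000


# Hypothetical device: 4 Mbit bitstream, 10 MHz configuration clock.
bits = 4_000_000
serial_ms = config_load_time_ms(bits, 10e6, bus_width_bits=1)
parallel_ms = config_load_time_ms(bits, 10e6, bus_width_bits=8)
print(f"serial: {serial_ms:.0f} ms, 8-bit parallel: {parallel_ms:.0f} ms")
```

With these assumed numbers the serial load falls in the range of hundreds of milliseconds mentioned in the text, and an 8-bit parallel load is eight times faster.
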


Fig.8. Combining the microcontroller and the FPGA with a variety of system interface

Fig.8 shows an example of combining a microcontroller and an FPGA, including the board's various system interfaces. The design uses the MSP430 RISC microcontroller from Texas Instruments, which has extremely low power consumption: the operating current does not exceed 10 mA, and in inactive mode it is about 1 µA. The microcontroller has two asynchronous serial ports, eight 12-bit ADC channels, several general-purpose I/O devices, PWM and timers.

Conclusion

           Each technology has its own advantages and disadvantages, and each can outperform the others depending on the particular application. When selecting a processor, many factors must be evaluated, including:
• system requirements for the characteristics of DSP devices;
• power consumption;
• the number of components and dimensions;
• the schedule (roadmap) for future products / systems and for improving existing ones;
• economic indicators, such as one-time design and implementation costs (NRE), the cost of materials, time to market and the risks associated with the project [5].

Literature

1. Strauss W. High-end DSP markets compute to higher revenue. DSP-FPGA.com Product Resource Guide 2006.
2. Hori B. et al. Use a Microprocessor, a DSP, or Both? Workshop ESC-304.
3. Cavill P. FPGA or DSP for military applications? Both have their place. DSP-FPGA.com Product Resource Guide 2005.
4. Afra B., Kapadiya A. DSP or FPGA? How to choose the right device. www.dspdesignline.com/207600551?printableArticle=true
5. Milrod J. Hybrid FPGA/DSP architecture: the optimal solution. DSP-FPGA.com Product Resource Guide 2006.
6. Hill T. The benefits of FPGA coprocessing. DSP-FPGA.com Product Resource Guide 2006.
7. Banks J., Chung W. Combining the power of DSP and FPGAs to implement a high-performance H.264/AVC video coding standard. www.dsp-fpga.com/articles/banks_and_chung/
8. Jentz B., Rotem J. Leveraging FPGA coprocessors to optimize high-performance digital video surveillance systems. www.dsp-fpga.com/articles/jentz_and_rotem
9. Wang R. Encoding JPEG2000 using both DSP and FPGA. www.embedded.com/showArticle.jhtml?articleID=192202060
10. Belyaev A.A., Solokhina T.V., Aleksandrov Yu.N., Mironova Yu.V., Koplovich E.A. Software implementation of image compression algorithms based on processors of the Multicore family. Abstracts of the scientific and technical conference "Modern Television Technologies. State and Directions of Development". Moscow, 2006 (in Russian).
11. Dumas M., Belanger L. A new architecture for development platforms targeted to portable radio applications. www.dsp-fpga.com/articles/dumas_and_belanger
12. Wilson R. 100-core DSPs in our sights, TI says. Electronics Weekly 3/9/2007.
13. LaPedus M. DSP to go multicore. www.eetimes.com/showArticle.jhtml?articleID=197801152
