DonNTU   Masters' portal

Abstract

Introduction

Neural networks have achieved considerable success in image and video recognition. This was made possible by massive databases of images with labeled objects. After analyzing photographs or video clips, a computer can determine the location, or the object in the frame, with high accuracy. The ability to determine where an action takes place from sound is just as important as determining it from video. In poor visibility caused by a dirty camera lens or fog, the camera will not provide enough data to analyze the situation. However, if a microphone is working, the system will have more data with which to analyze the situation.

In recent years, the field of machine hearing has been growing actively. Within this field, models of machine perception and sound processing are being developed.

Machine hearing technologies can also be applied to tasks such as automatic identification of advertising brands and slogans; analysis of meeting transcripts; audio support for anti-plagiarism systems; and automatic grouping of media news by keywords and headings. Despite the significant progress in speech recognition achieved by Google and Apple mobile applications, machine hearing has its own specific challenges and requires deeper models and methods.

1. Relevance of the topic

One of the main advantages of neural networks is the simultaneous processing of a large number of signals. Most of the networks currently being implemented are software emulations running on personal computers and specialized servers [4]. Software emulation has advantages, such as simplicity and the widespread availability of personal computers, but these machines also have disadvantages, such as redundancy and high power consumption. The speed of a neural network built this way depends directly on its size, since the clock cycles of a single CPU are shared among all the neurons in the network. Its speed is inversely proportional to the number of elements of the cellular automaton.
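The inverse relationship described above can be made concrete with a rough back-of-the-envelope model. The sketch below assumes that every neuron costs the same fixed number of clock cycles and that a single core evaluates the neurons one after another. The function name, clock rate and cycle count are hypothetical and chosen only for illustration; they are not measurements.

```python
def sequential_updates_per_second(clock_hz, n_neurons, cycles_per_neuron):
    """Full-network updates per second when one core evaluates neurons one by one."""
    return clock_hz / (n_neurons * cycles_per_neuron)


# Hypothetical figures, chosen only for illustration.
CLOCK_HZ = 3_000_000_000      # assumed 3 GHz core
CYCLES_PER_NEURON = 100       # assumed fixed cost of evaluating one neuron

rates = {
    n: sequential_updates_per_second(CLOCK_HZ, n, CYCLES_PER_NEURON)
    for n in (1_000, 10_000, 100_000)
}

for n, rate in rates.items():
    print(f"{n:>7} neurons: {rate:,.0f} network updates per second")
```

Under these assumptions, a tenfold increase in the number of neurons reduces the update rate tenfold.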

To date, there are two types of neural network implementation: software and hardware.

The most prevalent are software-implemented artificial neural networks. Their popularity can be explained by how simple they are to develop, which has led to a large number of libraries for building and training neural networks and has made software neural networks even more common. Another important argument in favor of software implementation is that such networks are easy to deploy in computer systems, since no hardware upgrade is needed [10].

There is also an implementation method that combines the advantages of both types of neural networks: implementing the network on a field-programmable gate array (FPGA) [1,2]. The principle of an FPGA is that the system architecture, i.e. its hardware component, can be programmed. In other words, the hardware structure of the system is changed programmatically. Simple modules can be described in software, created in the FPGA, and merged into a large network. In addition, an FPGA supports fully parallel computing, in contrast to the pseudo-parallel computing used in conventional computers. Another advantage of FPGAs is their well-developed peripherals, including a wide range of modern computer interfaces (some of them high-speed), which makes it possible to debug a neural network and also removes the problem of integrating the network into a computer system.

In general, a neuron consists of the following blocks: an adder, a multiplier, activation function logic, a data storage unit, and the connection weights. Individual neurons are combined into layers that form a network. This structure is easily implemented in an FPGA. The parallel operation of neurons, made possible by the hardware structure of the FPGA, allows greater speed than software implementations [3].
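The neuron structure described above can be sketched in software as follows. This is a minimal illustration, not the architecture developed in this work: the sigmoid activation, the function names and the example weights are all assumptions. The loop in `neuron` plays the role of the multiplier and adder, and the weight lists stand in for the storage unit.

```python
import math


def sigmoid(x):
    """Logistic activation function (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation=sigmoid):
    """One neuron: multiply each input by its weight, accumulate, then activate."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w  # multiplier feeding the adder (accumulator)
    return activation(acc)


def layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is a set of neurons that all receive the same inputs.

    In software the neurons are evaluated one after another; in an FPGA
    each neuron could be a separate circuit working at the same time.
    """
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Hypothetical example values.
outputs = layer(
    inputs=[1.0, 0.5],
    weight_rows=[[0.0, 0.0], [2.0, 2.0]],
    biases=[0.0, 1.0],
)
print(outputs)
```

Because the neurons of one layer do not depend on each other, they are natural candidates for parallel hardware blocks.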

2. The purpose and objectives of the study, the planned results

The main purpose of the work is to develop an FPGA-based neural network architecture for the recognition and analysis of audio signals.

Tasks:

  1. Analysis of existing audio signal processing algorithms.
  2. Analysis of existing neural networks for processing audio signals.
  3. Analysis of the possibilities of implementing neural networks based on FPGA.
  4. Processing of the results.
  5. Development of the neural network architecture.

The result of the work will be an FPGA-based neural network for recognizing sound signals.

3. Analysis of existing sound recognition systems

Specialists from the Computer Science and Artificial Intelligence Laboratory (CSAIL) of the Massachusetts Institute of Technology made a qualitative leap in the analysis and processing of sound signals by developing the machine learning system SoundNet. Figure 1 shows the general sound recognition algorithm.

Figure 1 — General algorithm for sound recognition by the neural network

CSAIL researchers exploited the natural synchronization between vision and sound in video, teaching the neural network to automatically extract a sound representation of an object from unlabeled video. Approximately 2 million Flickr videos, with a total volume of 26 TB, were used to train the network, along with an annotated sound dataset of 50 categories and approximately 2000 samples. Figure 2 shows the architecture of the SoundNet neural network.

Figure 2 — Architecture of the SoundNet neural network [5]
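The SoundNet authors describe the training as a teacher-student setup. A vision network that has already been trained predicts a distribution over categories from the video frames, and the sound network is trained to produce a matching distribution from the audio track, with KL divergence as the loss. The sketch below shows only that loss on made-up numbers. The scores, category count and function names are illustrative assumptions and are not taken from the SoundNet code.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Hypothetical category scores for one video clip.
teacher = softmax([2.0, 0.5, 0.1])       # vision network looking at the frames
student_good = softmax([1.8, 0.6, 0.2])  # sound network, close to the teacher
student_bad = softmax([0.1, 0.5, 2.0])   # sound network, far from the teacher

loss_good = kl_divergence(teacher, student_good)
loss_bad = kl_divergence(teacher, student_bad)
print(f"loss (close to teacher): {loss_good:.4f}")
print(f"loss (far from teacher): {loss_bad:.4f}")
```

The loss is smaller when the sound network agrees with the vision network, so minimizing it needs no manually labeled audio.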

The system shows good results when working offline. It classifies at least three standard acoustic scenes on which its designers tested it. A more detailed analysis showed that the system learned on its own to recognize sounds characteristic of various scenes. This was surprising, since the developers did not provide it with samples for recognizing these objects. Using the collection of unlabeled videos, the neural network independently determined which scene corresponds to the sound of a cheering crowd (a stadium) or to bird chirping (a lawn or park). Along with the scene, the neural network recognizes the specific object that is the source of the sound.

Two standard sound recording databases were used to test the SoundNet system. In these tests the neural network was 13-15% more accurate at recognizing objects than the best existing programs. SoundNet's accuracy is 92% when classifying data into 10 sound categories. When the number of categories is increased to 50, its accuracy is 74%. For comparison, on the same data sets people show an average recognition accuracy of 96% and 81%, respectively [5].

4. Usage of FPGA-based Neural Networks

To date, most neural network computations are performed by graphics processors (GPUs). Along with their high computing power, GPU systems also consume large amounts of energy.

Microsoft has deployed large data centers around the world that perform a huge amount of computation using convolutional neural networks (CNNs). A CNN is a type of machine learning model that analyzes an image in order to learn features that can help a computer identify patterns in the image.

Deep learning with CNNs is computationally demanding, and Microsoft's data centers ran into difficulties using GPUs as the only computing tool because of their limited applicability and high power consumption. In 2014, the Altera Stratix V FPGA was tested for accelerating search ranking algorithms. The results showed an almost twofold performance increase, so it was decided to use this architecture to test a CNN accelerator.

An FPGA-based accelerator was designed to efficiently compute the forward propagation of convolutional layers. The CNN accelerator must be able to take an input image and process several convolutional layers in a row. The architecture has to account for several factors. Since the system must handle several layers, its computational engine must be configurable to support them. Memory management is critical, so the design must include an efficient data buffering scheme and an on-chip redistribution network. Finally, the architecture should contain a spatially distributed array of processing elements that can easily be scaled to thousands of units. How the system processes convolutional layers depends strongly on the hardware used. FPGAs have become the clear choice for greater processing efficiency [6].
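To illustrate what forward propagation through several convolutional layers in a row involves, here is a minimal software sketch. It is not the accelerator design from [6]. It assumes a single-channel image, stride 1, no padding and a ReLU after each layer, and the function names and example values are hypothetical.

```python
def conv2d_forward(image, kernel, bias=0.0):
    """Stride-1, unpadded 2D convolution followed by ReLU.

    As in most deep learning code, the kernel is not flipped, so strictly
    speaking this computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            # Each output pixel is an independent multiply-accumulate chain,
            # which is what an array of processing elements can parallelize.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(max(0.0, acc))  # ReLU activation
        out.append(row)
    return out


def run_layers(image, kernels):
    """Feed the output of each convolutional layer into the next one."""
    for kernel in kernels:
        image = conv2d_forward(image, kernel)
    return image


# Hypothetical 4x4 input and two 2x2 kernels.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
result = run_layers(image, kernels)
print(result)  # [[48.0, 56.0], [80.0, 88.0]]
```

The intermediate 3x3 result of the first layer is consumed immediately by the second layer, which suggests why data buffering between layers matters in a hardware design.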

The Arria 10 FPGA provides up to 40 GFLOPS per watt. It can be programmed with OpenCL, a C-based framework that serves as a higher-level alternative to hardware description languages such as VHDL, to make use of its IEEE 754 floating-point digital signal processing units. Arria 10 has a flexible data path that bypasses external memory and allows OpenCL kernels to transfer data directly to each other. In addition to a flexible data path, Arria 10 supports fixed-point operations in hardware, both for multiplication and for addition. This allows the FPGA to contain more logic and run at a higher clock frequency. With these improvements in hardware and software functionality, the Arria 10 can outperform existing GPU-based platforms [7].
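Fixed-point multiplication and addition, mentioned above, can be modeled in software with plain integers. The sketch below assumes a format with 8 fractional bits and uses hypothetical helper names. It shows the general technique only and does not reflect the actual word widths of Arria 10 DSP blocks.

```python
FRACTION_BITS = 8            # assumed number of fractional bits
SCALE = 1 << FRACTION_BITS   # 256


def to_fixed(x):
    """Convert a real number to its fixed-point integer representation."""
    return int(round(x * SCALE))


def to_float(q):
    """Convert a fixed-point integer back to a real number."""
    return q / SCALE


def fixed_mac(inputs_q, weights_q):
    """Multiply-accumulate using integer arithmetic only.

    Each product carries 2 * FRACTION_BITS fractional bits, so the sum is
    accumulated at full precision and shifted back once at the end.
    """
    acc = 0
    for x, w in zip(inputs_q, weights_q):
        acc += x * w
    return acc >> FRACTION_BITS


# Hypothetical values that are exactly representable with 8 fractional bits.
inputs = [0.5, -1.25, 2.0]
weights = [1.5, 0.5, -0.75]

q_result = fixed_mac([to_fixed(v) for v in inputs], [to_fixed(v) for v in weights])
float_result = sum(x * w for x, w in zip(inputs, weights))
print(to_float(q_result), float_result)  # -1.375 -1.375
```

Values that are not exactly representable in the chosen format would introduce a small rounding error, which is the usual trade-off for the simpler integer hardware.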

As FPGA performance improves, data centers can also use FPGAs to meet their computing and power requirements. This makes it possible to run deep learning workloads by processing several layers of convolutional neural networks in sequence.

5. Development Tools Overview

Xilinx ISE (Integrated Synthesis Environment) is software created by Xilinx for the synthesis and analysis of HDL designs. It allows developers to synthesize (compile) their projects, perform timing analysis, examine RTL diagrams, and simulate a module's response to different signals [8].

Xilinx ISE is a design environment for Xilinx FPGA products. It is closely tied to the architecture of these chips and cannot be used with FPGA products from other vendors. Xilinx ISE is mainly used for circuit synthesis and design, while the ISim or ModelSim logic simulator is used for system-level testing. Other components shipped with Xilinx ISE include the Embedded Development Kit (EDK), the Software Development Kit (SDK), and ChipScope Pro.

Xilinx ISE has a number of features, the most important of which is the ability to create a firmware file for loading into the FPGA board. This allows the described circuits and programs not only to be simulated but also to be tested in real conditions.

Implementation is divided into several stages, during which the project model is built from logic elements. Because of the large number of steps, creating the firmware file takes a long time, which is a major disadvantage of this environment.

After the HDL synthesis phase, a schematic representation of the synthesized source file (an RTL diagram) can be displayed. This diagram shows the pre-optimized design in terms of common elements such as adders, multipliers, counters, and AND and OR gates. Reviewing this diagram can help identify design problems early in the design process. An example of such a schematic is shown in Figure 3.

Figure 3 — RTL example

Even before the program is loaded into the FPGA board, ISE knows the number of logic elements and complex structures needed to build a circuit that implements the written program. This data is presented to the user as a table and makes it possible to see at what point the design will exceed the available number of elements.

This environment has several disadvantages, including low speed and a large installation size. Another disadvantage is that ISE cannot by itself monitor the behavior of the firmware inside the board; this requires the separate ChipScope Pro software [9]. Despite these drawbacks, the environment is widely used.

Conclusion

As a result of the research, materials related to the topic of the master's work were collected and studied.

The study analyzed existing sound recognition systems based on neural networks, the possibility of implementing neural networks on FPGAs, and a development environment specialized for working with FPGAs.

From the materials studied, it can be concluded that there are few specialized applications for recognizing sound sources. The study also showed that implementing neural networks on FPGAs is an optimal solution for local problems.

Bibliography

  1. Introduction to neural network architectures [Electronic resource] (in Russian). Available at: https://habr.com/company/oleg-bunin/blog/340184/ (accessed: 05.12.2018).
  2. Nurvitadhi E. et al. Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC // Field Programmable Logic and Applications (FPL), 2016 26th International Conference on. IEEE, 2016. P. 1–4.
  3. Azarov A. B., Konstantinov V. S., Zinchenko Yu. E., Zinchenko T. A. Overview of the existing concept and possibilities of implementing neural networks (in Russian) // Proceedings of the student section of the IX International Scientific and Technical Conference "Informatics, Control Systems, Mathematical and Computer Modeling" (IUSMKM-2018). Donetsk: DonNTU, 2018. P. 390-394.
  4. Dedegkaev A. G., Ryzhkov A. A. A method for designing the structure of neural networks based on cellular automata (in Russian) // Universum: Technical Sciences. 2013. No. 1. Available at: https://cyberleninka.ru/article/n/metod-proektirovaniya-struktury-neyronnyh-setey-na-osnove-kletochnyh-avtomatov (accessed: 23.12.2018).
  5. Machine hearing. The SoundNet neural network was trained to recognize objects by sound [Electronic resource] (in Russian). Available at: https://habr.com/post/399659/ (accessed: 05.12.2018).
  6. Using FPGAs in the creation of neural networks [Electronic resource] (in Russian). Available at: http://digitrode.ru/computing-devices/fpga/1045-ispolzovanie-plis-fpga-v-sozdanii-neyronnyh-setey.html (accessed: 05.12.2018).
  7. Intel PAC with FPGA Stratix 10 SX: an accelerator for big tasks [Electronic resource] (in Russian). Available at: https://habr.com/company/intel/blog/425187/ (accessed: 05.12.2018).
  8. Xilinx ISE [Electronic resource] (in Russian). Available at: http://we.easyelectronics.ru/plis/osvaivaem-xilinx-ise.html (accessed: 05.12.2018).
  9. ChipScope Pro and the Serial I/O Toolkit [Electronic resource]. Available at: https://www.xilinx.com/products/design-tools/chipscopepro.html (accessed: 05.12.2018).
  10. Neural networks, bad advice [Electronic resource] (in Russian). Available at: https://habr.com/post/211610/ (accessed: 05.12.2018).