Abstract on the theme of masters work

Content

Introduction
1. Theme urgency
2. Goal and tasks of the research
3. Capturing an image with face from the video shot
4. Convolution Neural Network
5. Software model
Conclusion
References

Introduction

The problem of pattern recognition refers to the class of problems that are difficult to formalize. At the present time, it is particularly urgent because of the need to automate the processes of communication (visual, verbal) in the intellectual systems. Therefore, the search and implementation methods of human recognition by computer systems continue. Artificial neural networks are very perspective to solve the problems of this class. They are induced by biology, because they composed of the elements the functionality of which are similar to most elementary functions of a biological neuron. Despite superficial similarities, artificial neural networks demonstrate properties that are the characteristic of the living brain. Particularly, they are trained on the base of experience. They then generalize the previous precedents on new examples and extract essential properties of the incoming information, which includes extra data [1].

1. Theme urgency

In recent years the problem of recognition human faces has become increasingly important. Face recognition is a practical application of pattern recognition theory, which task includes automatic localization of the face on the photo and, if necessary, identification of the person by the face. Face recognition is applicable in the following areas: security systems, forensics, computer graphics, human-computer interaction, virtual reality, computer games, e-commerce, personalization and data protection, etc. Thus, the relevance of this problem takes place [2].

Master's thesis is devoted to an actual scientific problem of pattern recognition, where the image appears as the picture of human face. We get the image from the video shot. In fact, the problem is divided into two subproblems. First, person stands out from the video flow. Then there is recognition of the selected face. Platform Java SE, IDE Eclipse, and several supporting libraries are used as tools.

Great contribution to the development of the theme of pattern recognition in DonNTU was made by Oleg Fedyaev and his masters S. Kulikov, K. Driga, Yu. Makhno, G. Kostetsckaya, A. Sowa, and others.

The research of this topic has been conducted since the 80s in the world. Jan LeKun an American scientist, made a great contribution to the development of systems of pattern recognition using neural networks. Intel Corporation wrote its own open source library of computer vision OpenCV [3], that is accessable to everybody.

2. Goal and tasks of the research

The aim of this research is to develop a subsystem that will accurately detect human faces in real time. Such system can be used to identify individuals such as students, members of the lecture audience.

The main purposes of the research are:

To search for a method that selects with maximum accuracy of images of human faces in a picture from the video flow.
To design a parametric model of the convolutional neural network.
To construct the logical and physical model of the convolutional neural network.
To develop software model of face recognition.

Object of research: neural network face recognition system.

Subject of research: The structure of convolution neural network and its learning algorithms, algorithms for the selection of faces from the video flow.

Actual scientific results are planned to receive the following areas within the master's thesis:

Determining the most effective methods of localization of the human face.
Creating a software convolution neural network model and define its parameters.
Analysis of the effectiveness of the algorithms.

3. Capturing an image with face from the video shot

Currently the problem of recognition human faces is becoming more relevant. The process of direct face recognition precedes an important stage in automatic localization of faces in the picture, the methods of implementation which are currently being actively developed. The Viola-Jones method is considered below [4]. Despite the fact that this method was described in 2001, it remains one of the most efficient algorithms of face selection. Also, this algorithm was on the top 10 algorithms of artificial intelligence [5] in 2007 year.

It's based on Adaptive boosting algorithm[6]. There is a set of standard objects, i.e. image and the class to which it belongs (-1 – no faces in the picture, 1 – there is a face). Besides, there are a number of simple classifiers, from which we can build more sophisticated and powerful one. In the multi-step process of compilation and training of the final classifier emphasis are made on standards, which are recognized worse. Algorithm adapts to the most complex objects in the learning process. This is the adaptability of the algorithm. Figure 1 shows the idea of localization of the face in the image of this algorithm:

The idea of the Viola-Jones algorithm

Figure 1 – The idea of the Viola-Jones algorithm

The algorithm is based on three key concepts:

a) Usage of the "Integral Image" which allows to calculate the area of the scanning window in several times faster than the algorithm for the detection of skin color. The integral image at point (x, y) includes the amount of pixels that are located to the left of and above the image:

where ii(x,y) – value of the integral image at (x,y) point.

The following recurrence formulas are used to calculate the integral image:

The recurrence formulas are used to calculate the integral image

where s(x,y) – accumulated sum of the line, s(x,-1)=0, ii(-1, y)=0.

b) An adaptive learning algorithm selects a small number of low level signs of more than 180 thousand characters. There are many motivations for using signs, rather than individual pixels. The most important reason – a sign that can encode different meanings that are difficult to define and study, using a finite set of training data. Signs are computed faster than pixel values that will be discussed in detail below. Viola-Jones algorithm uses three types of signs: two rectangles, three and four rectangles. A cascade of classifiers constructs on these classifications. Each element of the next stage has more stringent conditions of passing than the previous one (using more primitives). Thus only the right samples reach the end. Education is based on the idea of boosting. Several thousands of positive and negative images are required for successful learning. Figure 2 shows the algorithm of boosting:

Boosting's algorithm

Figure 2 – Boosting's algorithm

Boosting is used for selection of a small finite set of features for a high-level training classifier. For training the neural network classifier a simple (weak) learning algorithm is used.

c) Usage of the cascade of Haar [7], which selects a small number of features and combines them into a higher level of symptoms. As a result we get a cascade of 38 steps on a large data set consisting of 507 individuals with more than 75 million scanning windows. Faces are recognized with 10 features per second calculation for each window. The resulting detector slides on the image in different scales. Thus the invariance is archived.

4. Convolutional Neural Network

Neural networks are successfully used for solving many problems of pattern recognition: recognition of character, object recognition, and many others. Currently classic neural network architectures are used for solving problems of recognition (multi-layer perceptron network with radial basis function, etc.), but the usage of this networks is ineffective for the following reasons:

ensemble of neural networks (2-3 neural networks are trained with different initial values of the synaptic coefficients and order of presentation of the images) is usually applied to this problem. This negatively affects the computational complexity of solving the problem and run-time;
as a rule, the classical neural network architectures are used in conjunction with auxiliary techniques of selection the scene from the image (segmentation by skin color, edge enhancement, etc.) that require high quality and meticulous pretreatment training and operational data that is not effective;
neural network architectures are highly sensitive to the influence of various external factors (changes in shooting conditions, the presence of individual features in the image, change the orientation) [8].

In addition, there are difficulties in applying traditional neural networks to real problems of pattern recognition and classification of images.

First, as a rule the images have a large dimension, respectively, grows the size of a neural network (number of neurons, weights). A large number of parameters increases the capacity of the system and respectively, requires more training samples, which increase time and computational complexity of the learning process.
Second, the lack of a full mesh architecture is that the topology of the input is completely ignored. Input variables can be presented in any order without affecting the goal of learning.

Convolutional neural network architecture is designed to overcome these shortcomings. Convolutional neural networks provide a partial resistance to change scale displacements, rotations, changing angles, and other distortions.[9].

The works of French origin, the American scientist Jan LeKun focused on solving these problems. These works are associated with the development and research of convolutional neural networks. The idea of convolutional neural networks lies in the alternation of convolutional layers (C-layers), layers of sub-sample (S-layers) and using the output layer neurons. Convolutional networks are based on three mechanisms used to achieve invariance to the transfer, scale, low distortion:

Local features extraction. Each neuron receives input from the local receptive field in the previous layer and retrieves its local characteristics.
The formation of layers in a set of cards features. Each computing layer consists of a set of cards features – planes, on which all the neurons have to use the same set of synaptic weights.
Subsampling. Each layer follows a convolutional computation layer which performs local averaging and sub-sample. This operation causes to decrease sensitivity of output signal of the operator mapping features a slight displacement and other types of deformation.

Consistent application of convolution and subsampling increase the level of features: if the first layer extracts the local features of the image areas, the subsampling layers are extracted common features, which are called the signs of a high order. Figure 3 shows a network of convolution, which recognizes the input image [10].

Architecture of the convolutional neural network

Figure 3 – Architecture of the convolutional neural network

In the set-theoretic model of level the convolutional neural network (CNN) is presented in the form of a tuple 3:

where M – amount of layers in the network; η – number of layer; 1≤η≤M; N^η – amount of the feature cards in the η layer; C, S – weights of the C and S layers respectively, f^η,φ^η – the activation function of neuron of the η-layer, C and S layers respectively; N – amount of neurons in the last layer of the neural network; w – the matrix of output layer weights; ψ – the activation function of neurons of the output layer.

Parametric model does not reflect the behavioral (functional) aspect of the elements of a neural network. This lack eliminates a logical model as class diagrams, which contains the main objects of a parametric model of the CNN, endowed with the properties and the operating ability of the neural network.

Building a logic model begins with the definition of entities (classes and objects) of the system. In figure 4, we can see the implementation of the system as a class diagram. Based on the topology of the convolutional neural network identified the following classes: Neuron keeps all necessary information for a neuron (inputs, weights, outputs); abstract class BasePlane describes the workings of all types of planes (input, C-plane, S-plane, output), training and recognition methods. Also, the structure of relations between maps of adjacent layers are realized in this class. An abstract class SubPlane (extension of the class BasePlane) realizes methods specific to the convolutional and subsampling planes. ImagePlane class encapsulates the work of the image, i.e., translates it into a two-dimensional array of input values and keeps the image. ConvolutionalPlane class implements the functionality of C-layer network. Class SubSamplingPlane is an implementation of S-layer network. Class OutputPlane is a layer of neurons, which are completely linked to previous network layer (S2), generates output values of the network. NeuralNet class is a black box. At the entrance we get the image and the output layer generates output signals. In general, the layers can interact with each other [11].

Logical model of the convolutional neural network in the form of UML diagram

Figure 4 – Logical model of the convolutional neural network in the form of UML diagram

5. Software model

General view of the programming model is shown in figure 5. Stages of Software Product are:

Capture an image from a webcam.
The face localization in the picture.
Extraction the face image and sending it to the input of the neural network.
Face detection by neural network.
Recognition result output to the screen.

Figure 5 – Software neural net model of face recognition
(animation: 9 frames, 5 cycles of repeating, 196 kilobytes)

Conclusion

Creating intelligent recognition systems, particularly faces is one of the most popular topics today. This master's paper attempts to create a software product that has several sub-tasks: to connect to a web camera, get a shot, release the face on this shot, create a database of faces, train convolutional neural network and, finally, recognized a face. Steps of selection and recognition of faces are implemented at the moment, and can work independently of each other. The next step will be the merging of two software products without depending on operator.

Master's work is not yet complete. Final completion will be in January 2013. The full text of the work and materials on the topic can be obtained from the author or his head after that date.

References

Федяев О.И., Махно Ю.С. Система распознавания зашумлённых и искажённых графических образов на основе нейронной сети типа неокогнитрон // Одиннадцатая национальная конференция по искусственному интеллекту с международным участием КИИ-2008: Труды конференции. Т. 3. – М .: ЛЕНАНД, 2008. – 464 с.
Компьютерное зрение [Электронный ресурс]. – Режим доступа: http://ru.wikipedia.org/wiki/Компьютерное_зрение
OpenCV – библиотека компьютерного зрения с открытым исходным кодом [Электронный ресурс]. – Режим доступа: http://software.intel.com/en-us/articles/...
Viola P, JonesM(2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. pages 511–518, Kauai, HI
Wu V., Kumar V., Ross Quinlan J. et al. Top 10 algorithms in data mining. – Knowledge and Information Systems, 2008, v. 14, i. 1, p. 1–37
Саймон Хайкин.Нейронные сети: полный курс.: издательский дом «Вильямс», 2006. – 1104с.
Rainer Lienhart, Alexander Kuranov, Vadim Pisarevsky. Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection.: Intel
Макаренко, А. А. Методика локализации изображения лица для систем видеоконтроля на основе нейронной сети / А. А. Макаренко, В. Т. Калайда // Известия Томского политехнического университета [Известия ТПУ] / Томский политехнический университет (ТПУ) . – 2006 . – Т. 309, № 8 . – [С. 113-118]
Костецкая Г.Ю., Федяев О.И. Распознавание изображений человеческих лиц с помощью свёрточной нейронной сети // Сб. трудов Международной студенческой научно-технической конференции «Информационные управляющие системы и компьютерный мониторинг» – Донецк: ДонНТУ, 2010. – с. 216-218.
Умяров Н.Х., Федяев О.И. Параметрическая модель свёрточной нейронной сети // VI международная научно-техническая конференция студентов, аспирантов и молодых научных работников «Информатика и компьютерные технологии»: Т. 2 – Донецк, ДонНТУ, 2010. – 292с.
Умяров Н.Х., Федяев О.И. Логическое и физическое представление архитектуры свёрточной нейронной сети // Інформаційні управляючі системи та комп’ютерний моніторинг (ІУС КМ – 2011) : II Всеукраїнська науково-технічна конференція студентів, аспірантів та молодих вчених, 11-13 квітня 2011 р., м. Донецьк : зб. доп. у 3 т./ Донец. націонал. техн. ун-т; редкол.: Є.О. Башков (голова) та ін. – Донецьк: ДонНТУ, 2011. – Т.3, с. 81-85

Abstract

Nail Umiarov

Faculty of computer science and technology (CST)

Department of applied mathematics and informatics

Speciality Software Engineering

Scientific adviser: Ph.D., docent Oleg Fedyaev