Alexey Potopakhin
Faculty of Computer Science and Technology
Department of Artificial Intelligence and System Analysis
Speciality: Intelligent Systems Software Technologies
Research and development of a method for managing a three-dimensional scene using dynamic gestures
Scientific adviser: M.Sc., Docent Konstantin Ruchkin

Abstract

Warning! This abstract refers to work that has not been completed yet. Estimated completion date: June 2018. Contact the author after that date to obtain the complete text.

Content
Introduction
1. Relevance of the topic
2. Aims and tasks of the research
3. Current status of the problem
3.1 Analysis and comparison of the existing methods for dynamic gestures recognition
3.2 Analysis and comparison of the implemented software for dynamic gestures recognition
4. Gesture language development for the three-dimensional scene
Conclusions
References
Introduction

Currently, methods for recognizing human dynamic gestures are based on modern mathematical and evolutionary algorithms (particle swarm optimization, ant colony algorithms, etc.) and statistical methods (Hidden Markov Models, the Monte Carlo method, etc.). These methods can be used to create software products for contactless interaction between a user and a computer.

1. Relevance of the topic

The quality of existing algorithms for recognizing hand and finger gestures, both dynamic and static, from color video cameras and three-dimensional sensors is still insufficient for building practical human-machine interaction systems. The main drawbacks of existing methods are sensitivity to lighting changes, the need to train the system for each operator, low gesture recognition quality, and low recognition speed.

The relevance of the topic is justified by the need to create a simple and effective system of human-machine interaction: a new way for a user to interact with a computer interface that works in real time and does not require specialized equipment.

2. Aims and tasks of the research

The aim of the master's dissertation is the research and development of a method for managing a three-dimensional scene using dynamic gestures.

The following tasks should be performed to achieve this goal:

3. Current status of the problem

The main feature of methods based on the analysis of external gesture features is that only the appearance of the target object (shape, position, etc.) is analyzed; no information about the physical properties of the object is stored.

Various devices can be used to capture human gestures: ultrasonic locators, kinematic sensors, structured-light systems, etc. The most common sources of data about user gestures, however, are a video camera and devices similar to Microsoft's three-dimensional Kinect sensor, which, in addition to a color video camera, includes a depth sensor (an infrared emitter paired with an infrared camera). Using two or more video sensors makes it possible to capture additional information about a three-dimensional object (cavities invisible from the other camera, the shape of lateral projections, etc.) [1]. Below we consider well-known works and methods devoted to recognizing hand gestures on the basis of external gesture features.

3.1 Analysis and comparison of the existing methods for dynamic gestures recognition

The most common approach uses Hidden Markov Models (HMM). Gesture recognition methods based on HMMs represent each gesture by a set of states associated with three probability distributions (initial, transition, observation) estimated from pre-prepared templates. The recognizer selects the model with the highest probability and classifies the gesture into the corresponding category. Although HMM-based recognition systems select the model with the best probability, there is no guarantee that the input really resembles the reference gesture. To obtain good results, the HMMs must be well trained and tuned. The main drawbacks of this approach are that a large number of samples and a long training time are required to calibrate the models.
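To illustrate this classification scheme, below is a minimal sketch (not a reproduction of any cited system) of HMM-based gesture classification in C++: each gesture has its own discrete HMM, the forward algorithm computes the log-likelihood of an observation sequence (for example, quantized hand-motion directions), and the gesture whose model yields the highest likelihood wins. All names and structures here are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A discrete HMM: initial state probabilities, a state transition matrix,
// and per-state observation (emission) probabilities over a finite symbol set.
struct HMM {
    std::vector<double> initial;                  // pi[i]
    std::vector<std::vector<double>> transition;  // a[i][j]
    std::vector<std::vector<double>> emission;    // b[i][symbol]
};

// Forward algorithm with scaling: log-likelihood of an observation sequence.
double logLikelihood(const HMM& m, const std::vector<int>& obs) {
    const std::size_t n = m.initial.size();
    std::vector<double> alpha(n);
    double logL = 0.0;
    for (std::size_t t = 0; t < obs.size(); ++t) {
        std::vector<double> next(n, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            if (t == 0) {
                s = m.initial[j];
            } else {
                for (std::size_t i = 0; i < n; ++i)
                    s += alpha[i] * m.transition[i][j];
            }
            next[j] = s * m.emission[j][obs[t]];
        }
        // Normalize each step to avoid numerical underflow on long sequences.
        double scale = 0.0;
        for (double v : next) scale += v;
        logL += std::log(scale);
        for (double& v : next) v /= scale;
        alpha = next;
    }
    return logL;  // log P(obs | model)
}

// Classify a gesture: evaluate every model and pick the most likely one.
int classify(const std::vector<HMM>& models, const std::vector<int>& obs) {
    int best = -1;
    double bestL = -INFINITY;
    for (std::size_t k = 0; k < models.size(); ++k) {
        const double l = logLikelihood(models[k], obs);
        if (l > bestL) { bestL = l; best = static_cast<int>(k); }
    }
    return best;  // index of the winning gesture model
}
```

Note that this selects the best-scoring model even for arbitrary input, which is exactly the weakness described above: a rejection threshold on the winning likelihood is usually added in practice.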

Real-time hand gesture recognition is a difficult and challenging problem because of the uncertainty in determining gesture boundaries. The approach based on particle swarm optimization addresses this problem by simplifying gesture segmentation [2]. It avoids premature identification of the gesture, thereby increasing recognition accuracy. This recognizer requires less computation and time than HMMs, which makes it a good candidate for applications that perform recognition in real time.

The CAMShift algorithm combines the Mean Shift tracking algorithm, which operates on a skin-color probability map, with an adaptive step that resizes the tracking window. Since CAMShift can track faces based on color likelihood, it can also be used to track a hand.

The advantages of this algorithm are low demands on computing resources, flexible tuning of positioning accuracy, and the ability to work under varying lighting conditions. An additional advantage is the ability to keep tracking when the monitored object is partially occluded. These properties follow from the use of an object model based on a brightness and color histogram, together with the Mean Shift procedure for precisely positioning the object [3].
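For reference, a minimal OpenCV-based sketch of CAMShift hand tracking might look as follows. The initial hand region, the HSV mask thresholds, and the histogram size are illustrative assumptions; a real system would locate the hand automatically rather than assume it starts in the center of the frame.

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);               // default camera
    if (!cap.isOpened()) return 1;

    cv::Mat frame, hsv, mask, backproj;
    cap >> frame;
    if (frame.empty()) return 1;

    // Assume the hand initially occupies a central region (hypothetical ROI).
    cv::Rect track(frame.cols / 2 - 60, frame.rows / 2 - 60, 120, 120);

    // Build a hue histogram of the hand region: the skin-color model.
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 60, 32), cv::Scalar(180, 255, 255), mask);
    int histSize = 16, channels = 0;
    float hueRange[] = {0, 180};
    const float* ranges = hueRange;
    cv::Mat roiHsv(hsv, track), roiMask(mask, track), hist;
    cv::calcHist(&roiHsv, 1, &channels, roiMask, hist, 1, &histSize, &ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    for (;;) {
        cap >> frame;
        if (frame.empty()) break;
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        cv::inRange(hsv, cv::Scalar(0, 60, 32), cv::Scalar(180, 255, 255), mask);

        // Back-project the histogram: per-pixel skin-color probability map.
        cv::calcBackProject(&hsv, 1, &channels, hist, backproj, &ranges);
        backproj &= mask;

        // CAMShift = Mean Shift iterations plus adaptive window resizing.
        cv::RotatedRect box = cv::CamShift(
            backproj, track,
            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));

        cv::ellipse(frame, box, cv::Scalar(0, 255, 0), 2);
        cv::imshow("CAMShift hand tracking", frame);
        if (cv::waitKey(30) == 27) break;  // Esc exits
    }
    return 0;
}
```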

3.2 Analysis and comparison of the implemented software for dynamic gestures recognition

Below is an overview of the most popular recognition software.

The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source C++ library designed to make real-time machine learning and gesture recognition more accessible to non-specialists. Emphasis is placed on ease of use, with a consistent, minimalist design that promotes accessibility while supporting flexibility and customization for advanced users. The toolkit features a broad range of classification and regression algorithms and has extensive support for building real-time systems, including algorithms for signal processing, feature extraction, and automatic gesture spotting [4].
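As a sketch of how GRT can be applied to dynamic gestures, the fragment below trains a Dynamic Time Warping (DTW) classifier on pre-recorded gesture time series and then classifies a new sample. The file name and the gesture length are placeholders, and the exact API may vary between GRT versions; consult the GRT documentation for the version in use.

```cpp
#include <GRT/GRT.h>
#include <iostream>
using namespace GRT;

int main() {
    // Load pre-recorded gesture time series (placeholder file name).
    TimeSeriesClassificationData trainingData;
    if (!trainingData.load("HandGestures.grt")) return 1;

    // Build a pipeline around a DTW classifier, suited to dynamic gestures.
    GestureRecognitionPipeline pipeline;
    pipeline.setClassifier(DTW());
    if (!pipeline.train(trainingData)) return 1;

    // Classify a new gesture: one row per frame of tracked hand coordinates.
    MatrixFloat gesture(50, trainingData.getNumDimensions());
    // ... fill `gesture` with the tracked hand trajectory ...
    if (pipeline.predict(gesture)) {
        UINT label = pipeline.getPredictedClassLabel();
        std::cout << "Predicted gesture: " << label << std::endl;
    }
    return 0;
}
```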

XKin is an open-source platform for the Kinect that provides more natural and intuitive communication between human and computer. The package includes tools for training the system on user-defined postures and gestures. The XKin project is implemented entirely in C, is fully open source, and allows real-time recognition of both static hand postures and dynamic gestures [5].

HandVu is a computer-vision software package for gesture recognition and graphical user interface control. With virtually any color camera and sufficient processing power, it can provide control of a computer interface. HandVu detects the hand in a standard posture, then tracks it and recognizes key postures, all in real time and without camera or user calibration. The output is accessible through a client-server infrastructure in a custom format and as OSC packets [6].

Libraries implemented in C and C++ are less computationally expensive than Java (Java-ML) or Python libraries, but they are harder to configure and require additional calibration of input devices. Their main advantages, however, are real-time operation and simple, inexpensive hardware requirements.

4. Gesture language development for the three-dimensional scene

Gestures are a nonverbal means of communication. Movements of the fingers, hands, head, and shoulders, as well as facial expressions, are all gestures. With gestures a person can transmit an independent unit of information, supplement speech, convey feelings, and so on. Gestures are usually divided into static (perceived instantaneously) and dynamic (perceived over time); both kinds have specific interpretations in sign alphabets and contactless human-machine interfaces [7].

Today a computer mouse is usually used to work with three-dimensional models, which is not very convenient for this task. Given the three-dimensional coordinates of the hand and fingertips, one can build a system that allows models to be controlled in all directions of three-dimensional space.

To control the three-dimensional scene within this project, four types of dynamic gestures are considered. The system under development captures the open hand and tracks its movement in space (Fig. 4.1): left (1), right (2), up (3), down (4). The gesture is determined by the position of the palm relative to the video device.

Figure 4.1 – Gesture language for managing the three-dimensional scene

When developing the system, it is necessary to take into account the hand position that triggers capture and starts motion tracking, as well as the duration of the hand movement used to determine the gesture. Arbitrary hand movements that do not belong to the defined gestures must also be skipped; a possible implementation of these checks is sketched below.
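The sketch decides the gesture from the palm's total displacement over the tracking window and ignores movements shorter than a minimum distance, treating them as arbitrary. The threshold value is an illustrative assumption, not a tuned constant.

```cpp
#include <cmath>

enum class Gesture { None, Left, Right, Up, Down };

// Decide a gesture from the palm position at the start (x0, y0) and end
// (x1, y1) of the tracking window, in image coordinates. Displacements
// shorter than minDist are treated as arbitrary movements and skipped.
Gesture classifyGesture(float x0, float y0, float x1, float y1,
                        float minDist = 80.0f /* pixels, illustrative */) {
    const float dx = x1 - x0;
    const float dy = y1 - y0;
    if (std::sqrt(dx * dx + dy * dy) < minDist)
        return Gesture::None;                 // too small: not a gesture
    if (std::fabs(dx) >= std::fabs(dy))       // dominant axis decides
        return dx < 0.0f ? Gesture::Left : Gesture::Right;
    return dy < 0.0f ? Gesture::Up : Gesture::Down;  // image Y grows downward
}
```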

Moving the hand left or right turns the scene around the OY axis, and moving it up or down turns it around the OX axis. As an example, the globe on the three-dimensional scene (Fig. 4.2) is turned to the left around the OY axis; a sketch of mapping recognized gestures to these rotations follows.
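Continuing the sketch above (the Gesture type is reused from it), a recognized gesture can be mapped onto scene rotation angles; the rotation step is an illustrative value.

```cpp
// Reuses the Gesture type from the previous sketch. Angles are in degrees;
// step is an illustrative rotation increment per recognized gesture.
void applyGesture(Gesture g, float& angleAroundOY, float& angleAroundOX,
                  float step = 15.0f) {
    switch (g) {
        case Gesture::Left:  angleAroundOY -= step; break;  // turn scene left
        case Gesture::Right: angleAroundOY += step; break;  // turn scene right
        case Gesture::Up:    angleAroundOX -= step; break;  // tilt scene up
        case Gesture::Down:  angleAroundOX += step; break;  // tilt scene down
        case Gesture::None:  break;                         // no gesture: no-op
    }
}
```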

Figure 4.2 – Example of gesture control in a three-dimensional scene
(Animation: size 90.3 KB; 5 frames; unlimited repeats; 1 s delay)

When developing the system, it is also necessary to take into account the hardware capabilities of the video device and the speed of the computer system. Complex algorithms cannot always complete an action in a fraction of a second; they require a certain amount of time that depends on the image resolution, capture quality, and other parameters [8].

Conclusions

In the present work, existing methods and software for recognizing human gestures have been examined and analyzed, and the advantages and disadvantages of the methods have been highlighted.

The main drawbacks of existing methods are sensitivity to lighting changes, the need to train the system for each operator, low gesture recognition quality, and low recognition speed.

Recognizing complex hand gestures requires specialized input devices as well as software configuration and training. For dynamic gestures, however, only the trajectory of the hand movement (left, right, up, down) needs to be known, so relatively simple mathematical algorithms are sufficient.

Thus, an important task is to create new models, methods, and algorithms for hand gesture recognition that can be used to build contactless human-machine interaction systems.