Abstract
Study of the development of a dynamic sign language for managing multimedia content

Introduction

As scientific and engineering problems grow more complex, the automatic processing and analysis of visual information is becoming increasingly important. These technologies are used in high-demand areas of science and technology such as process automation, productivity improvement, product quality control, monitoring of production equipment, intelligent robotic systems, control systems for moving vehicles, biomedical research, and many others.

Computer vision is a rapidly developing area of modern science. An integral part of computer vision is pattern recognition, which determines whether an input image belongs to one of the stored reference images of objects. When building intelligent systems, it is also often necessary to track the position of moving objects in real time from visual information supplied by a video camera. Given a sequence of successive digital images, one can extract specific information about an object and then use it to detect the object's current position and track its movement.

The main purpose of gesture recognition research is to create a system that can identify specific human gestures and use them to transmit information or control a device.

1. Image Recognition Overview

Recognition is the ability of living organisms to detect certain objects, patterns, and phenomena in the flow of information coming from the senses. It can be based on visual, auditory, or tactile information. Thus, a person can easily recognize an acquaintance by looking at them or hearing their voice. Some animals actively use the sense of smell to recognize other individuals and to search for food.

The ability to recognize rests on the similarity between objects of the same kind. Although every object and situation is unique in the strict sense, similarities on some particular basis can always be found between some of them. Hence the concept of classification: splitting the entire set of objects into disjoint subsets (classes) whose elements share properties that distinguish them from the elements of other classes. The task of recognition is thus to assign the objects or phenomena under consideration to the appropriate classes on the basis of their descriptions. In other words, the concept of recognition can be extended to the detection of objects in a stream of any information, not only sensory. For example, one can speak of recognizing a disease from a patient's symptoms, or of recognizing a social phenomenon from statistical data.

1.1 Types of tasks in recognition

Recognition systems share a typical functional scheme. The data to be recognized are fed to the system input and preprocessed, either to convert them into the form required by the next stage or to extract the necessary characteristic features. Then, at the decision-making stage, a series of calculations is performed on the processed data and, based on the results, a response is generated containing the information about the input data that the system is expected to provide. The content of the input and output data is determined by the purpose of the system.

In addition to the stages described above, a recognition system must be configured for the variety of possible input data; this is called the learning (training) stage. The purpose of training is to form in the system's memory the body of information needed to recognize the intended class of input data.
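The three stages (preprocessing, learning, decision making) can be illustrated with a deliberately small sketch. It assumes binary images stored as nested lists, two arbitrary illustrative features (bounding-box fill ratio and aspect ratio), and a nearest-reference decision rule; none of these choices come from the original work, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def extract_features(image):
    """Preprocessing: reduce a binary image to (fill ratio, aspect ratio)."""
    points = [(r, c) for r, row in enumerate(image)
              for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("image contains no object pixels")
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return (len(points) / (height * width), width / height)


def train(labeled_images):
    """Learning stage: store the mean feature vector of each class."""
    grouped = defaultdict(list)
    for label, image in labeled_images:
        grouped[label].append(extract_features(image))
    return {label: tuple(sum(col) / len(col) for col in zip(*vectors))
            for label, vectors in grouped.items()}


def classify(image, references):
    """Decision stage: pick the class whose reference vector is nearest."""
    features = extract_features(image)
    return min(references,
               key=lambda label: math.dist(features, references[label]))


# Toy training set (illustrative data only).
HORIZONTAL_BAR = [[0, 0, 0, 0],
                  [1, 1, 1, 1],
                  [0, 0, 0, 0]]
VERTICAL_BAR = [[0, 1, 0],
                [0, 1, 0],
                [0, 1, 0],
                [0, 1, 0]]
references = train([("horizontal", HORIZONTAL_BAR),
                    ("vertical", VERTICAL_BAR)])
```

Calling `classify(new_image, references)` then returns the label of the closest stored reference. A real system would use richer features and many training samples per class.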

At the preprocessing stage, a formalized description of the objects to be recognized is created in a form suitable for the recognition algorithms. As a rule, the initial data about the observed objects are not in a form suitable for direct recognition: they may be raster images, sound files, statistical data (sets of numbers), or video recordings. Some recognition algorithms require a higher-level representation. This makes it necessary to apply one or more transformations to the original data, moving from code 0 to code 1, 2, etc. An example is image segmentation, i.e. extracting regions of uniform color from an image.
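As an illustration of the segmentation example, the sketch below labels regions of identical pixel value using a breadth-first flood fill. It assumes a small image stored as a nested list and 4-connectivity; both are assumptions made for the example, and the function name is hypothetical.

```python
from collections import deque


def label_uniform_regions(image):
    """Give every pixel a region number; 4-connected pixels with the same
    value share a number. Returns (labels, region_count)."""
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    count = 0
    for r0 in range(height):
        for c0 in range(width):
            if labels[r0][c0] != -1:
                continue  # already assigned to a region
            # Breadth-first flood fill from an unlabeled seed pixel.
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and labels[nr][nc] == -1
                            and image[nr][nc] == image[r0][c0]):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
            count += 1
    return labels, count
```

The label map is a higher-level representation than the raw pixels: later stages can work with whole regions instead of individual points.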

The decision-making stage is the most significant in the operating cycle of a recognition system in terms of the characteristics of the system as a whole; that is, the problem solved at this stage largely determines the purpose of the system. In addition, for the system to make high-quality decisions, a number of requirements are placed on the training stage. Finally, as noted above, the decision-making algorithms depend on appropriate preprocessing of the input data.

1.2 Image Preprocessing

Recognition of objects in an image is usually preceded by image processing that creates conditions for more efficient and more accurate extraction and recognition of the objects under study. Preprocessing methods depend on the research task and are quite diverse. They can include, for example, selecting the most informative fragments, enlarging them, obtaining three-dimensional images, color mapping, increasing spatial resolution, improving contrast resolution, and improving image quality.
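One of the simplest ways to improve contrast is linear contrast stretching, sketched below together with a brightness histogram. The sketch assumes an 8-bit grayscale image stored as a nested list of integers; the function names are hypothetical and the method is only one of many possible preprocessing steps.

```python
def brightness_histogram(image, levels=256):
    """Count how many pixels have each brightness level."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def stretch_contrast(image, levels=256):
    """Linearly map the darkest pixel to 0 and the brightest to levels - 1.

    Integer division is used, so intermediate values are rounded down.
    """
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in image]
    return [[(value - lo) * (levels - 1) // (hi - lo) for value in row]
            for row in image]
```

Comparing the histogram before and after stretching shows the occupied brightness range widening to cover the full scale.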

1.3 Feature Extraction

Extracting features simplifies the recognition or identification of objects. When choosing the most informative features, both the properties of the objects themselves and the resolution of the primary image sensors must be taken into account. Feature extraction is illustrated here on monochrome (single-channel) images. For color images, the algorithms considered can be applied to each color channel separately.

When processing, the following object attributes are preferred:

  • area and perimeter of the image of the object;
  • dimensions of inscribed simplest geometric shapes (circles, rectangles, triangles, etc.);
  • number and relative position of the corners;
  • moments of inertia of object images.

An important property of most geometric features is invariance to rotation of the object's image; normalizing the geometric features relative to one another additionally makes them invariant to the scale of the object's image.
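Several of the features listed above can be computed directly from a binary mask, as in the following sketch. It assumes the object is given as a nested list of 0/1 values, measures the perimeter as the number of pixel sides facing the background, and uses perimeter squared divided by area as one example of a feature normalized for scale. These definitions are illustrative assumptions, not necessarily those used in the original work.

```python
def geometric_features(mask):
    """Area, perimeter, centroid, moment of inertia and compactness
    of the object pixels (value 1) in a binary mask."""
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no object pixels")
    height, width = len(mask), len(mask[0])

    def is_object(r, c):
        return 0 <= r < height and 0 <= c < width and bool(mask[r][c])

    # Perimeter: number of pixel sides that border the background.
    perimeter = sum(not is_object(r + dr, c + dc)
                    for r, c in pixels
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))

    cy = sum(r for r, _ in pixels) / area
    cx = sum(c for _, c in pixels) / area
    # Second central moments; their sum does not change under rotation.
    mu20 = sum((c - cx) ** 2 for _, c in pixels)
    mu02 = sum((r - cy) ** 2 for r, _ in pixels)

    return {
        "area": area,
        "perimeter": perimeter,
        "centroid": (cy, cx),
        "inertia": mu20 + mu02,
        "compactness": perimeter ** 2 / area,  # normalized for scale
    }
```

For a 2 x 4 rectangle and the same rectangle rotated by 90 degrees, `area`, `perimeter` and `inertia` coincide, while `compactness` also stays the same when the rectangle is doubled in size.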

2. Image Filtering

Images formed by various information systems are usually distorted by interference. This complicates both their visual analysis by a human operator and their automatic processing by a computer. In some image processing tasks, components of the image itself can also act as interference. For example, when analyzing a satellite image of the earth's surface, the task may be to determine the boundaries between its individual regions: forest and field, water and land, and so on. From the point of view of this task, the individual details of the image inside the regions being separated are interference.

Interference is attenuated by filtering. In filtering, the brightness (signal) of each point of the original, interference-distorted image is replaced by another brightness value that is considered to be less distorted by interference. An image is usually a two-dimensional function of the spatial coordinates that varies along these coordinates more slowly (sometimes much more slowly) than the interference, which is also a two-dimensional function. This makes it possible to estimate the useful signal at each point of the frame from a set of neighboring points, exploiting the similarity of the signal at these points. In other cases, on the contrary, abrupt changes in brightness are what indicates a useful signal. As a rule, however, such changes are relatively infrequent, so that over considerable intervals between them the signal is either constant or changes slowly. In this case too, the properties of the signal show up not only at a single point but also in the analysis of its neighborhood. Note that the notion of a neighborhood is rather loose. It may consist only of the nearest neighbors in the frame, or it may contain many points, some of them fairly distant. In the latter case, of course, distant and nearby points influence the filter's decision at a given point of the frame to very different degrees.

Thus, the idea behind filtering is the rational use of data from both the current point and its neighborhood.
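A minimal sketch of neighborhood-based filtering is given below. It assumes a grayscale image stored as a nested list and a square neighborhood clipped at the image border, and it compares two common estimators, the mean and the median. The helper names are hypothetical, and the text does not state which filters the original work uses.

```python
from statistics import median


def neighborhood(image, r, c, radius=1):
    """Values of the pixels within `radius` of (r, c), clipped at the border."""
    height, width = len(image), len(image[0])
    return [image[nr][nc]
            for nr in range(max(0, r - radius), min(height, r + radius + 1))
            for nc in range(max(0, c - radius), min(width, c + radius + 1))]


def filter_image(image, estimator, radius=1):
    """Replace every pixel by `estimator` applied to its neighborhood."""
    return [[estimator(neighborhood(image, r, c, radius))
             for c in range(len(image[0]))]
            for r in range(len(image))]


def mean(values):
    return sum(values) / len(values)


# A flat area with one impulse-noise pixel in the middle (illustrative data).
noisy = [[10, 10, 10],
         [10, 250, 10],
         [10, 10, 10]]
smoothed_mean = filter_image(noisy, mean)
smoothed_median = filter_image(noisy, median)
```

On this example the median filter removes the outlier completely, while the mean filter spreads it over the neighboring pixels. This is why the choice of estimator depends on the assumed model of the interference.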

The task is to find a rational computational procedure that achieves the best results. In solving this problem, it is customary to rely on probabilistic models of the image and the interference, and on statistical optimality criteria. The reasons are clear: both the information signal and the interference are random in nature, and the goal is to minimize the difference between the processing result and the ideal signal. The variety of methods and algorithms stems from the wide variety of imaged scenes, which have to be described by different mathematical models. In addition, different optimality criteria are applied, which also leads to a variety of filtering methods. Finally, even when models and criteria coincide, it is often impossible to find an optimal procedure because of mathematical difficulties. The difficulty of finding exact solutions gives rise to various approximate methods and procedures.

3. Edge Detection

To solve the recognition problem successfully, the desired object must be extracted from the image and brought to a normalized form suitable for recognition. The contours of objects can be extracted using the method of optimal edge detection.

John Canny described a method (and algorithm) for detecting the edges (contours) in images based on the following three criteria:

  1. a high signal-to-noise ratio;
  2. accurate localization of edges with minimal systematic error;
  3. a single response to each edge.

The Canny method is based on selective digital filtering of the spatial function of the object's image using the Canny-optimal operator, a Gaussian with standard deviation σ (see Figure 3.1).

Figure 3.1 — Gaussian operator formula

where x is the variable; σ is the standard deviation of the Gaussian operator; * is the “optimal” linear operator for convolution with the image; k2 = 2.

If the Canny-optimal operator for detecting an edge in the one-dimensional case has the form (1), then in the two-dimensional case the derivative should be taken in the direction perpendicular to the edge, which is estimated as the direction of the gradient of the smoothed image.
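The formula from Figure 3.1 is not reproduced in the text. The one-dimensional sketch below therefore assumes the standard Gaussian with 2σ² in the exponent and uses its first derivative as the edge operator, which is a common approximation of Canny's optimal operator. The function names, the kernel radius of 3σ and the edge-repeating border handling are illustrative choices.

```python
import math


def gaussian(x, sigma):
    """Standard normalized 1D Gaussian (assumed form of the operator)."""
    return (math.exp(-x * x / (2 * sigma * sigma))
            / (sigma * math.sqrt(2 * math.pi)))


def gaussian_derivative(x, sigma):
    """First derivative of the Gaussian with respect to x."""
    return -x / (sigma * sigma) * gaussian(x, sigma)


def sampled_kernel(func, sigma, radius=None):
    """Sample `func` at integer positions -radius..radius (default 3 sigma)."""
    if radius is None:
        radius = math.ceil(3 * sigma)
    return [func(x, sigma) for x in range(-radius, radius + 1)]


def convolve(signal, kernel):
    """Same-size 1D convolution; samples past the ends repeat the edge value."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [sum(kernel[radius + k] * signal[min(max(i - k, 0), last)]
                for k in range(-radius, radius + 1))
            for i in range(len(signal))]


# An ideal step edge between samples 4 and 5 (illustrative data).
step = [0] * 5 + [10] * 5
response = convolve(step, sampled_kernel(gaussian_derivative, 1.0))
```

The response is largest at the two samples next to the step and falls off on both sides, which is the behavior that non-maximum suppression later relies on. In two dimensions the same derivative is taken along the gradient direction of the smoothed image.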

The Canny method is not limited to computing the gradient of the smoothed image. Along an edge, only the points of maximum gradient are kept, and the non-maximal points adjacent to the edge are removed. This step also uses information about the direction of the edge, which is needed to remove points lying next to the edge without breaking the edge itself near the local maxima of the gradient. Weak edges are then removed by hysteresis thresholding with two thresholds, where each edge fragment is treated as a whole. If the gradient magnitude at a point is below the first (low) threshold, it is set to zero (the point is not an edge). If the magnitude is above the second (high) threshold, the point is marked as an edge. If the magnitude lies between the two thresholds, it is set to zero unless there is a path from this pixel to a pixel whose gradient magnitude exceeds the high threshold. Such hysteresis reduces the number of gaps in the output edges. The Canny method is regarded as one of the most effective edge detection methods. Unlike other methods, it uses two different thresholds (for weak and strong edges) when determining edges (contours).
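The two-threshold rule described above can be sketched as follows. The sketch assumes the gradient magnitudes are already computed and stored as a nested list, and that a "path" means a chain of 8-connected pixels whose magnitude is at least the low threshold. The connectivity and the function name are assumptions made for the example.

```python
from collections import deque


def hysteresis_threshold(magnitude, low, high):
    """Return a 0/1 edge map from a 2D list of gradient magnitudes."""
    height, width = len(magnitude), len(magnitude[0])
    edges = [[0] * width for _ in range(height)]

    # Pixels above the high threshold are edges outright and seed the search.
    queue = deque((r, c) for r in range(height) for c in range(width)
                  if magnitude[r][c] > high)
    for r, c in queue:
        edges[r][c] = 1

    # Grow from the strong pixels through pixels at or above the low threshold.
    while queue:
        r, c = queue.popleft()
        for nr in range(max(0, r - 1), min(height, r + 2)):
            for nc in range(max(0, c - 1), min(width, c + 2)):
                if not edges[nr][nc] and magnitude[nr][nc] >= low:
                    edges[nr][nc] = 1
                    queue.append((nr, nc))

    # Everything not reached stays 0: below `low`, or weak and unconnected.
    return edges
```

Starting the search from the strong pixels, rather than testing every weak pixel separately, visits each pixel at most once and treats each connected edge fragment as a whole.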

The scale σ of the Gaussian determines the degree of noise suppression: the wider the Gaussian, the stronger the smoothing. The disadvantage is that increasing the scale reduces the accuracy of edge localization.

It has been established that the band-pass filtering used for noise suppression in the LoG and Canny algorithms, on the one hand, increases the stability of the results and, on the other hand, increases computational cost and leads to distortion and even loss of detail. In particular, the corners of objects are rounded and edges are destroyed at junction points.

Later, heuristic extensions to the Canny method were proposed that connect the open end of a contour to nearby contours. In some cases, this produces false edges.

Figure 3.2 below shows, step by step, how the contours of an object in an image are extracted using the Canny method.

Figure 3.2 — The process of selecting the contours of an object using the Canny method (animation: 6 frames, 10 repeat cycles, 236 kilobytes)

Conclusions

In the course of this work, image preprocessing, object feature extraction, image filtering, and an edge detection method were considered.

Image preprocessing covers the following:

  • correction of image brightness and contrast;
  • brightness histograms;
  • image alignment;
  • improvement of spatial resolution.

When processing, the following object attributes are preferred:

  • area and perimeter of the image of the object;
  • dimensions of inscribed simplest geometric shapes (circles, rectangles, triangles, etc.);
  • number and relative position of the corners;
  • moments of inertia of object images.

Notes

At the time of writing this abstract, the master's thesis had not yet been completed. Estimated completion date: May 2019. The full text of the thesis and related materials can be obtained from the author or the author's supervisor after that date.
