Anton Karchin

Faculty: Computer Science

Speciality: Software of Automated Systems

Theme of master's work: Clustering methods for video data retrieval

Supervisor: Ph.D. Olga Vovk

Abstract

Master's Qualification Work

"Clustering methods for video data retrieval"

Introduction

The steady growth of communication channel capacity and disk storage has led to the appearance of large electronic libraries that are open for public use on the Internet.

This has created a need for methods of automatic indexing and search of video data.

Relevance

This paper addresses topical problems that arise in developing a video data search method. Digitizing large amounts of visual material is no longer a technical problem. The growing capacity of digital storage media makes it possible to keep huge amounts of video data: not only current information but also archival recordings.

Today the problem is to provide efficient search for this information, as anyone who has tried to find an image or a video knows. Traditionally, search has relied on ranking schemes similar to those used in web search. But search by title, author, subject or keywords is insufficient: it is costly and does not give a complete description of the video content. The lack of a one-to-one correspondence between visual content and its textual description lowers precision and recall and rules out searching for a video by a fragment or a key frame [1].

Problem statement

The purpose of this paper is to study the existing methods of video data search and to evaluate their advantages and disadvantages.

Searching for information in electronic libraries requires methods for creating and using search descriptors that reflect the visual content of video data. Pattern recognition and scene understanding techniques are used only in restricted application domains, because universal algorithms do not exist. Modern technology for access to video data is based on associating a picture with a set of visual primitives (color, shape, dimensions) and on evaluating the similarity of pictures and video segments by the values of those primitives. A visual primitive [1] is an image characteristic that is computed automatically from digitized visual data. Visual primitives make it possible to index images and to process queries that use the visual parameters of a picture. The search descriptor of a picture, built from its visual primitives, is small compared with the image itself and is convenient for search. Similarity search replaces the usual exact-match query: the similarity between a sample image and a video segment is measured by the current values of individual visual primitives. The system computes their differences and sorts the video files according to their closeness to the sample. This kind of search does not involve object identification. Nevertheless, search based on visual primitives is effective and universal [1].
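The ranking step described above can be illustrated with a minimal sketch. It assumes that every video segment is already described by a numeric vector of primitive values and that closeness is measured by the Euclidean distance; the function name, the segment names and the numbers are hypothetical and are not taken from the system under development.

```python
import math

def rank_by_similarity(query, index):
    """Return segment ids sorted from the most to the least similar to the query.

    `index` maps a segment id to its feature vector (values of visual primitives).
    The Euclidean distance is an assumed measure of difference.
    """
    return sorted(index, key=lambda seg_id: math.dist(query, index[seg_id]))

# Hypothetical feature vectors of three video segments.
index = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.2, 0.7, 0.1],
    "clip_c": [0.8, 0.2, 0.0],
}

# The sample image is described by the same kind of vector.
print(rank_by_similarity([1.0, 0.0, 0.0], index))
```

No object is identified at any point: the result is only an ordering of segments by closeness to the sample.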

There are many video characteristics and procedures for obtaining them. They make it possible to describe video segments in terms of visual properties.

Video search requires a fairly large database of visual properties and controlled characteristics. They are extracted in several steps: temporal segmentation, key-frame selection, key-frame indexing, and motion indexing.


Figure 1 - Video feature extraction (animation: size 17.9 KB; image size 560x417 px; 5 frames; delay between frames 165 ms; delay between the last and first frame 170 ms; 5 repetition cycles).

Practical value

The system under development is a software product that builds a database of video frame features and searches it.

To search for a frame in the database, the user sets the start and end time. This method simplifies video data search on the Internet.

Methods of video search

The most popular methods are the color histogram method, the optical flow method and the image segmentation method.

1. Method of color histogram

The color histogram method uses color characteristics for image indexing. The average color or the dominant color of local regions of the picture can also be used.

The whole set of colors is divided into non-overlapping subsets. For each image a histogram is built that shows the share of each subset in the color gamut of the picture. To compare histograms, a distance between them is introduced [1].

This method works poorly for images with a uniform background and few objects. A histogram does not reflect an object moving from one part of the frame to another unless the object overlaps another one. Dividing the image into parts and building a histogram for each part gives better results. Images are compared by the Euclidean distance between their histograms: the square root of the sum of squared differences between corresponding histogram bins.
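The following minimal sketch illustrates the histogram, the Euclidean distance and the effect of dividing the image into parts. The number of bins per channel, the 2x2 grid and all function names are assumptions made for the example; the images are plain lists of rows of (R, G, B) tuples.

```python
import math

def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram: the share of pixels that fall into each color bin."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin number 0..n-1.
        index = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[index] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist

def histogram_distance(h1, h2):
    """Euclidean distance: square root of the sum of squared bin differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def block_histograms(image, grid=2, bins_per_channel=4):
    """Histograms of grid x grid image blocks, concatenated in row-major order."""
    height, width = len(image), len(image[0])
    features = []
    for by in range(grid):
        for bx in range(grid):
            block = [image[y][x]
                     for y in range(by * height // grid, (by + 1) * height // grid)
                     for x in range(bx * width // grid, (bx + 1) * width // grid)]
            features.extend(color_histogram(block, bins_per_channel))
    return features

def flatten(image):
    return [pixel for row in image for pixel in row]

WHITE, RED = (255, 255, 255), (255, 0, 0)

def frame_with_square(top, left, size=4, square=2):
    """White size x size frame with a red square whose corner is at (top, left)."""
    return [[RED if top <= y < top + square and left <= x < left + square else WHITE
             for x in range(size)] for y in range(size)]

frame_a = frame_with_square(0, 0)   # square in the top-left corner
frame_b = frame_with_square(2, 2)   # the same square moved to the bottom-right corner

# The global histogram does not notice the movement, the block histograms do.
print(histogram_distance(color_histogram(flatten(frame_a)),
                         color_histogram(flatten(frame_b))))   # 0.0
print(histogram_distance(block_histograms(frame_a),
                         block_histograms(frame_b)))           # 2.0
```

The two frames contain the same colors in the same proportions, so only the per-block comparison registers that the object has moved.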

2. Method of optical flow

Optical flow is the velocity field of the apparent motion of individual image points. It is used for motion detection, image segmentation, estimation of the time and speed of movement, and tracking the direction of movement.

There are several algorithms for computing optical flow. The most popular is the differential method. Its key assumptions are that the brightness of object points stays constant and that velocity changes smoothly from point to point.

Complex motion data has to be transformed into a simple form suitable for indexing [2].
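The two assumptions of the differential method can be written down as follows. The source does not name a specific algorithm; the formulation below is the one usually associated with the Horn-Schunck method and is given only as an illustration. Here I(x, y, t) is the image brightness, (u, v) is the velocity of a point, I_x, I_y, I_t are partial derivatives of the brightness, and the weight \alpha is an assumed regularization parameter.

```latex
% Brightness constancy: a moving point keeps its intensity
\begin{equation}
I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = I(x, y, t)
\end{equation}

% First-order Taylor expansion gives the optical flow constraint
\begin{equation}
I_x u + I_y v + I_t = 0
\end{equation}

% Smooth change of velocity from point to point is added as a penalty term
\begin{equation}
E(u, v) = \iint \Bigl[ (I_x u + I_y v + I_t)^2
        + \alpha^2 \bigl( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \bigr) \Bigr] \, dx \, dy
\end{equation}
```

The constraint alone is one equation with two unknowns per point, which is why the smoothness assumption is needed: the flow is taken as the field (u, v) that minimizes E.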

3. Method of image segmentation

Image segmentation is the decomposition of an image into objects with different characteristics [3]. These objects are defined by their position in the picture and by their size. They are also associated with the values of visual primitives [1].

Segmentation methods fall into two classes: automatic and interactive. Automatic methods are of particular scientific interest because they require no interaction with the user.

Tasks of automatic segmentation:

  • extraction of image regions with known properties;
  • partitioning of the image into homogeneous regions.

There is an essential difference between these two tasks. The first assumes a search for specific objects using a priori information; this kind of segmentation is used in computer vision tasks.

In the second case segmentation is a preliminary step that brings the image into a form more convenient for further use, and no a priori information about the image is available.

The statement of the segmentation problem is analogous to that of the clustering problem. To reduce segmentation to clustering, it is enough to map the image points into some feature space and to define a metric on that space.

Clustering is the grouping of similar objects. It is one of the fundamental problems of data analysis [4].

The color of an image point in some color space can be used as its feature, with, for example, the Euclidean distance between vectors in the feature space as the metric. The result is a color quantization of the picture [3].

Clustering algorithms are of two types: hierarchical and non-hierarchical. Classical hierarchical algorithms work only with categorical attributes and build a complete tree of nested clusters, most often by agglomerative methods. Hierarchical algorithms provide high-quality clustering and do not require parameters to be set in advance.

Non-hierarchical algorithms are based on optimizing an objective function that determines the best partition of the set of objects into clusters. The best known are the algorithms of the k-means family, whose objective function is the sum of squared weighted deviations of object coordinates from the cluster centers. The resulting clusters are spherical or ellipsoidal in shape.
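As an illustration, here is a minimal sketch of the basic k-means iteration (Lloyd's algorithm) applied to pixel colors. It is not the implementation used in this work: the unweighted Euclidean distance, the random choice of initial centers with a fixed seed, and the sample pixel values are assumptions made for the example.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means for a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # assumed initialization: k random points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:         # nothing changed, the partition is stable
            break
        labels = new_labels
        # Update step: move each center to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                  # an empty cluster keeps its old center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels

# Hypothetical pixel colors: three reddish and three bluish points.
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 20),
          (10, 10, 250), (20, 15, 240), (5, 20, 245)]

centers, labels = kmeans(pixels, k=2)
# Color quantization: every pixel is replaced by the center of its cluster.
quantized = [centers[label] for label in labels]
print(labels)
print(quantized)
```

Each update step moves the centers so that the sum of squared distances from the points to their centers does not increase, which is the objective function mentioned above in its unweighted form.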

Among non-hierarchical algorithms we single out the Expectation-Maximization (EM) algorithm, which describes each cluster by a probability density function instead of a cluster center.
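The idea can be illustrated with a minimal sketch of EM for a one-dimensional mixture of Gaussians. The choice of Gaussian densities, the one-dimensional data, the initial means and variance, and all names are assumptions made for the example; the source does not specify the form of the density function.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, iterations=50, initial_var=1.0, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given initial means."""
    k = len(means)
    means = list(means)
    variances = [initial_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: probability that each sample belongs to each cluster.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300   # guard against division by zero
            resp.append([value / total for value in joint])
        # M-step: re-estimate weight, mean and variance of each cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, min_var)   # keep the variance positive
    return weights, means, variances

# Hypothetical samples grouped around 1 and around 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data, means=[0.0, 6.0])
print(weights, means, variances)
```

Unlike k-means, every sample belongs to every cluster with some probability, and a cluster is summarized by the parameters of its density (weight, mean, variance) rather than by a center alone.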

Scientific novelty

The scientific novelty of this work is as follows:

  • we propose converting static images from the RGB color space to the CIE L*u*v* space at the preprocessing step;
  • we analyse the primitives of individual clusters when extracting the features of images;
  • we segment different parts of the image when dividing a video clip into segments.
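The color space conversion mentioned in the first item can be sketched as follows. The sketch assumes that the input is 8-bit sRGB and that the reference white is D65; the source says only "RGB", so both are assumptions, and the function names are hypothetical.

```python
def srgb_to_linear(value):
    """Convert an 8-bit sRGB channel value to linear light in 0..1."""
    c = value / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

# Assumed reference white: D65.
WHITE_X, WHITE_Y, WHITE_Z = 0.95047, 1.0, 1.08883

def chromaticity(x, y, z):
    """CIE 1976 u', v' chromaticity coordinates of an XYZ color."""
    denom = x + 15.0 * y + 3.0 * z
    return (4.0 * x / denom, 9.0 * y / denom)

def rgb_to_luv(r, g, b):
    """Convert an 8-bit sRGB color to CIE L*u*v* (sRGB and D65 are assumed)."""
    rl, gl, bl = srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b)
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    if x + 15.0 * y + 3.0 * z == 0.0:
        return (0.0, 0.0, 0.0)           # black: chromaticity is undefined
    ratio = y / WHITE_Y
    if ratio > (6.0 / 29.0) ** 3:
        lightness = 116.0 * ratio ** (1.0 / 3.0) - 16.0
    else:
        lightness = (29.0 / 3.0) ** 3 * ratio
    u_prime, v_prime = chromaticity(x, y, z)
    u_white, v_white = chromaticity(WHITE_X, WHITE_Y, WHITE_Z)
    return (lightness,
            13.0 * lightness * (u_prime - u_white),
            13.0 * lightness * (v_prime - v_white))

print(rgb_to_luv(255, 255, 255))
print(rgb_to_luv(255, 0, 0))
```

The usual reason for such a conversion is that Euclidean distances in CIE L*u*v* follow perceived color differences more closely than distances in RGB, which matters when the Euclidean distance is used as the clustering metric.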

Main results

The results of the research:

  1. We carried out a theoretical study of existing approaches and methods for solving the problem of video data search.
  2. We identified the main directions for optimizing the analysed methods.
  3. We developed a software system for video data segmentation. It allows the user:
    • to choose a video file to be divided into segments;
    • to look through the segmentation result;
    • to save the results in a database.
  4. We implemented a search for video frames using the color histogram primitive.

Conclusion

This master's thesis is devoted to the topical problem of content-based video search.

The analysed methods make it possible to choose a direction for the further design of an effective video search system. Essentially, clustering methods will be used to extract key frames, and visual primitives will be used for comparison when searching for video by a sample.

Finally, search of video data by content is not yet implemented in any search system.

References

  1. Байгарова Н.С., Бухштаб Ю.А., Евтеева Н.Н., Корягин Д.А. Некоторые подходы к организации содержательного поиска изображений и видеоинформации [Электронный ресурс] / Институт прикладной математики им. М. В. Келдыша РАН, - http://www.keldysh.ru/papers/2002/prep78/prep2002_78.html
  2. Байгарова Н.С., Бухштаб Ю.А., Евтеева Н.Н. Современная технология содержательного поиска в электронных коллекциях изображений [Электронный ресурс] / Институт прикладной математики им. М.В. Келдыша РАН, - http://www.elbib.ru/index.phtml?page=elbib/rus/journal/2001/part4/BBE
  3. Вежневец А., Баринова О. Методы сегментации изображений: автоматическая сегментация [Электронный ресурс]: http://cgm.computergraphics.ru/content/view/147
  4. Паклин Н. Алгоритмы кластеризации на службе Data Mining [Электронный ресурс]: http://www.basegroup.ru/library/analysis/clusterization/datamining
  5. Байгарова Н.С., Бухштаб Ю.А. Проект «Кинолетопись России»: представление и поиск видеоинформации // I Всероссийская конференция «Электронные библиотеки». - Санкт-Петербург, 1999. - с. 209-215.
  6. Вовк О.Л. Особенности контекстного поиска кластеризированных изображений // VII международная конференция «Интеллектуальный анализ информации ИАИ-2007». – Киев, 2007. - с. 22-31.
  7. Башков Е.А., Вовк О.Л. Статистическая кластеризация для выделения регионов изображений // V международная конференция «Интеллектуальный анализ информации ИАИ-2005». – Киев, 2005. - с. 50-59.
  8. Башков Е.А., Вовк О.Л. Математическая модель статестического иерархического алгомеративного метода кластеризации изображений // Научные труды Донецкого национального технического университета. Серия «Информатика, кибернетика и вычислительная техника». Выпуск 8 (120) – Донецк: ДонНТУ, 2008. - с. 39-46.
  9. Вовк О.Л. Новый подход к выделению визуально подобных цветов изображений // Проблемы управления и информатики Институт кибернетики В.М. Глушкова НАН Украины, институт космический исследований. – Киев, 2006 — № 6. - c. 100–105.
  10. Zhong Di, Zhang Hong-Jiang, Chang Shih-Fu. Clustering Methods for Video Browsing and Annotation [Internet resource]: [PDF] http://www.ee.columbia.edu/dvmm/publications/96/zhong96a.pdf

Remark

The master's work was not yet completed at the time this abstract was written. The completion date is 1 December 2009. The full text of the work and materials on the subject can be obtained from the author or his supervisor after that date.