Sergey Makogon

Faculty of science and technology

Department of Software Engineering

Specialty: Software engineering

Master's thesis topic: Development and Optimization of Optical Character Recognition Algorithms and their cross-platform software implementation using OpenCV

Scientific adviser: DTS, Sergey Zori

Abstract

At the time of publication of this abstract, work on the master's thesis is still in progress. The final version will be available in June 2019. The text of the thesis and materials on the research topic can be obtained from the author or his supervisor.

1. Introduction

Computer vision is a broad field with an enormous range of applications: from recognizing handwritten and printed text to recognizing faces in a crowd when searching for criminals, detecting abnormalities in MRI images, restoring noisy images, and reconstructing a complete three-dimensional scene.

Compared to other areas, the field of computer vision can be characterized as young, diverse and dynamic [1].

Because the problems in computer vision and the methods for solving them are so varied, a wide variety of software exists, both general-purpose and intended only for highly specialized tasks.

Applications are often developed not by a single developer but by a whole team, so there is a need for a simple, understandable structure that speeds up the development of applications using computer vision technologies. This requires existing algorithms to be easier to use and their parameters to be standardized, so that developers on a large project can more easily understand each other's code. To make solving computer vision problems more efficient, libraries and frameworks have been developed for various programming languages. They implement the basic algorithms for working with images, define a structured list of the parameters each algorithm needs, and provide uniform documentation that explains what each parameter means and what each function is responsible for.

2. Relevance of the thesis topic

Many methods and algorithms now exist for detecting the presence of text in an image, isolating it, and digitizing it for further use as data.
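As a rough illustration of the "detect and isolate" stage, the sketch below binarizes a tiny grayscale image and finds the bounding boxes of connected dark regions, which on a clean single text line roughly correspond to characters. It is a minimal pure-Python sketch, not the method used in the thesis; the function names `binarize` and `find_components`, the fixed threshold of 128, and the sample image are all assumptions made for illustration. A real system would more likely rely on a library such as OpenCV.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Mark pixels darker than the threshold as ink (1), others as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def find_components(binary):
    """Return bounding boxes (left, top, right, bottom) of 4-connected ink regions."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited ink pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    # Sort by left edge so boxes follow reading order on a single text line.
    return sorted(boxes)

# Hypothetical 5x7 sample: a vertical bar and a "C"-like shape on a white background.
SAMPLE = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0, 255, 255,   0,   0, 255],
    [255,   0, 255, 255,   0, 255, 255],
    [255,   0, 255, 255,   0,   0, 255],
    [255, 255, 255, 255, 255, 255, 255],
]

if __name__ == "__main__":
    print(find_components(binarize(SAMPLE)))
```

Each bounding box could then be cropped and passed to a classifier for the digitizing step; that step is outside the scope of this sketch.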

Text recognition is a task that every person performs daily, from the moment they first learn to read and write. As of June 30, 2018, 4,285,727,287 people [2] had Internet access, and most of them presumably use search engines. To simplify the search for information about a particular object, computer vision technologies, and text recognition in particular, can be used to form the search query.

The master's thesis addresses the problem of developing and optimizing optical character recognition algorithms and implementing them in cross-platform software using the OpenCV library.
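The idea of forming a search query from recognized text can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_search_url`, the endpoint `https://search.example.com/search`, the `q` parameter, and the sample marking are hypothetical and do not come from the thesis. The sketch strips stray symbols that OCR output often contains and URL-encodes what remains.

```python
import re
from urllib.parse import urlencode

def build_search_url(recognized_text, base_url="https://search.example.com/search"):
    """Turn raw recognized text into a search URL (hypothetical endpoint)."""
    # Keep only runs of letters, digits and hyphens; drop stray OCR punctuation.
    tokens = re.findall(r"[A-Za-z0-9\-]+", recognized_text)
    query = " ".join(tokens)
    # urlencode escapes the query so it is safe to place in a URL.
    return f"{base_url}?{urlencode({'q': query})}"

if __name__ == "__main__":
    # Illustrative input: a component marking with noise around it.
    print(build_search_url("  BC547B \n NPN |"))
```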

3. Aim and objectives of the research

Research purpose: to explore the capabilities of modern software and technologies for improving the efficiency of pattern recognition in practical problems of recognizing and classifying transistor images.

Object of study: software technologies of computer vision.

Subject of research: methods and means of recognizing images of symbolic information.

Main research objectives:

  • Analysis of the functionality of computer vision libraries
  • Justifying the choice of computer vision library for solving problems in the field of optical character recognition
  • Designing an architecture for character recognition software
  • Investigation of the effectiveness of the implementation of a prototype software system for solving problems of recognition and classification of images of transistors based on the OpenCV library algorithms

4. Comparison of text recognition libraries

4.1 OpenCV

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was created to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to use and modify the code [3].

The library was designed for computational efficiency, with a focus on real-time applications. It is written in optimized C/C++ and can take advantage of multi-core processors. For additional automatic optimization on Intel platforms, the Intel Integrated Performance Primitives (IPP) library can be obtained separately and integrated.

The library includes more than 2500 optimized algorithms that include a complete set of classical and modern computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions on video, track moving objects and camera movements, etc. OpenCV has more than 47 thousand community users, and the estimated number of downloads exceeds 14 million. The library is widely used in companies, research groups and government agencies.

In addition to well-known companies such as Google, Yahoo, Microsoft, Intel and IBM, many startups, such as Applied Minds, VideoSurf and Zeitera, also use OpenCV.

The library has interfaces for C++, Python, Java and MATLAB and supports Windows, Linux, Android and Mac OS. OpenCV focuses primarily on real-time applications and takes advantage of MMX and SSE instructions when they are available. Full-featured CUDA and OpenCL interfaces are currently under active development. Figure 1 shows the general structure of the OpenCV project.

Figure 1. General structure of the OpenCV project

4.2 AForge.NET

AForge.NET is an open source library created in C# that is intended for developers and researchers in computer vision. In addition, the library has functionality for developers in the field of artificial intelligence. The library has a wide range of capabilities: image processing, neural networks, genetic algorithms, fuzzy logic, machine learning, robotics and much more [4].

The library includes several major components:

  • AForge.Imaging: image processing routines and filters
  • AForge.Vision: computer vision
  • AForge.Video: a set of libraries for working with video
  • AForge.Neuro: operations with neural networks
  • AForge.Genetic: genetic algorithms for solving various problems
  • AForge.Fuzzy: fuzzy logic
  • AForge.Robotics: support for some techniques used in robotics
  • AForge.MachineLearning: machine learning [5]

Like OpenCV, AForge.NET has an active community where developers can find information, ask questions, and share their own work. However, this community is smaller than OpenCV's. Another obstacle is that all library documentation is written only in English, which may make the framework harder to study and master. Figure 2 shows the general structure of the AForge.NET library.

Figure 2. General structure of the AForge.NET library

4.3 VXL

VXL is a set of C++ libraries intended for research in and implementation of computer vision technologies. VXL is written in ANSI/ISO C++ and is designed to be portable across platforms. It consists of several core libraries:

  • VNL (numerics): numerical algorithms and containers, for example matrices, vectors and optimizers
  • VIL (imaging): loading, saving and editing images in many of the most common formats, including very large images
  • VGL (geometry): geometry of points, curves and other elementary objects in one, two and three dimensions
  • VSL (streaming input and output)
  • VBL (basic templates)
  • VUL (utilities): miscellaneous platform-independent functionality [6]

A distinctive feature of VXL is that each component can be used separately, without reference to the others, so an application can include only what it really needs. Figure 3 shows the hierarchical structure of the VXL core.

Figure 3. Hierarchical structure of the VXL core [7]

4.4 LTI-lib

LTI-lib (or simply LTI) is an object-oriented library of algorithms and data structures that are frequently used in image processing and computer vision. It was developed as part of research projects in computer vision dealing with robotics, object recognition, and voice and gesture recognition. Its main goal is to provide an object-oriented C++ library that greatly simplifies code use and maintenance while still offering algorithms fast enough for real-world applications.

The library was developed using GCC (a set of compilers used for various programming languages) under Linux and Visual C++ under Windows NT. Many classes encapsulate Windows/Linux functionality in order to simplify system or hardware tasks (for example, classes for multithreading and synchronization, time measurement and access to a serial port) [8].

5. The system of recognition and classification of images of transistors and the choice of computer vision library for it

A software system is to be developed for solving typical problems of recognizing and classifying transistor images. A client-server architecture is the most suitable choice for it.

The system should be highly resilient, have a stable Internet connection for data exchange, and enforce a properly organized access policy for information on the database server. The client is planned for the Android platform, as one of the most widespread and affordable mobile systems.

Figure 4 shows the integrated scheme of this system.

Figure 4. Integrated scheme of the system of recognition and classification of images of transistors
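The exchange between the mobile client and the server in such an architecture might look roughly like the sketch below. It is an assumption-laden illustration, not the protocol of the thesis: the message types `RecognitionRequest` and `RecognitionResponse`, their fields, the JSON encoding, and the `recognize` callback are all hypothetical. The recognizer is passed in as a parameter so the sketch makes no claim about how recognition itself is done.

```python
import base64
import json
from dataclasses import asdict, dataclass

@dataclass
class RecognitionRequest:
    image_b64: str  # photo bytes, Base64-encoded so they fit into JSON

@dataclass
class RecognitionResponse:
    marking: str        # text recognized on the component
    confidence: float   # recognizer's confidence in the range 0..1

def make_request(image_bytes):
    """Client side: wrap raw image bytes into a JSON request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(asdict(RecognitionRequest(image_b64=encoded)))

def handle_request(request_json, recognize):
    """Server side: decode the image, run the supplied recognizer, reply as JSON.

    `recognize` is a hypothetical callable: bytes -> (marking, confidence).
    """
    request = RecognitionRequest(**json.loads(request_json))
    image_bytes = base64.b64decode(request.image_b64)
    marking, confidence = recognize(image_bytes)
    return json.dumps(asdict(RecognitionResponse(marking, confidence)))

if __name__ == "__main__":
    # Stand-in recognizer used only to exercise the round trip.
    reply = handle_request(make_request(b"\x00\x01\x02"),
                           lambda data: (f"{len(data)} bytes received", 0.0))
    print(reply)
```

Keeping the recognizer behind a plain callable means the transport format can be tested without any computer vision library installed.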

Among the computer vision libraries considered, OpenCV is the best choice: it is fast; it includes functions not only for text recognition but also for image processing in general, which simplifies the structure of the future application; and an implementation of the library is available specifically for Android. Figure 5 shows a diagram comparing the performance of OpenCV, OpenCV + IPP, VXL and LTI-lib [9].

Figure 5. The OpenCV vs VXL vs LTI comparison chart [9]
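A performance comparison of this kind is usually obtained by timing the same operation on the same input in each candidate implementation. The sketch below shows one way such a harness could be written; it is not the methodology of [9]. The helper names (`median_runtime`, `compare`) and the two toy image-inversion functions are assumptions that stand in for calls into real libraries.

```python
import statistics
import time

def median_runtime(operation, image, repeats=5):
    """Return the median wall-clock time in seconds of operation(image)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation(image)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(timings)

def invert_loops(image):
    """Invert an 8-bit grayscale image with explicit nested loops."""
    result = []
    for row in image:
        new_row = []
        for px in row:
            new_row.append(255 - px)
        result.append(new_row)
    return result

def invert_comprehension(image):
    """Invert the same image with nested list comprehensions."""
    return [[255 - px for px in row] for row in image]

def compare(candidates, image, repeats=5):
    """Time every candidate on the same image; return name -> median seconds."""
    return {name: median_runtime(fn, image, repeats) for name, fn in candidates.items()}

if __name__ == "__main__":
    image = [[(x * y) % 256 for x in range(200)] for y in range(200)]
    results = compare({"loops": invert_loops, "comprehension": invert_comprehension}, image)
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name}: {seconds * 1000:.2f} ms")
```

Timings depend on the machine and interpreter, so only the relative ordering on one system is meaningful.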

6. Conclusions

The analysis of the computer vision libraries described above led to the following conclusions in favor of using OpenCV to develop an application for recognizing transistors.

  1. The main advantage of a computer vision library, as of any software, is its performance. As Figure 5 shows, OpenCV outperforms its analogs (VXL and LTI-lib) even without the additional IPP component.
  2. The library contains a very large number of functions for tasks ranging from image processing and computer vision to machine learning. In addition, it is open source, and its license allows all of its functionality to be used in commercial products.
  3. OpenCV has Russian-language documentation and a very large number of tutorials, lessons, scientific materials and books on its functionality and on methods of working with it, as well as a very active community of developers and users who share their experience and answer questions.
  4. The decisive advantage of OpenCV was that it is cross-platform and can be used with almost any programming language [10].

7. References

  1. Computer vision. Wikipedia [Electronic resource]. Available at: https://ru.wikipedia.org/wiki/Компьютерное_зрение. Title from screen.
  2. World Internet Users Statistics and 2018 World Population Stats [Electronic resource]. Available at: https://www.internetworldstats.com/stats.htm. Title from screen.
  3. Optical character recognition. Wikipedia [Electronic resource]. Available at: https://ru.wikipedia.org/wiki/Оптическое_распознавание_символов. Title from screen.
  4. AForge.NET :: Framework [Electronic resource]. Available at: http://www.aforgenet.com/framework/. Title from screen.
  5. Application of the AForge.NET library and its extension Accord.NET Framework for real-time face recognition. Article in the journal "Molodoy Uchenyy" (Young Scientist) [Electronic resource]. Available at: https://moluch.ru/archive/154/43602/. Title from screen.
  6. VXL - C++ Libraries for Computer Vision [Electronic resource]. Available at: https://vxl.github.io/. Title from screen.
  7. What is VXL? Vision Something Libraries. A collection of Computer Vision libraries. Open Source, grass roots effort, 53 developers. Supported by good community. PPT download [Electronic resource]. Available at: https://slideplayer.com/slide/7433547. Title from screen.
  8. LTI-Lib [Electronic resource]. Available at: http://ltilib.sourceforge.net/doc/homepage/index.shtml. Title from screen.
  9. OpenCV vs VXL vs LTI: Performance Test - AI Shack [Electronic resource]. Available at: http://www.aishack.in/tutorials/opencv-vs-vxl-vs-lti-performance-test/. Title from screen.
  10. Zori, S.A. Comparison of popular computer vision libraries for use in a transistor recognition application / S.A. Zori, S.A. Makogon // II International Scientific and Practical Conference "Software Engineering: Methods and Technologies for the Development of Information and Computing Systems (PIIVS-2018)": Proceedings, Vol. 1. Donetsk: DonNTU, 2018. pp. 66-70.