Software engineering
Development and Optimization of Optical Character Recognition Algorithms and Their Cross-Platform Software Implementation Using OpenCV
At the time this abstract was published, work on the master's thesis was still in progress. The final version of the thesis will be available in June 2019. The full text of the thesis and related research materials can be obtained from the author or his supervisor.
Computer vision is a very broad and comprehensive field, and the range of its applications is enormous: from recognizing handwritten and printed text to recognizing faces in a crowd to find criminals, detecting abnormalities in MRI images, restoring noisy images, reconstructing complete three-dimensional scenes, and more.
Compared to other areas, the field of computer vision can be characterized as young, diverse and dynamic [1].
Because of the wide variety of problems in computer vision and of methods for solving them, a wide range of software exists, from general-purpose tools to programs intended only for highly specialized tasks.
Since applications are often developed not by a single developer but by a whole team, there is a need for a simple and understandable structure that speeds up the development of applications using computer vision technologies. This in turn calls for simpler use of existing algorithms and for standardized parameters, so that developers working on a large project can more easily understand each other's code. To make solving computer vision problems more efficient, libraries and frameworks have been developed for various programming languages. They implement the basic image-processing algorithms, provide a structured list of the parameters required by each algorithm, and come with uniform documentation that explains what each parameter means and what each function does.
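To illustrate what a structured, documented parameter list can look like in practice, the following minimal Python sketch bundles the parameters of a simple thresholding step into one validated object. The names `BinarizationParams` and `binarize` are hypothetical and do not belong to any library; the fields loosely mirror the `thresh` and `maxval` arguments of OpenCV's `cv2.threshold`, and `invert=True` follows the `THRESH_BINARY_INV` rule (pixels above the threshold become 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinarizationParams:
    """Hypothetical parameter set for a fixed-threshold binarization step.

    threshold: intensity (0-255) compared against each pixel.
    max_value: value written to pixels classified as foreground.
    invert:    if True, pixels at or below the threshold (dark ink on a
               light background) become foreground.
    """

    threshold: int = 128
    max_value: int = 255
    invert: bool = True

    def __post_init__(self) -> None:
        # Validate once here so every caller can rely on the same invariants.
        for name in ("threshold", "max_value"):
            value = getattr(self, name)
            if not 0 <= value <= 255:
                raise ValueError(f"{name} must be in [0, 255], got {value}")


def binarize(pixels: list[list[int]], params: BinarizationParams) -> list[list[int]]:
    """Threshold a grayscale image given as a list of rows of 0-255 ints."""
    result = []
    for row in pixels:
        new_row = []
        for p in row:
            above = p > params.threshold
            foreground = not above if params.invert else above
            new_row.append(params.max_value if foreground else 0)
        result.append(new_row)
    return result
```

For example, `binarize([[10, 200], [130, 128]], BinarizationParams())` returns `[[255, 0], [0, 255]]`. Keeping the parameters in one immutable, self-documenting object means every team member passes and validates them the same way.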
Many methods and algorithms currently exist for detecting text in an image, isolating it, and digitizing it for further use as data.
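The detect-and-isolate part of this pipeline can be sketched in a few lines under simplifying assumptions: a grayscale image given as nested lists, dark text on a light background, and no skew. The sketch uses two standard techniques, Otsu's global thresholding and 4-connected component labeling, and returns one bounding box per connected blob of ink; in OpenCV the corresponding building blocks are `cv2.threshold` with the `THRESH_OTSU` flag and `cv2.connectedComponentsWithStats`. It is only an illustration, not the algorithm developed in the thesis, and it omits the final digitization (character classification) stage.

```python
from collections import deque


def otsu_threshold(gray: list[list[int]]) -> int:
    """Return the threshold t maximizing between-class variance (Otsu).

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    w_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1/total**2.
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def text_boxes(gray: list[list[int]], min_area: int = 2) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) boxes of 4-connected dark regions."""
    t = otsu_threshold(gray)
    h, w = len(gray), len(gray[0])
    ink = [[gray[y][x] <= t for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0, y0, x1, y1, area = x, y, x, y, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and ink[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard specks smaller than min_area pixels (likely noise).
            if area >= min_area:
                boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```

The boxes follow the (x, y, width, height) convention used by OpenCV's `cv2.boundingRect`; in a full system each box would be cropped and passed to a character classifier.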
Text recognition is a task every person performs daily, from the moment they first learn to read and spell. As of June 30, 2018, 4,285,727,287 people [2] had Internet access, and practically all of them use search engines. To simplify searching for information about a particular object, computer vision technologies, and text recognition in particular, can be used to form the search query.
The master's thesis addresses the problem of developing and optimizing optical character recognition algorithms and implementing them as cross-platform software using the OpenCV library.
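Turning recognized text into a search query mostly means cleaning up OCR noise before the text reaches the search engine. The sketch below is a minimal illustration under assumptions: the endpoint `https://search.example.com/search` and its `q` parameter are hypothetical placeholders, and the character whitelist (letters, digits, `-`, `.`, `/`) is an arbitrary choice suited to component markings.

```python
import re
from urllib.parse import urlencode

# Hypothetical search endpoint; substitute the real service's URL and parameter.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_url(recognized_text: str) -> str:
    """Turn raw OCR output into a URL-encoded search query."""
    # Replace any run of characters outside the whitelist with a space.
    cleaned = re.sub(r"[^0-9A-Za-z./ -]+", " ", recognized_text)
    # Collapse line breaks and repeated spaces into single spaces.
    query = " ".join(cleaned.split())
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"
```

For example, `build_search_url("  2N3904\n datasheet!! ")` returns `https://search.example.com/search?q=2N3904+datasheet`.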
Research purpose: to explore the capabilities of modern software and technologies for improving the efficiency of pattern recognition in practical problems of recognizing and classifying transistor images.
Object of study: computer vision software technologies.
Subject of research: methods and tools for recognizing images of symbolic information.
Main research objectives:
OpenCV (Open Source Computer Vision Library) is an open-source library of computer vision and machine learning software. OpenCV was created to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Because OpenCV is distributed under the BSD license, businesses can easily use and modify its code [3].
The library was developed with the goal of improving efficiency in real-time applications. It was originally implemented in C; current versions are written mainly in C++. OpenCV can take advantage of multi-core processors, and for additional automatic optimization on Intel platforms it can be integrated with the Intel Integrated Performance Primitives (IPP) library, which has to be obtained and integrated separately.
The library includes more than 2,500 optimized algorithms, covering a comprehensive set of both classical and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in video, track moving objects and camera movement, and more. OpenCV has a community of more than 47,000 users, and the estimated number of downloads exceeds 14 million. The library is widely used in companies, research groups and government agencies.
In addition to well-known companies such as Google, Yahoo, Microsoft, Intel and IBM, many startups, such as Applied Minds, VideoSurf and Zeitera, also use OpenCV.
The library has interfaces for C++, Python, Java and MATLAB and supports Windows, Linux, Android and Mac OS. OpenCV is aimed primarily at real-time applications and takes advantage of MMX and SSE instructions when they are available. Full-featured CUDA and OpenCL interfaces are currently under active development. Figure 1 shows the general structure of the OpenCV project.
General structure of the OpenCV project
AForge.NET is an open-source library written in C# and intended for developers and researchers in the field of computer vision. It also provides functionality for developers working in artificial intelligence. The library offers a wide range of capabilities: image processing, neural networks, genetic algorithms, fuzzy logic, machine learning, robotics and much more [4].
The library consists of several major components: AForge.Imaging, a library of image processing routines and filters; AForge.Vision, a computer vision library; AForge.Video, a set of libraries for working with video; AForge.Neuro, a library for working with neural networks; AForge.Genetic, a library of routines for applying genetic algorithms to various problems; AForge.Fuzzy, a library for working with fuzzy logic; AForge.Robotics, a library that supports some of the techniques used in robotics; and AForge.MachineLearning, a library of machine learning tools [5]. Like OpenCV, AForge.NET has an active community where you can find information, ask the developers questions or share your own work. Unfortunately, this community is smaller than OpenCV's. Another obstacle for developers is that all of the library's documentation is available only in English, which can make the framework harder to learn and master. Figure 2 shows the general structure of the AForge.NET library.
General structure of the AForge.NET library
VXL is a collection of C++ libraries intended for research in and implementation of computer vision technologies. VXL is written in ANSI/ISO C++ and is designed to be portable across platforms. The library consists of several core components: VNL (numerics), numerical algorithms and containers such as matrices, vectors and optimizers; VIL (imaging), loading, saving and manipulating images in many common formats, including very large images; VGL (geometry), geometry of points, curves and other elementary objects in one, two and three dimensions; VSL (streaming I/O); VBL (basic templates); and VUL (utilities), miscellaneous platform-independent functionality [6]. A distinctive feature of the library is that each component can be used on its own, without referring to the others, so an application can include only what it actually needs. Figure 3 shows the hierarchical structure of the VXL core.
Hierarchical structure of the VXL core [7]
LTI, or LTI-lib, is an object-oriented library of algorithms and data structures that is often used for image processing and computer vision. LTI-lib was developed as part of research projects in computer vision related to robotics, object recognition, and speech and gesture recognition. The main goal of its development was to create an object-oriented C++ library that would greatly simplify code reuse and maintenance while still providing fast algorithms suitable for real-world applications.
The library was developed using GCC (a set of compilers used for various programming languages) under Linux and Visual C++ under Windows NT. Many classes encapsulate Windows/Linux functionality in order to simplify system or hardware tasks (for example, classes for multithreading and synchronization, time measurement and access to a serial port) [8].
To solve typical problems of recognizing and classifying transistor images, a software system is to be developed, and a client-server architecture is the most suitable choice for it.
The system should be highly fault-tolerant, maintain a stable Internet connection for data exchange, and enforce a properly organized privacy policy for access to information on the database server. The client is planned to be developed for Android, one of the most widespread and affordable mobile platforms.
Figure 4 shows the integrated scheme of this system.
Integrated scheme of the system of recognition and classification of images of transistors
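To make the division of responsibilities in this scheme more concrete, here is a minimal Python sketch of the server side of one request/response exchange. Everything in it is an assumption for illustration: the JSON message layout, the class and function names, and the in-memory `PART_DB` dictionary that stands in for the database server. The OCR step is passed in as a function, so the sketch runs without OpenCV; a real deployment would also need transport security and access control, in line with the privacy requirements above.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional

# Hypothetical stand-in for the database server: marking -> part description.
PART_DB = {"2N3904": "NPN bipolar transistor"}


@dataclass
class RecognitionRequest:
    """Hypothetical message sent by the Android client."""
    client_id: str
    image_b64: str  # photo of the component, base64-encoded


@dataclass
class RecognitionResponse:
    """Hypothetical reply: recognized marking and lookup result."""
    marking: str
    found: bool
    part_type: Optional[str] = None


def handle_request(raw_json: str, recognize: Callable[[bytes], str]) -> str:
    """Decode a client request, run OCR on the image, and look up the part."""
    request = RecognitionRequest(**json.loads(raw_json))
    image_bytes = base64.b64decode(request.image_b64)
    # The OCR stage is injected so it can be swapped (OpenCV-based, mock, ...).
    marking = recognize(image_bytes).strip().upper()
    part_type = PART_DB.get(marking)
    response = RecognitionResponse(marking, part_type is not None, part_type)
    return json.dumps(asdict(response))
```

Injecting `recognize` keeps the network and storage logic independent of the recognition algorithm, so the OCR stage can be optimized or replaced without changing the client-server protocol.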
Among the computer vision libraries considered, OpenCV is the optimal choice: it is fast; it includes functions not only for text recognition but also for image processing in general, which simplifies the structure of the future application; and it has an implementation specifically for Android. Figure 5 shows a diagram comparing the performance of the OpenCV, OpenCV + IPP, VXL and LTI libraries [16].
The OpenCV vs VXL vs LTI comparison chart [9]
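The results in Figure 5 are taken from [9], and the measurement setup behind them is not described here. If one wanted to repeat such a comparison on the target hardware, a common approach is to run each library's implementation of the same operation on identical input several times and compare median times. The helper below is a hypothetical, minimal sketch of such a harness using only the Python standard library; the warm-up and repeat counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable


def median_runtime_ms(fn: Callable[[], object], repeats: int = 15, warmup: int = 3) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()  # exclude one-time costs such as lazy initialization and cold caches
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to outliers from OS scheduling.
    return statistics.median(samples)
```

Calling it once per library, each time with a zero-argument lambda that processes the same image, yields numbers that are directly comparable on that machine.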
Based on the analysis of the computer vision libraries described above, the following conclusions were drawn in favor of using the OpenCV library to develop an application for recognizing transistors.