Mikhail Titarenko
Computer Science and Technology Faculty
Department of Software Engineering
Specialty: Software Engineering
Research of methods for classifying information on the international trade activity of states within an information retrieval system
Scientific supervisor: Ph.D., Associate Professor, Department of SE Skvortsov Anatoliy Yefremovych
Consultant: senior lecturer Kolomoitseva Irina Aleksandrovna

Search report

This report makes it possible to assess the state of available information on the topic of the master's thesis. It is the main documentary evidence of the depth and completeness of the information search, and it also records the current state of the field under study.

The search was performed with four search engines (Google, Yandex, Bing and Meta), and the results are summarized in the tables. In total, 20 queries related to the master's thesis were run: four queries are the title of the thesis in four languages, four contain the full name of the thesis consultant together with the university (DonNTU), and twelve consist of key concepts on the topic of the thesis.
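The size of the query set follows from simple arithmetic: four languages with five queries each, and one result count per search engine. The minimal Python sketch below checks these counts; the names `LANGUAGES`, `ENGINES`, `QUERY_KINDS` and `build_query_plan` are hypothetical, and the per-language grouping (one title, one name, three keyword queries) is read off the tables in this report.

```python
# The four languages and four search engines used in this report.
LANGUAGES = ["Russian", "Ukrainian", "English", "Spanish"]
ENGINES = ["Google", "Yandex", "Bing", "Meta"]
# Queries per language, by kind (read off the tables): the thesis title,
# the consultant's name with the university, and three keyword queries.
QUERY_KINDS = {"title": 1, "name": 1, "keyword": 3}


def build_query_plan(languages, kinds):
    """Return one (language, kind) pair for every query to be run."""
    plan = []
    for language in languages:
        for kind, count in kinds.items():
            plan.extend([(language, kind)] * count)
    return plan


plan = build_query_plan(LANGUAGES, QUERY_KINDS)
counts = {kind: sum(1 for _, k in plan if k == kind) for kind in QUERY_KINDS}
cells = len(plan) * len(ENGINES)  # result counts recorded per snapshot
print(len(plan), counts, cells)  # 20 {'title': 4, 'name': 4, 'keyword': 12} 80
```

Each of the two snapshots therefore consists of 80 result counts.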

Below are two tables with search results taken three months apart, as well as a chart that makes it possible to compare the main changes that occurred over this period.

Search report on 14.09.2018
Search string | Google | Yandex | Bing | Meta
Russian
Исследование методов классификации информации о внешнеторговой деятельности государств в рамках информационно-поисковой системы | 83000 | 167000000 | 30 | 39900
Коломойцева Ирина Александровна, ДонНТУ | 28700 | 6000 | 10 | 16000
Классификация текстов | 11300000 | 50000000 | 20000 | 5550000
Алгоритмы классификации текстов | 2430000 | 76000000 | 8060 | 1117000
Классификация внешнеторговой информации | 169000 | 7750 | 50000000 | 345000
Ukrainian
Дослідження методів класифікації інформації про зовнішньоторговельну діяльність держав в рамках інформаційно-пошукової системи | 9100 | 208000000 | 0 | 3550
Коломойцева Ірина Олександрівна, ДонНТУ | 2900 | 67000000 | 0 | 21
Класифікація текстів | 3340000 | 52000000 | 12100 | 1710000
Алгоритми класифікації текстів | 1820000 | 63600000 | 4300 | 928500
Класифікація зовнішньоторговельної інформації | 132000 | 26100000 | 2850 | 65000
English
Research of information classifying methods on international trade activity of states within the framework of an information retrieval system | 13200000 | 228000000 | 6490 | 6630000
Kolomoitseva Irina Aleksandrovna, DonNTU | 31000 | 10000 | 0 | 14600
Text classification | 343800000 | 81000000 | 7330000 | 150500000
Text classification algorithms | 63700000 | 86000000 | 1420000 | 89750000
Classification of international trade information | 763600000 | 79000000 | 3800000 | 373000000
Spanish
Investigación de métodos de clasificación de información sobre la actividad internacional comercial de los estados en el marco de un sistema de recuperación de información | 9020000 | 45000000 | 2300 | 4250000
Kolomoitseva Irina Aleksandrovna, DonNTU | 31000 | 10000 | 0 | 14800
Clasificación de texto | 143800000 | 5800000 | 2380000 | 71800000
Algoritmos de clasificación de texto | 3540000 | 6000000 | 51000 | 1720000
Clasificación de la información del comercio internacional | 39700000 | 22000000 | 427000 | 19500000
Search report on 16.12.2018
Search string | Google | Yandex | Bing | Meta
Russian
Исследование методов классификации информации о внешнеторговой деятельности государств в рамках информационно-поисковой системы | 83500 | 188000000 | 26 | 40300
Коломойцева Ирина Александровна, ДонНТУ | 29000 | 11000 | 8 | 16200
Классификация текстов | 11400000 | 66000000 | 25400 | 5600000
Алгоритмы классификации текстов | 2450000 | 77000000 | 12500 | 1190000
Классификация внешнеторговой информации | 345000 | 36000000 | 12400 | 171000
Ukrainian
Дослідження методів класифікації інформації про зовнішньоторговельну діяльність держав в рамках інформаційно-пошукової системи | 9160 | 197000000 | 1 | 3560
Коломойцева Ірина Олександрівна, ДонНТУ | 2900 | 67000000 | 3 | 18
Класифікація текстів | 3360000 | 56000000 | 17200 | 1730000
Алгоритми класифікації текстів | 1830000 | 67000000 | 4460 | 938000
Класифікація зовнішньоторговельної інформації | 132000 | 29000000 | 2970 | 65600
English
Research of information classifying methods on international trade activity of states within the framework of an information retrieval system | 13300000 | 225000000 | 6540 | 6700000
Kolomoitseva Irina Aleksandrovna, DonNTU | 31200 | 17000 | 0 | 14800
Text classification | 317000000 | 81000000 | 9550000 | 152000000
Text classification algorithms | 64200000 | 86000000 | 2270000 | 90700000
Classification of international trade information | 744200000 | 79000000 | 5900000 | 377000000
Spanish
Investigación de métodos de clasificación de información sobre la actividad internacional comercial de los estados en el marco de un sistema de recuperación de información | 9080000 | 50000000 | 2330 | 4280000
Kolomoitseva Irina Aleksandrovna, DonNTU | 31200 | 17000 | 0 | 15000
Clasificación de texto | 144000000 | 6000000 | 3500000 | 71800000
Algoritmos de clasificación de texto | 3540000 | 7000000 | 58900 | 1740000
Clasificación de la información del comercio internacional | 39600000 | 19000000 | 620000 | 19700000
Results analysis

Comparing the query results across the search engines, it is rather difficult to single out one clear leader. Yandex and Google, however, stand out against the others.

The number of pages found depends on the alphabet of the search string. For Cyrillic queries Yandex returns far more results, whereas for queries in the Latin alphabet Google generally performs better. Judging by these counts, Bing is a clear outsider; however, the relevance of the pages found was not evaluated, so no conclusions can be drawn about the shortcomings of this system. Meta also shows fairly good results, but it should be kept in mind that it is built on top of Google search.
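The dependence on the alphabet can be checked mechanically by tagging each query with its script and counting which engine returned more results. The sketch below is a minimal illustration, assuming that a query is "Cyrillic" if it contains any character from the Unicode Cyrillic block (U+0400 to U+04FF); the function names are hypothetical, and only the twelve keyword queries from the table of 16.12.2018 are used, comparing Google with Yandex.

```python
from collections import Counter

# Keyword queries from the table of 16.12.2018: (query, Google hits, Yandex hits).
DEC_2018 = [
    ("Классификация текстов", 11400000, 66000000),
    ("Алгоритмы классификации текстов", 2450000, 77000000),
    ("Классификация внешнеторговой информации", 345000, 36000000),
    ("Класифікація текстів", 3360000, 56000000),
    ("Алгоритми класифікації текстів", 1830000, 67000000),
    ("Класифікація зовнішньоторговельної інформації", 132000, 29000000),
    ("Text classification", 317000000, 81000000),
    ("Text classification algorithms", 64200000, 86000000),
    ("Classification of international trade information", 744200000, 79000000),
    ("Clasificación de texto", 144000000, 6000000),
    ("Algoritmos de clasificación de texto", 3540000, 7000000),
    ("Clasificación de la información del comercio internacional", 39600000, 19000000),
]


def script_of(query: str) -> str:
    """Return 'cyrillic' if the query contains any Cyrillic-block character, else 'latin'."""
    return "cyrillic" if any("\u0400" <= ch <= "\u04ff" for ch in query) else "latin"


def count_leaders(rows):
    """For each script, count how often Google or Yandex returned more results.

    Ties would be counted for Yandex; no ties occur in the sample rows.
    """
    wins = {"cyrillic": Counter(), "latin": Counter()}
    for query, google, yandex in rows:
        leader = "Google" if google > yandex else "Yandex"
        wins[script_of(query)][leader] += 1
    return wins


wins = count_leaders(DEC_2018)
print(dict(wins["cyrillic"]), dict(wins["latin"]))
# {'Yandex': 6} {'Google': 4, 'Yandex': 2}
```

Yandex leads on all six Cyrillic queries, while Google leads on four of the six Latin-script ones. The comparison counts results only and says nothing about their relevance.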

The diagram below shows how the number of search results for each query in each search engine changed over time.


Picture 1 - Dynamics of the number of search results over time

As the diagram shows, for the vast majority of queries the number of pages found increased over time. Some queries returned almost the same number of results, which is especially noticeable for the Ukrainian and Spanish queries.

It should also be noted that for some queries the number of sites found decreased, in some cases quite sharply. This suggests that the search engines are improving their search algorithms, filtering out non-unique articles, or revising their indexes.
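The split into growing, almost unchanged and shrinking result counts can be made explicit with a tolerance band. The following is a minimal sketch, not the criterion used to build the diagram: the 1% tolerance and the name `classify_change` are assumptions, and the sample rows are taken from the two tables.

```python
# (query, engine, hits on 14.09.2018, hits on 16.12.2018), taken from the two tables.
SAMPLE = [
    ("Класифікація текстів", "Google", 3340000, 3360000),
    ("Класифікація зовнішньоторговельної інформації", "Google", 132000, 132000),
    ("Text classification", "Google", 343800000, 317000000),
    ("Классификация текстов", "Bing", 20000, 25400),
    ("Clasificación de texto", "Bing", 2380000, 3500000),
]


def classify_change(before: int, after: int, tolerance_pct: float = 1.0) -> str:
    """Label a change as 'increase', 'decrease' or 'stable'.

    A change whose magnitude is at most tolerance_pct percent of the earlier
    value is treated as 'stable'. The 1% default is an assumption.
    """
    if before == 0:
        # Relative change is undefined for a zero baseline; fall back to the sign.
        return "increase" if after > 0 else "stable"
    change_pct = 100 * (after - before) / before
    if abs(change_pct) <= tolerance_pct:
        return "stable"
    return "increase" if change_pct > 0 else "decrease"


for query, engine, before, after in SAMPLE:
    print(f"{engine:6} {query}: {classify_change(before, after)}")
```

With this tolerance the two Ukrainian Google rows count as stable, the English "Text classification" query in Google (a drop of about 7.8%) as a decrease, and the two Bing rows as increases.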

According to the diagram, two systems changed fastest in terms of the number of pages found: Yandex and Bing. They produced the greatest increases over time, 83% and 60% respectively, and also the largest decreases, 28% and 20% respectively. This indicates that both systems are actively reworking their indexes and search relevance.
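Percentages of this kind are relative changes between the two snapshots, taken as a share of the earlier value. The sketch below shows the calculation on three rows from the tables that yield values consistent with the 83%, 60% and 20% figures; it is an illustration only, since the rows actually used for the diagram are not stated, and `percent_change` is a hypothetical helper name.

```python
# (engine, query, hits on 14.09.2018, hits on 16.12.2018), taken from the tables.
ROWS = [
    ("Yandex", "Коломойцева Ирина Александровна, ДонНТУ", 6000, 11000),
    ("Bing", "Text classification algorithms", 1420000, 2270000),
    ("Bing", "Коломойцева Ирина Александровна, ДонНТУ", 10, 8),
]


def percent_change(before: int, after: int) -> float:
    """Relative change between two snapshots, in percent of the earlier value."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    # Multiplying before dividing keeps small integer cases exact (e.g. 10 -> 8 gives -20.0).
    return 100 * (after - before) / before


for engine, query, before, after in ROWS:
    print(f"{engine:6} {query}: {percent_change(before, after):+.0f}%")
```

This prints +83% for the Yandex row, +60% for the first Bing row (59.86% before rounding) and -20% for the second.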

By language, the undisputed leaders are Russian and English, as languages of international scientific communication, although the Spanish-speaking segment is currently growing very quickly. Ukrainian does not produce comparable results in this set; however, like Japanese or Swedish, it is a language used mainly within a single national culture, so comparing it with the languages of international communication cannot give a reliable picture of how this language segment is developing.