Mihnevich Pavel
Faculty of Computer Science and Technology
Department of Artificial Intelligence and Systems Analysis
Speciality: Intelligent Systems Software Technologies
Topic of the final work: Development of a web service for phonetic processing of textual information
Scientific advisor: Assoc. Prof. Kravets Tatyana



DonNTU Masters' portal
  1. Introduction
  2. Relevance and motivation of the topic
  3. The purpose and goals of the study, the planned results
  4. Review of research and development
  5. How a linguistic processor works
  6. Summary of own results
  7. Conclusions
  8. List of sources

Introduction

With the invention of the transistor, the emergence of a new generation of computers and the appearance of the first programming languages, experiments also began in, among other areas, the directions that would later be called computational linguistics and natural language processing. Decades later, these fields are still full of open questions and remain open to new solutions [1].

1. Relevance and motivation of the topic

Solving the problems of natural language processing will bring human-computer interaction to a new level and will inevitably advance related fields such as computational linguistics and artificial intelligence.

Natural language processing is an extremely difficult task. Unlike artificial languages, natural languages were shaped not by scientists but by history, and they carry the burden of that development. In addition, the processing tasks differ from one natural language to another, so there are no solutions that can be carried over to all of them, which also significantly complicates the development of the field.

Most natural language processing projects are created as, so to speak, lemmas of the linguistic world: they explore natural languages and produce new ideas, methodologies, methods and approaches, on the basis of which other projects are later created to apply this knowledge in practice.

2. The purpose and goals of the study, the planned results

The aim of the work is to develop a web service for the phonetic analysis of poetic works.

The service takes a text as input and outputs the phonetic repetitions it contains, their associative force indices and the common chains they form.


The overall task is divided into two categories:


In a simplified form, the received text passes through the following stages:

  1. preprocessing:
    • removal of special characters
    • automatic error correction
    • stress assignment from a dictionary (including secondary stress)
  2. phonetic conversion
  3. splitting into potential syllables
  4. parsing of each potential syllable into a phonosyllabic complex with a specified number of consonants (2/3/4+)
  5. search for repetitions of phonosyllables
  6. calculation of the associative force index
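
The stages above can be sketched in Python. This is a minimal illustration under several assumptions, not the service's actual code: letters stand in for phonemes (stress assignment and phonetic conversion are skipped), a potential syllable is assumed to be a vowel together with all consonants on both sides of it up to the neighbouring vowels, and a phonosyllabic complex is reduced to the set of consonants in such a syllable. The names `clean_text`, `potential_syllables` and `find_repeats` are hypothetical, and the associative force index is not computed because its formula is not given here.

```python
import re
from collections import defaultdict

VOWELS = set("аеёиоуыэюя")
SIGNS = set("ьъ")  # soft/hard signs are not counted as consonants in this sketch


def clean_text(text: str) -> list[str]:
    """Stage 1 (partial): lowercase and replace special characters with spaces.
    Empty lines are kept so that line numbers match the input."""
    lines = []
    for line in text.lower().splitlines():
        line = re.sub(r"[^а-яё\s]", " ", line)
        lines.append(" ".join(line.split()))
    return lines


def potential_syllables(line: str) -> list[str]:
    """Stage 3 (assumed definition): each vowel plus the consonants on both
    sides of it up to the neighbouring vowels; word boundaries are ignored."""
    letters = [ch for ch in line if ch.isalpha() and ch not in SIGNS]
    vowels = [i for i, ch in enumerate(letters) if ch in VOWELS]
    syllables = []
    for n, pos in enumerate(vowels):
        start = vowels[n - 1] + 1 if n > 0 else 0
        end = vowels[n + 1] if n + 1 < len(vowels) else len(letters)
        syllables.append("".join(letters[start:end]))
    return syllables


def find_repeats(text: str, size: int = 2) -> dict[frozenset[str], list[tuple[int, str]]]:
    """Stages 4-5: group potential syllables by their consonant set and keep
    the sets seen more than once. size is 2, 3 or 4 (4 means "4 or more")."""
    occurrences = defaultdict(list)
    for line_no, line in enumerate(clean_text(text), start=1):
        for syllable in potential_syllables(line):
            consonants = frozenset(ch for ch in syllable if ch not in VOWELS)
            matches = len(consonants) >= 4 if size >= 4 else len(consonants) == size
            if matches:
                occurrences[consonants].append((line_no, syllable))
    return {c: occ for c, occ in occurrences.items() if len(occ) > 1}
```

For example, `find_repeats("Мороз и солнце; день чудесный!", size=3)` returns a single repeated set {д, н, ч}, found in the neighbouring potential syllables "денч" and "нчуд". Because adjacent potential syllables share consonants under this definition, a fuller implementation would have to decide whether such neighbouring matches count as repetitions.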

The web service should have the following functionality:


Error handling deserves separate mention. Since all input text is entered directly by the user without restrictions, every possible input error has to be handled, especially when working with dictionaries, where the input text must match the expected pattern exactly. For each exception, the user must be told exactly how to change the text to eliminate the error.
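
A minimal sketch of such validation in Python, assuming the stress dictionary is available as a plain set of word forms. The allowed character set, the wording of the messages and the names `validate_text` and `InputError` are hypothetical. Each error carries a line and a column, so the page can point the user at the exact place to fix, and all errors are collected in one pass rather than stopping at the first.

```python
import re
from dataclasses import dataclass

# Hypothetical set of non-letter characters accepted in a verse
# (space, punctuation, quotes, dash U+2014, ellipsis U+2026).
ALLOWED_MARKS = set(" ,.;:!?-()«»\"'\u2014\u2026")
WORD = re.compile(r"[А-ЯЁа-яё]+")


@dataclass
class InputError:
    line: int     # 1-based line number
    column: int   # 1-based column of the offending character or word
    message: str  # tells the user how to change the text


def is_cyrillic(ch: str) -> bool:
    low = ch.lower()
    return "а" <= low <= "я" or low == "ё"


def validate_text(text: str, stress_dictionary: set[str]) -> list[InputError]:
    """Collect every problem at once instead of stopping at the first one."""
    if not text.strip():
        return [InputError(1, 1, "The text is empty: enter at least one line of verse.")]
    errors: list[InputError] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch.isdigit():
                errors.append(InputError(line_no, col, f"Digit '{ch}': write the number out in words."))
            elif ch.isalpha() and not is_cyrillic(ch):
                errors.append(InputError(line_no, col, f"Letter '{ch}' is not Cyrillic: replace it with a Russian letter."))
            elif not ch.isalpha() and ch not in ALLOWED_MARKS:
                errors.append(InputError(line_no, col, f"Unsupported character '{ch}': remove it."))
        # Dictionary lookups need exact word forms, so unknown words are reported too.
        for match in WORD.finditer(line):
            word = match.group().lower()
            if word not in stress_dictionary:
                errors.append(InputError(line_no, match.start() + 1,
                                         f"Word '{word}' is not in the stress dictionary: check its spelling."))
    return errors
```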

Another important element of the web service is interactive input: changing the text under analysis, or the settings that affect it, triggers an asynchronous download of new analysis results on the same page, without reloading the page itself.
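
On the server side, interactive input only needs an endpoint that accepts the text and settings and returns results as JSON, which the page can request asynchronously (for example with the browser's `fetch` API) whenever the input changes. The sketch below uses only Python's standard `http.server` and `json` modules; the payload fields `text` and `settings` and the placeholder `analyze` function are hypothetical stand-ins for the real pipeline.

```python
import http.server
import json


def analyze(text: str, settings: dict) -> dict:
    """Hypothetical placeholder for the real phonetic analysis pipeline."""
    lines = [line for line in text.splitlines() if line.strip()]
    return {"lines": len(lines),
            "words": sum(len(line.split()) for line in lines),
            "settings": settings}


def handle_request(body: bytes) -> tuple[int, bytes]:
    """Decode a JSON request {"text": ..., "settings": {...}} and build a JSON reply."""
    try:
        payload = json.loads(body)
        result = analyze(payload["text"], payload.get("settings", {}))
        status = 200
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        # Report malformed requests so the page can show the problem to the user.
        result, status = {"error": f"Invalid request: {exc!r}"}, 400
    return status, json.dumps(result, ensure_ascii=False).encode("utf-8")


class AnalysisHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To run locally (blocks forever):
# http.server.HTTPServer(("localhost", 8000), AnalysisHandler).serve_forever()
```

Keeping the request handling in a plain function (`handle_request`) separates it from the HTTP plumbing, so the same logic can be tested without starting a server.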

3. Review of research and development

The work lies within the field of natural language processing, with its main focus on poetics.

Poetics is the linguistic study of the poetic function of verbal messages in general and of poetry in particular [2].

Aristotle is considered the originator of the term, with his treatise of the same name [3], in which the aesthetic side of poetics was described. Also in antiquity, poetics was developed further by Quintus Horatius Flaccus in his work On the Art of Poetry [4]. Poetics continued to be studied in every subsequent epoch, up to German idealism, and only at the beginning of the twentieth century were ideas about the melody of verse described [5].

Of particular interest in this area are the works of George Vekshin [6–10].

4. How a linguistic processor works

Natural language processing is based on the linguistic processor (LP), which is best known from speech synthesis tasks.

In general, a linguistic processor consists of three blocks, and its input is ordinary text.

Figure 1 – Model of the linguistic processor

The first block is the text preprocessing block. At this stage the text is cleared of service symbols; for speech synthesis tasks, abbreviations and acronyms are also expanded, numerals are converted to words and formulas are converted. For phonetic analysis tasks, abbreviations should be kept in their original form, because that is the form in which they are most often used in speech.

Figure 2 – Model of the first block of the linguistic processor
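
A sketch of this first block in Python. The service-symbol set, the abbreviation table and the single-digit number table are hypothetical toy data; a real system would use full dictionaries and a proper number-to-words converter. The `for_synthesis` flag switches between the two behaviours described above: abbreviations and numerals are expanded only for speech synthesis and are left as written for phonetic analysis.

```python
import re

# Hypothetical toy data for illustration only.
SERVICE_SYMBOLS = re.compile(r"[*#_|<>\[\]{}~@^]")
ABBREVIATIONS = {"т.е.": "то есть", "т.к.": "так как"}
DIGITS = {"0": "ноль", "1": "один", "2": "два", "3": "три", "4": "четыре",
          "5": "пять", "6": "шесть", "7": "семь", "8": "восемь", "9": "девять"}


def preprocess(text: str, for_synthesis: bool = False) -> str:
    # Service symbols are removed in both modes.
    text = SERVICE_SYMBOLS.sub(" ", text)
    if for_synthesis:
        # Speech synthesis: expand abbreviations and standalone single digits.
        for abbreviation, expansion in ABBREVIATIONS.items():
            text = text.replace(abbreviation, expansion)
        text = re.sub(r"\b[0-9]\b", lambda m: DIGITS[m.group()], text)
    # Phonetic analysis keeps abbreviations and numerals as written.
    # Collapse repeated spaces line by line, preserving line breaks.
    return "\n".join(" ".join(line.split()) for line in text.splitlines())
```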

After the first stage, the text is normalized. At the second stage, phrasal processing of the text is carried out: the text is divided into meaningful units. For speech synthesis tasks, these units are syntagmas, segments consisting of one or more words united by intonation [11]. Once the text has been segmented, phrasal stress is marked. At the end of the block, intonational marking is performed, followed by pausing, i.e. assigning the duration of each pause. As a result, the normalized text becomes syntagmatically tagged.

Figure 3 – Model of the second block of the linguistic processor
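
The second block can be approximated as follows in Python. This is a simplification, not a description of a real synthesizer: punctuation marks and line breaks are assumed to delimit syntagmas, phrasal stress is placed on the last word of each syntagma (its usual position under neutral intonation in Russian), and the pause durations in `PAUSES_MS` are hypothetical values.

```python
import re
from dataclasses import dataclass

# Hypothetical pause durations (ms) after each kind of boundary;
# "\n" is the end of a verse line, "\u2014" a dash, "\u2026" an ellipsis.
PAUSES_MS = {",": 150, ";": 250, ":": 250, "\u2014": 200, "\n": 300,
             ".": 400, "!": 400, "?": 400, "\u2026": 450}
# A run of non-boundary characters, optionally followed by one boundary mark.
SYNTAGMA_RE = re.compile(r"([^,;:.!?\n\u2014\u2026]+)([,;:.!?\n\u2014\u2026]?)")


@dataclass
class Syntagma:
    words: list[str]
    phrasal_stress: int  # index of the word carrying phrasal stress
    pause_ms: int        # pause after the syntagma (0 if no boundary mark)


def split_syntagmas(text: str) -> list[Syntagma]:
    result = []
    for chunk, mark in SYNTAGMA_RE.findall(text):
        words = chunk.split()
        if words:
            # Assumption: phrasal stress falls on the last word of the syntagma.
            result.append(Syntagma(words, len(words) - 1, PAUSES_MS.get(mark, 0)))
    return result
```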

At the third stage, word processing is carried out. In this block, word stress is marked, both primary and secondary. After that, words are combined into phonetic words by removing the boundaries between stressed and unstressed words. At the end of the word processing block, and of the linguistic processor as a whole, phonemic transcription is performed: the conversion of orthographic text into phonemic text [12] according to the rules of Russian phonetics [13].

Figure 4 – Model of the third block of the linguistic processor
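
A minimal sketch of the third block in Python, under stated assumptions: the stress dictionary `STRESS` (word to index of the stressed letter) and the clitic lists are tiny hypothetical samples; phonetic words are formed by attaching unstressed proclitics to the following word and enclitics to the preceding one; and of all the transcription rules only final devoicing of voiced consonants is shown. Real transcription needs many more rules, such as voicing assimilation inside the phonetic word [13].

```python
# Hypothetical toy data: word -> index of the stressed letter.
STRESS = {"мороз": 3, "солнце": 1, "день": 1, "чудесный": 3}
PROCLITICS = {"в", "во", "на", "не", "и", "с", "к", "по", "без"}
ENCLITICS = {"же", "ли", "бы"}
VOICED_TO_VOICELESS = str.maketrans("бвгджз", "пфктшс")


def mark_stress(word: str) -> str:
    """Insert a combining acute accent (U+0301) after the stressed vowel."""
    pos = STRESS.get(word)
    return word if pos is None else word[: pos + 1] + "\u0301" + word[pos + 1:]


def phonetic_words(words: list[str]) -> list[str]:
    """Merge unstressed clitics with neighbouring stressed words."""
    groups: list[list[str]] = []
    pending: list[str] = []  # proclitics waiting for their host word
    for word in (w.lower() for w in words):
        if word in PROCLITICS:
            pending.append(word)
        elif word in ENCLITICS and groups:
            groups[-1].append(word)
        else:
            groups.append(pending + [mark_stress(word)])
            pending = []
    if pending:  # trailing proclitics with no host: attach to the last group
        if groups:
            groups[-1].extend(pending)
        else:
            groups.append(pending)
    return ["".join(group) for group in groups]


def devoice_final(word: str) -> str:
    """Final devoicing: a voiced consonant at the end of a phonetic word
    (optionally followed by a soft sign) becomes voiceless."""
    core, soft = (word[:-1], "ь") if word.endswith("ь") else (word, "")
    if core and core[-1] in "бвгджз":
        core = core[:-1] + core[-1].translate(VOICED_TO_VOICELESS)
    return core + soft


def transcribe(words: list[str]) -> list[str]:
    return [devoice_final(w) for w in phonetic_words(words)]
```

For example, `transcribe(["в", "сад"])` gives `["всат"]`: the preposition is merged with its host word and the final "д" is devoiced, while the assimilation of "в" before "с" is left out of this sketch.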

5. Summary of own results

By the time this abstract was completed, the system met most of the requirements. The web service is hosted on a temporary server and handles test analyses from several users.

Verse analysis is performed in full. The results obtained during analysis have been cross-checked and are correct.

The post-processing of analysis results still needs to be modified.

The user functionality also matches the goals, but the styles still need refinement and the display of results needs small changes. At the moment, much of the displayed data carries no useful information for the end user and is needed only for the internal operation of the algorithm. However, by adding a few new features on top of this data, significant information can be obtained. The auxiliary functionality, which does not affect the results, can also be expanded to make the service easier to use.

Conclusions

Natural language processing is a rather difficult field, primarily because the conditions and goals of tasks can differ from language to language; as a result, researchers in different linguistic groups often solve local tasks and are burdened by the specifics of a particular language.

Poetics still occupies an extremely small place in computational linguistics, but it is of real interest to researchers and has the potential to develop both within artificial intelligence and as a field in its own right. New analysis tools can have a very positive effect on further research and development in this area.

List of sources [ru]

  1. Manning, C. D. Foundations of statistical natural language processing / C. D. Manning. – 1992. – Vol. 12, No. 4. – P. 89–94.
  2. Jakobson, R. Works on poetics / R. Jakobson. – Progress, 1987. – 81 p.
  3. Aristotle. Poetics. – Minsk: Literature, 1998.
  4. Horace. On the art of poetry. – Science, 1981.
  5. Eichenbaum, B. Melodics of Russian lyric verse / B. Eichenbaum. – OPOYAZ, 1922.
  6. Vekshin, G. Metaphony in sound repetition (towards the poetic morphology of the word) / G. Vekshin // New Literary Review. – 2008. – No. 90. – P. 229–250.
  7. Vekshin, G. Essay on the phonostylistics of the text: sound repetition in the perspective of meaning formation / G. Vekshin. – Moscow, 2006. – 462 p.
  8. Vekshin, G. On the relation between suprasegmental and segmental sound organization of a poetic text / G. Vekshin // Inter-level links in the language system: collection of scientific works. – UDN Publishing House, 1989. – P. 86–93.
  9. Vekshin, G. The stream of speech and the meaning-forming role of sound: the transformation of the random into the necessary / G. Vekshin // Current problems of linguistics at university and school: collection of scientific works. – Penza, 1997.
  10. Vekshin, G. Communication languages and functional styles (in their relation to the text) / G. Vekshin. – MGUP Publishing House, 2002. – P. 35–67.
  11. Syntagma: definition of the term [Electronic resource]. – Access mode: http://scicenter.online/russkiy-yazyik-scicenter/sintagma-122760.html
  12. Definition of the lexical processor in the lectures on linguistics of the Belarusian State University of Informatics and Radioelectronics [Electronic resource]. – Access mode: https://studfiles.net/preview/1401101/page:48/
  13. Description of the phonetics and phonology of the Russian language [Electronic resource]. – Access mode: https://www.wikiwand.com/ru/Русская_фонетика
The site was developed as independent work for the discipline "Internet Technologies" in the 2018–2019 academic year, in accordance with the requirements and limitations set out in the assignment.
All information and its appearance are current as of the end of 2018.