Speech recognition is a complex and important topic that holds a worthy place among the computer sciences. The process consists of a stage that converts speech into text and a stage that automatically interprets the meaning of the speech. Sometimes "voice recognition" refers only to the problem of identifying the speaker. More often, however, that problem is inseparable from speech recognition, since the two processes are linked and together make the interface friendly to the user. Speech input of information is complicated by a number of factors: differences between languages, individual pronunciation, noise, accents, word stress, etc. This work is devoted to developing techniques and algorithms for recognizing Russian speech.
The following application areas of voice recognition systems can be identified:
1. The interface between a person and a computer. Many people clearly find it difficult to communicate with a machine, so a new way of interacting with a computer is needed: simple, fast, and intuitive. Voice recognition systems make the machine adapt to the person rather than the other way around. A major advantage of voice recognition systems is that they are much faster than any other kind of interface. A voice e-mail program lets the user switch on the computer and dictate and send messages without touching the mouse or the keyboard. People with physical disabilities will also gain a more effective way of interacting with a computer.
2. Information services. Speech is an ideal tool for obtaining information, and it makes it possible to establish a genuine interaction with the computer. Using spoken-language systems, the user and the machine can enter into a direct dialogue, approaching the required information gradually, step by step. For example, voice recognition systems have been developed for accessing databases of digitized television and radio news clips and for booking air tickets.
3. Other human-machine interfaces, such as face recognition systems and touch screens, are accelerating the adoption of spoken dialogue systems; there is a trend toward building combined systems. Face and voice recognition technologies have also entered the banking world, together with cash dispensers (ATMs). Over the last decade the application areas of speech recognition systems have expanded considerably, and they will continue to expand.
Basic methods of speech recognition
There are two types of sounds: voiced and unvoiced. Voiced sounds are produced by vibration of the vocal cords as air passes through them. This acoustic signal is modulated by the tension of the vocal cords. The vibrations resonate in the vocal tract (the nasal cavity, the pharynx, and the oral cavity). The airflow that creates the sound is called the "glottal wave" (the wave formed at the glottis). This signal is quasi-periodic, and its period is called the pitch period (the period of the fundamental tone).
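The pitch period described above can be estimated from a short frame of a voiced sound. The sketch below uses the autocorrelation method, which is one common technique and not necessarily the one used in this work; the function name `estimate_pitch_period` and the search range of 60 to 400 Hz for the fundamental frequency are illustrative assumptions.

```python
import numpy as np


def estimate_pitch_period(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch (fundamental) period of a voiced frame, in seconds.

    Autocorrelation method: find the lag, within a plausible pitch range,
    at which the frame best matches a shifted copy of itself.
    f_min and f_max are assumed bounds on the fundamental frequency.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove any DC offset
    # Full autocorrelation has length 2N-1; index N-1 is lag 0.
    # Keep only the non-negative lags.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)  # shortest plausible period
    lag_max = int(sample_rate / f_min)  # longest plausible period
    best_lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return best_lag / sample_rate
```

For a 125 Hz tone sampled at 8 kHz, the function returns a period close to 8 ms (64 samples). Unvoiced frames have no clear period, so a practical system would also check the height of the autocorrelation peak before trusting the estimate.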
Recognition of phonemes. To recognize phonemes, groups of phonemes, and words, methods such as hidden Markov models, neural networks, or combinations of them are used. The hidden Markov model is the most frequently and successfully used method for recognizing phonemes and words. It is defined as a set of states and transitions from one state to another. When a transition occurs, a particular output is observed with a certain probability. In addition, each transition is associated with the probability of moving from one state to the next. There is a set of initial states and a set of final states. Any observation sequence is produced by a path of transitions from one of the initial states to one of the final states. This model provides a fairly natural representation of speech.
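How a hidden Markov model scores an observation sequence can be illustrated with the forward algorithm. This is a minimal sketch assuming discrete observation symbols and the common formulation in which each state (rather than each transition) emits a symbol; the two formulations can be converted into each other. The parameter names `start_prob`, `trans_prob`, `emit_prob`, and `final_states` are hypothetical.

```python
import numpy as np


def forward_likelihood(obs, start_prob, trans_prob, emit_prob, final_states=None):
    """P(observation sequence | model), computed with the forward algorithm.

    obs          -- sequence of observation symbol indices
    start_prob   -- start_prob[i]: probability of starting in state i
    trans_prob   -- trans_prob[i, j]: probability of moving from state i to j
    emit_prob    -- emit_prob[j, k]: probability that state j emits symbol k
    final_states -- optional list of allowed final states (default: all)
    """
    start_prob = np.asarray(start_prob, dtype=float)
    trans_prob = np.asarray(trans_prob, dtype=float)
    emit_prob = np.asarray(emit_prob, dtype=float)
    # alpha[i] = P(o_1..o_t, state at time t = i)
    alpha = start_prob * emit_prob[:, obs[0]]
    for symbol in obs[1:]:
        # Sum over all predecessor states, then emit the next symbol.
        alpha = (alpha @ trans_prob) * emit_prob[:, symbol]
    if final_states is not None:
        # Only paths that end in an allowed final state count.
        alpha = alpha[list(final_states)]
    return float(alpha.sum())
```

In isolated-word recognition, one such model would typically be trained per word, and the word whose model gives the highest likelihood would be chosen. Real implementations use log probabilities or scaling, because the product of many small probabilities quickly underflows.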
Understanding of speech. "Understanding" speech is the most difficult stage. At this stage, a sequence of words (a sentence) must be transformed into a representation of what the speaker meant. A problem related to voice recognition is speaker recognition, i.e., "the process of automatically determining who is speaking" on the basis of the individual information contained in the speech signal. This can mean either speaker identification or speaker verification. Identification means finding, within a known set of reference phrases, the sample that matches the given speaker's manner of speaking. Speaker verification means confirming the speaker's identity: is this the person they claim to be? Speaker recognition technology makes it possible to use the voice for access control, for example telephone access to banking services, databases, e-commerce systems, or voice mail, as well as access to restricted equipment.
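The difference between identification and verification can be shown with a toy example. Purely as an assumption for illustration, each speaker is represented here by a single fixed-length feature vector (a hypothetical "voice print", for example averaged spectral features), and vectors are compared by cosine similarity; the threshold of 0.9 is arbitrary. Identification is a choice among known speakers, while verification is a yes/no decision about one claimed identity.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_speaker(features, enrolled):
    """Identification: return the enrolled speaker whose (hypothetical)
    voice-print vector is most similar to the input features."""
    return max(enrolled, key=lambda name: cosine_similarity(features, enrolled[name]))


def verify_speaker(features, enrolled, claimed, threshold=0.9):
    """Verification: accept the claimed identity only if the similarity
    reaches an (assumed, arbitrary) threshold."""
    return cosine_similarity(features, enrolled[claimed]) >= threshold
```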
There are also different approaches to building speech recognition systems.
Single-level approach: the signal is divided into words (in the simplest cases, the pauses between spoken words are enough for reliable segmentation). Each word, in turn, is recognized as a single unit. Various methods of comparison with templates are used, and the form of the templates depends on the recognition technique: with dynamic programming methods, templates are stored in the same form as the input signal (after division into words); with series expansion methods, templates are sets of the parameters of these series.
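The simplest segmentation mentioned above, splitting on pauses between words, can be sketched with short-time energy. This is a minimal illustration, not the method developed in this work: the function name `segment_by_pauses`, the frame length (160 samples, i.e. 20 ms at 8 kHz), the energy threshold, and the minimum pause length are hypothetical values that would have to be tuned for real recordings.

```python
import numpy as np


def segment_by_pauses(signal, frame_len=160, energy_threshold=0.01, min_pause_frames=3):
    """Split a signal into word segments separated by low-energy pauses.

    Returns a list of (start_sample, end_sample) pairs, end exclusive.
    All default parameter values are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)  # short-time energy of each frame
    active = energy > energy_threshold   # frames that likely contain speech

    segments = []
    start = None     # first frame of the segment being built
    silence_run = 0  # consecutive quiet frames seen inside that segment
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:
                # The pause is long enough: close the segment at its first quiet frame.
                end = i - silence_run + 1
                segments.append((start * frame_len, end * frame_len))
                start, silence_run = None, 0
    if start is not None:
        # Signal ended inside a segment; drop any trailing quiet frames.
        end = n_frames - silence_run
        segments.append((start * frame_len, end * frame_len))
    return segments
```

Short dips in energy inside a word (shorter than `min_pause_frames`) do not split it. On its own, this approach is not sufficient for continuously spoken phrases, which often lack clear pauses between words.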
The output of this scheme is either a word from the list of templates in the set or an error message if the input pattern does not match any template closely enough.
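Template comparison by dynamic programming, together with the rejection rule for inputs that match no template closely enough, can be sketched with dynamic time warping (DTW). This is an illustrative assumption rather than the author's implementation: each word is represented by a sequence of feature vectors (plain numbers here for brevity), frames are compared by Euclidean distance, and `max_distance` is a hypothetical rejection threshold.

```python
import numpy as np


def _as_frames(x):
    """Treat a 1-D sequence as a sequence of one-dimensional feature vectors."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    # cost[i, j] = cheapest alignment of the first i frames of a with the first j of b.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Allowed steps: stretch a, stretch b, or advance both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def recognize_word(features, templates, max_distance):
    """Return the closest template's word, or None if even the closest
    template is farther than max_distance (the "error message" case)."""
    best_word, best_dist = None, np.inf
    for word, template in templates.items():
        dist = dtw_distance(features, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None
```

Because `recognize_word` computes one DTW alignment per template, recognition time grows linearly with the number of templates.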
Drawbacks: a separate set of templates must be created for practically every person (the so-called training of the recognition system); templates cannot be corrected automatically; the time spent recognizing a word is proportional to the number of templates; and a final choice must be made among several possible candidates. The scheme is applicable only when a limited list of words spoken by one or a few operators needs to be recognized, for example in various control systems with a small set of commands.
The main problems standing in the way of developing speech recognition systems:
These are the main obstacles for automated recognition systems. In addition, users have to "tell" the computer that they are addressing it; this usually requires pressing a button or doing something similar, which is not the best user interface. Advanced systems favor a dialogue interface that lets a person talk to the machine, create and receive information, and solve problems. Systems with a dialogue interface differ in how much of the initiative lies with the person or with the computer. Research has focused on "mixed-initiative" systems, in which the person and the computer play equally active roles in achieving the goal through dialogue.
The goal of this work is research into the problem of segmenting continuously spoken phrases, together with the study and application of the most promising methods. At the time of writing this abstract (June 2006), development of a software product addressing this problem is not yet finished. Planned completion: October 2006.