DonNTU Master's portal

Kravchenko Dima

Faculty: "Computer Information Technologies and Automatics"

Speciality: "Computer System Diagnostics of Medical and Technical Equipment"

Theme of master's work: "Methods and algorithms of a specialized computer system (SCS) for authenticating network users by voice timbre"

  Scientific supervisor: Associate Professor Maxim Vladimirovich Pryvalov

E-mail: chywoza@rambler.ru
ubersoft@fragment.dn.ua

 
 

Introduction

     Biometric authentication devices have existed for about twenty years. In that time they have moved from spy films onto desktops and have become considerably cheaper. The market offers many biometric identification systems, costing from several tens to several million dollars. They can protect anything from a stand-alone personal computer to a large corporate network.
     Corporations are not the only ones who need them. In areas of public life such as border control, passenger service and registration, electronic identity documents and cards, and crime prevention and investigation, security is a priority, and automated systems based on biometric methods can help substantially.
     Microsoft is also preparing for widespread adoption of these systems: it has announced plans to build biometric protection mechanisms into Windows, so that a personal computer will recognize its owner by fingerprints, voice, or the iris of the eye.
     In my master's work I plan to create a specialized computer system for authenticating users of local computer networks by voice timbre.
     In the main part of this abstract I review existing work in this area, as well as the methods needed to achieve the stated goals.

  Main part

     In our region, a theoretical analogue of the SCS under development is the system created by students of the Donetsk State Institute of Artificial Intelligence. Broadly, the department's task is to teach a computer to work with spoken Russian and Ukrainian. Systems have been created that recognize isolated words from a predefined vocabulary as whole units. Such systems require preliminary training: a voice template must be recorded for each word. Several applications have been built on them, including voice control of a mobile robot and a program for voice input of mathematical formulas. Phoneme recognition systems are now being developed that require the speaker to train only a small number of phonemes, by first pronouncing several tens of specially selected words. The segmentation problem (locating the portions of the speech signal that correspond to individual phonemes) has largely been solved.
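The abstract does not say which matching method those isolated-word systems use. A classic way to compare an utterance with a stored per-word voice template is dynamic time warping (DTW), which tolerates differences in speaking rate. The sketch below assumes each utterance has already been reduced to a sequence of scalar features (real systems use a feature vector per frame); the names `dtw_distance` and `recognize_word` and the toy templates are hypothetical.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two feature sequences.

    Uses the absolute difference as the local cost (an assumption; real
    systems compare per-frame feature vectors instead of scalars).
    """
    n, m = len(a), len(b)
    # d[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: stretch a, stretch b, or advance both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize_word(utterance: Sequence[float],
                   templates: Dict[str, Sequence[float]]) -> str:
    """Return the vocabulary word whose template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


if __name__ == "__main__":
    # Hypothetical templates recorded during the training phase
    templates = {"yes": [1.0, 2.0, 3.0], "no": [5.0, 5.0, 1.0]}
    # A slower pronunciation of "yes": one frame is held twice
    print(recognize_word([1.0, 2.0, 2.0, 3.0], templates))  # prints yes
```

Because DTW may repeat frames of either sequence, the slower "yes" aligns with its template at zero cost, which is exactly the robustness to speaking rate that whole-word template matching needs.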

  Voice recognition methods

     The voice recognition process consists of the following stages:
         1. Acquisition of the voice signal and preliminary speech processing. Acquiring the voice signal, or digitizing the voice, is the process of receiving an acoustic signal and converting it into digital form. The voice is represented as fluctuations of sound pressure at the microphone.

Figure 1. Representation of a signal

     There are two types of sounds: voiced and voiceless. Voiced sounds are produced by vibration of the vocal cords as air passes through them. This acoustic signal is modulated by the tension of the vocal cords. The vibrations resonate in the vocal tract (the nose, throat, and oral cavity). The airflow that creates the sound is called the "glottal wave". This signal is quasi-periodic, and its period is called the pitch period (the period of the fundamental tone).
         2. Recognition of phonemes (words). Methods such as hidden Markov models, neural networks, or combinations of them are used to recognize phonemes, groups of phonemes, and words. Hidden Markov models are used most often and most successfully. Such a model is defined as a set of states and transitions from one state to another. Each transition is associated with the probability of moving from one state to the next, and when a transition is taken, certain output data are observed with a certain probability. There is a set of initial states and a set of final states. Any observation sequence results from a path from one of the initial states to one of the final states. This model provides a fairly natural representation of speech.
         3. Understanding of speech. "Understanding" speech is the most difficult stage. Here a sequence of words (a sentence) must be transformed into a representation of what the speaker meant to say. A problem related to voice recognition is speaker recognition, i.e. the automatic determination of who is speaking on the basis of the individual information contained in the speech signal. Speaker recognition covers both speaker identification and speaker verification. Identification means finding, in a known set of reference phrases, the one that matches the way the given speaker talks. Verification means confirming the speaker's identity: is this really the claimed person? Speaker recognition technology makes it possible to use the voice for access control, for example telephone access to banking services, databases, e-commerce systems, or voice mail, as well as access to confidential equipment. The main methods to be used in writing the programs for recognizing a user by voice timbre are the fast Fourier transform and the adaptive Hermite transform.
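Because voiced sounds are quasi-periodic, the pitch period can be estimated by finding the lag at which the digitized signal best matches a shifted copy of itself. The sketch below uses plain autocorrelation with a 50-400 Hz search range; the range, the voicing threshold of 0.3 and the function names are assumptions for illustration, not part of the system described here.

```python
import math
from typing import Optional, Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Unnormalized autocorrelation r(lag) = sum of x[n] * x[n + lag]."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_pitch(frame: Sequence[float], sample_rate: int,
                   f_min: float = 50.0, f_max: float = 400.0,
                   voicing_threshold: float = 0.3) -> Optional[float]:
    """Estimate the fundamental frequency (Hz) of a frame, or None if unvoiced."""
    energy = autocorrelation(frame, 0)
    if energy == 0.0:
        return None  # silence
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    # A weak peak relative to r(0) suggests a voiceless, noise-like sound
    if autocorrelation(frame, best_lag) / energy < voicing_threshold:
        return None
    # The pitch period is best_lag samples, so F0 = sample_rate / best_lag
    return sample_rate / best_lag


if __name__ == "__main__":
    rate = 8000
    # 0.1 s of a 200 Hz tone standing in for a voiced sound
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    print(estimate_pitch(tone, rate))  # prints 200.0
```

At 8 kHz a 200 Hz tone repeats every 40 samples, so the autocorrelation peaks at lag 40 and the estimate is 8000 / 40 = 200 Hz.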
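The probability that a hidden Markov model produces a given observation sequence is usually computed with the forward algorithm. The description above attaches observations to transitions; the sketch below uses the common textbook variant in which each state emits discrete observation symbols, together with an initial distribution and a set of final states. The two-state model is hypothetical.

```python
from typing import Sequence, Set


def forward_probability(pi: Sequence[float],
                        A: Sequence[Sequence[float]],
                        B: Sequence[Sequence[float]],
                        observations: Sequence[int],
                        final_states: Set[int]) -> float:
    """P(observations, path ends in a final state) for a discrete HMM.

    pi[i]   - probability of starting in state i (initial states have pi > 0)
    A[i][j] - probability of the transition from state i to state j
    B[i][k] - probability of observing symbol k in state i
    """
    n_states = len(pi)
    # alpha[i] = P(o_1..o_t, state at time t is i)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
                 for j in range(n_states)]
    # only paths that end in a final state count
    return sum(alpha[i] for i in final_states)


if __name__ == "__main__":
    # Hypothetical left-to-right model of a two-phoneme word
    pi = [1.0, 0.0]
    A = [[0.7, 0.3],
         [0.0, 1.0]]
    B = [[0.9, 0.1],
         [0.2, 0.8]]
    print(forward_probability(pi, A, B, [0, 1], final_states={1}))
```

For this model and the observations [0, 1], the only path ending in final state 1 is 0 → 1, so the result is 0.9 · 0.3 · 0.8 = 0.216. In recognition, one such model per word or phoneme is evaluated and the most probable one wins.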
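The fast Fourier transform named above turns a short frame of the digitized signal into a spectrum, from which timbre-related features can be derived. Below is a minimal recursive radix-2 Cooley-Tukey FFT in plain Python (the frame length must be a power of two) plus a magnitude spectrum of a Hamming-windowed frame. The window choice and function names are assumptions, and the adaptive Hermite transform is not shown.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    result = [0j] * n
    for k in range(n // 2):
        # twiddle factor e^(-2*pi*i*k/n) applied to the odd-index half
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + t
        result[k + n // 2] = even[k] - t
    return result


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of bins 0..N/2 of a Hamming-windowed frame."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) for c in spectrum[: n // 2 + 1]]


if __name__ == "__main__":
    n = 64
    # a tone that completes exactly 5 cycles within the frame
    frame = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    mags = magnitude_spectrum(frame)
    print(max(range(len(mags)), key=mags.__getitem__))  # prints 5
```

The recursion halves the problem at each level, which is what gives O(N log N) cost instead of the O(N^2) of a direct DFT.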


Figure 2. SCS conceptual scheme

  Practical value  

     The following application areas of voice recognition systems can be identified:
         1. Human-computer interface. Many people clearly find it difficult to communicate with a machine, so a new way of interacting with a computer is needed: simple, fast, and intuitive. Voice recognition systems make the machine adapt to the person rather than the reverse. A major advantage of voice recognition systems is that they are much faster than other types of interfaces. A voice mail client lets the user switch on the computer, dictate, and send messages without touching the mouse or keyboard. People with physical disabilities will also gain a more effective way to interact with a computer.
         2. Information services. Speech is an ideal tool for obtaining information, and it also makes it possible to establish interaction with a computer. With spoken-language systems, the user and the machine can conduct a direct dialogue, approaching the required information gradually, step by step. For example, voice recognition systems have been developed to provide access to databases of digitized TV and radio news clips, and for booking air tickets.
         3. Other human-machine interfaces, such as face recognition systems and touch screens, help accelerate the adoption of spoken dialogue systems; there is a trend toward combined systems. Face and voice recognition technologies have also entered banking, together with cash dispensers. Over the last decade the application areas of speech recognition systems have expanded considerably, and they will continue to expand.

  Solved questions  

What is voice biometrics?

    Voice identification works as follows: the system compares a digitized sample of the voice with a so-called "voice print" stored in a database. The voice is a unique biometric characteristic of a person and can be used to confirm his or her identity.

What is a "voice print"?

   The digital representation of the unique characteristics of a voice is called a "voice print". Voices differ, and these differences are caused by physiological characteristics such as the vocal cords, the trachea, and the nasal passages, as well as by how the tongue moves in the mouth, how sounds are articulated, and so on. The combination of these characteristics is analyzed and yields a representation that is unique to each person.
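One simple way to picture a voice print is as a fixed-length vector obtained by averaging per-frame feature vectors (for example, frame spectra) over an utterance, with a new sample compared against the stored print by cosine similarity. This is an illustrative assumption, not the representation used by any particular product; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def make_voice_print(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one fixed-length vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    count = len(frame_features)
    return [sum(f[d] for f in frame_features) / count for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [-1, 1]; 1 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


if __name__ == "__main__":
    stored = make_voice_print([[1.0, 0.0], [3.0, 2.0]])   # [2.0, 1.0]
    sample = make_voice_print([[4.0, 2.0], [4.0, 2.0]])   # same direction, louder
    print(cosine_similarity(stored, sample))
```

Cosine similarity ignores overall vector length, so a sample recorded louder but with the same spectral shape still scores close to 1.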

What is the difference between speaker verification and speech recognition?

     Speech recognition is concerned with what was said, whereas speaker verification is concerned with who is speaking; this is the main difference between them. Voice identification systems do not depend on any particular language or vocabulary. A person can say anything, in any language, which makes these systems very user-friendly and ideal for international use.

How is enrollment in the database carried out?

     The whole enrollment process takes a few minutes. The system asks the user to answer several simple questions, for example first name, patronymic, surname, or date of birth. The answers become identification phrases that will later be used to identify the person. Remember: what you say is unimportant, what matters is how you say it. The questions can be very different; the main thing is that the answer is well known to the person and that he or she can reproduce it at any moment. For each question the user says the answer four times. The answer should contain at least three syllables and last long enough to create a voice print. The recorded answers are overlaid on one another, extraneous noise is removed, and within a few seconds the voice print is ready. The system then does the same with the other questions and answers (security systems suggest creating several such voice prints). Within a few minutes the voice prints are created that will be used each time the person passes through security.
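The enrollment procedure above (four repetitions per answer, combined into one voice print, several prints per user) can be sketched as follows. The sketch assumes each repetition has already been turned into a feature vector and its duration measured; averaging the repetitions is one simple way to suppress noise that differs between recordings. `MIN_DURATION_S` is a hypothetical parameter, since the text gives no exact minimum, and the three-syllable rule is not checked.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

REPETITIONS = 4          # "the user says the answer four times"
MIN_DURATION_S = 1.0     # hypothetical minimum length of one repetition


@dataclass
class Recording:
    features: List[float]   # feature vector of one repetition
    duration_s: float       # measured length of the repetition in seconds


def enroll_answer(recordings: Sequence[Recording]) -> List[float]:
    """Combine the repetitions of one answer into a single voice print."""
    if len(recordings) != REPETITIONS:
        raise ValueError(f"expected {REPETITIONS} repetitions, got {len(recordings)}")
    for r in recordings:
        if r.duration_s < MIN_DURATION_S:
            raise ValueError("repetition too short; ask the user to repeat it")
    dim = len(recordings[0].features)
    # element-wise mean: noise that differs between takes partly cancels out
    return [sum(r.features[d] for r in recordings) / REPETITIONS for d in range(dim)]


def enroll_user(answers: Dict[str, Sequence[Recording]]) -> Dict[str, List[float]]:
    """Build one voice print per identification question."""
    return {question: enroll_answer(recs) for question, recs in answers.items()}
```

Keeping one print per question lets verification later ask for several different identification phrases.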

How is speaker verification carried out?

   The user says certain phrases, and the system compares what was said with the previously stored voice print. The person says two or three identification phrases. If two spoken phrases pass the biometric test, the person's identity is confirmed. If one of these phrases is not accepted, the system turns to a third phrase, and if that phrase is accepted, the user's identity is also confirmed. If after three identification phrases the system is still not confident that the user has been identified correctly, it denies access and either transfers the user to an operator or simply terminates the connection.
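The two-out-of-three rule described above can be written as a small decision procedure. The per-phrase biometric test is passed in as a callable, so the third phrase is requested only when it can still change the outcome. Treating two rejected phrases as an immediate refusal is an assumption, since the text only covers the case where one phrase fails; the function name is hypothetical.

```python
from typing import Callable


def verify_speaker(phrase_passes: Callable[[int], bool],
                   max_phrases: int = 3, required: int = 2) -> str:
    """Return "accept" or "reject" (reject = hand over to an operator or hang up).

    phrase_passes(i) asks the user for identification phrase i and returns
    True if it passes the biometric comparison with the stored voice print.
    """
    accepted = 0
    for i in range(max_phrases):
        if phrase_passes(i):
            accepted += 1
        if accepted >= required:
            return "accept"
        phrases_left = max_phrases - (i + 1)
        if accepted + phrases_left < required:
            return "reject"  # the required count can no longer be reached
    return "reject"
```

For example, `verify_speaker(lambda i: [True, True][i])` accepts after two phrases without asking for a third, while `verify_speaker(lambda i: [True, False, True][i])` needs all three.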

What does a "normal voice" mean?

     As with other biometric applications, the success of voice identification depends on a constant, stable sample. Fingerprint identification assumes the absence of cuts or dirt; for voice identification, a constant, stable sample means speaking normally and easily, that is, in one's usual manner. Users should also understand that chewing gum, shortness of breath, and alcohol all have a negative effect on the voice.

If a person has a cold, will his or her voice still be identified?

      Not all characteristics of your voice are affected when you have a cold. A voice identification system can still recognize you in the case of an ordinary cold. With serious throat diseases such as laryngitis, additional means of identification will, of course, be required.

What are the recognition thresholds?

      During voice identification (comparison of a spoken phrase with the previously recorded one), the system produces a score showing how closely the spoken identification phrase matches the one stored in the database. The score ranges from -10,000 to +10,000. In theory, a score of 0 or below indicates a "possible impostor", and a score above zero indicates a "possible genuine user". To ensure a high level of security while keeping the system user-friendly, a minimum is set for each threshold. It should also be noted that the chosen thresholds affect both the false acceptance rate (FAR) and the false rejection rate (FRR).
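Given the score scale described above (-10,000 to +10,000, with a score above the threshold treated as a possible genuine user and 0 as the theoretical boundary), FAR and FRR for a chosen threshold can be measured on test attempts whose true identity is known. The sample scores below are made-up toy values, not measurements, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

SCORE_MIN, SCORE_MAX = -10_000, 10_000


def is_accepted(score: float, threshold: float = 0.0) -> bool:
    """Scores above the threshold count as "possible genuine user"."""
    if not SCORE_MIN <= score <= SCORE_MAX:
        raise ValueError(f"score {score} outside [{SCORE_MIN}, {SCORE_MAX}]")
    return score > threshold


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float = 0.0) -> Tuple[float, float]:
    """(FAR, FRR): share of impostor attempts accepted, share of genuine attempts rejected."""
    false_accepts = sum(is_accepted(s, threshold) for s in impostor)
    false_rejects = sum(not is_accepted(s, threshold) for s in genuine)
    return false_accepts / len(impostor), false_rejects / len(genuine)


if __name__ == "__main__":
    genuine = [4000, 2500, -500, 7000]      # toy scores of the real user
    impostor = [-6000, -3000, 1200, -9000]  # toy scores of impostors
    for t in (-1000, 0, 2000):
        print(t, far_frr(genuine, impostor, t))
```

With these toy scores the thresholds -1000, 0 and 2000 give (FAR, FRR) of (0.25, 0.0), (0.25, 0.25) and (0.0, 0.25): raising the threshold moves errors from false acceptances to false rejections.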

FAR and FRR: what do they mean?

    The level at which the system admits users is set by each organization. Management often states that those whom the system regards as impostors must not be admitted at all (false acceptance rate, FAR), and that no more than x% of correct, authorized users may go unrecognized by the system (false rejection rate, FRR). In reality, no system can guarantee 100% accuracy, and FAR and FRR change together: tightening one of them loosens the other. A great deal also depends on the surrounding conditions and on the skill level of the staff.
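The requirement "no more than x% of genuine users rejected" translates directly into a threshold choice: among thresholds that keep FRR at or below x%, take the strictest one, which gives the lowest FAR reachable under that constraint. A sketch, assuming integer scores on the -10,000 to +10,000 scale and the rule "score above threshold is accepted"; the function name and sample scores are hypothetical.

```python
import math
from typing import Sequence, Tuple

SCORE_MAX = 10_000


def strictest_threshold(genuine: Sequence[int], impostor: Sequence[int],
                        max_frr: float) -> Tuple[int, float, float]:
    """Highest integer threshold whose FRR does not exceed max_frr.

    Acceptance rule: score > threshold. Returns (threshold, FRR, FAR).
    """
    ranked = sorted(genuine)
    n = len(ranked)
    allowed_rejects = math.floor(max_frr * n + 1e-9)  # tolerate float rounding
    if allowed_rejects >= n:
        threshold = SCORE_MAX  # every genuine attempt may be rejected
    else:
        # accept every genuine score >= ranked[allowed_rejects]
        threshold = ranked[allowed_rejects] - 1
    frr = sum(s <= threshold for s in genuine) / n
    far = sum(s > threshold for s in impostor) / len(impostor)
    return threshold, frr, far
```

With toy genuine scores [4000, 2500, -500, 7000] and impostor scores [-6000, -3000, 1200, -9000], allowing x = 25% gives threshold 2499 with FRR 0.25 and FAR 0.0, while demanding x = 0% forces threshold -501 and lets one impostor in (FAR 0.25).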

Does voice identification really provide a 100% guarantee of security?

     No solution, biometrics included, can guarantee 100% security.

Is it true that storing a voice print requires a lot of space?

     Depending on the length of the stable sample, the system needs from 20 to 40 KB per voice print. The size is expected to fall to 10-15 KB in the near future.
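These figures make total storage easy to estimate. For example, with a hypothetical 1,000 users and three voice prints each (the text mentions several prints per user but not how many), 20-40 KB per print amounts to 60,000-120,000 KB. The helper below is a trivial sketch of that arithmetic; its name and the 1 MB = 1024 KB convention are assumptions.

```python
from typing import Tuple

PRINT_SIZE_KB = (20, 40)  # per-print range quoted in the text


def storage_range_kb(users: int, prints_per_user: int,
                     size_kb: Tuple[float, float] = PRINT_SIZE_KB) -> Tuple[float, float]:
    """Lower and upper estimate of total voice-print storage in KB."""
    low, high = size_kb
    prints = users * prints_per_user
    return prints * low, prints * high


if __name__ == "__main__":
    low, high = storage_range_kb(1000, 3)
    print(f"{low / 1024:.1f}-{high / 1024:.1f} MB")  # prints 58.6-117.2 MB
    # with the expected future size of 10-15 KB per print
    print(storage_range_kb(1000, 3, (10, 15)))
```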

  Conclusion

     Voice identification systems are used worldwide. Radio and television companies use them to improve the security of data transmitted over long distances. Government agencies use such systems to protect vital and classified information.
     The purpose of this work is research into securing intranet information resources by restricting access to them on the basis of a biometric characteristic: the timbre of the voice.
     Development of the software product for this problem was not yet finished at the time of writing this abstract (May 2008). Completion is planned for November 2008.



© DonNTU 2008 Kravchenko Dima
