Olga Kulibaba

Faculty: Computer Sciences and Technologies

Department: Automated Control Systems

Speciality: Information Control Systems and Technologies

Theme of Master's Work:

Development of computer access control system using authentication by voice

Scientific Supervisor: Ph. D. Maxim V. Privalov

ABSTRACT

of the qualification master’s work

“Development of computer access control system using authentication by voice”

1 ACTUALITY OF THEME
2 PURPOSE AND TASKS
3 PLANNED SCIENTIFIC NOVELTY
4 REVIEW OF RESEARCHES AND DEVELOPMENTS ON THE TOPIC
4.1 At national level
4.2 At global level
5 ANALYSIS OF UNIQUE INDIVIDUAL FEATURES
6 CHOICE OF THE STRUCTURE OF COMPUTER ACCESS CONTROL SYSTEM USING AUTHENTICATION BY VOICE

LIST OF LITERATURE

1 ACTUALITY OF THEME

Information is among the most valuable and sought-after commodities of our time, and protecting it is a central problem. Biometrics is a reliable and comprehensive technology for authenticating users. Among the various biometric systems, authentication by voice has the following advantages:

speaking is a natural and familiar method of authentication for a person;
a voice cannot be stolen or forgotten at home;
authentication by voice does not require complex and expensive readers of biometric information.

Authentication by voice is evolving rapidly and is in increasing demand each year [1]. However, the choice of an optimal set of features that would minimize the false rejection rate and the false acceptance rate remains an unsolved problem.

2 PURPOSE AND TASKS

The purpose of the master’s work is to minimize the false rejection rate and the false acceptance rate and to increase the speed of authentication in a computer access control system.

The following tasks need to be solved:

Analysis of unique individual features that characterize the personality of the speaker.
Substantiation and choice of these features.
Substantiation and choice of methods for classification of speakers.
Choice of the structure of computer access control system using authentication by voice.
Development of computer access control system using authentication by voice.
System test.

3 PLANNED SCIENTIFIC NOVELTY

The planned scientific novelty is the minimization of the false rejection rate and the false acceptance rate by choosing an effective combination of feature extraction and classification methods.
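For reference, the two error rates named above can be estimated from labeled trial scores at a chosen decision threshold. The following is a minimal sketch in Python, assuming that the system produces a similarity score and accepts a speaker when the score is at least the threshold. The function name far_frr and the sample scores are hypothetical and are not taken from the work.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Estimate (FAR, FRR) for the rule: accept when score >= threshold.

    FAR is the share of impostor trials that were accepted.
    FRR is the share of genuine trials that were rejected.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Hypothetical similarity scores, for illustration only.
genuine = [0.90, 0.80, 0.75, 0.40]    # true speaker attempts
impostor = [0.10, 0.30, 0.60, 0.20]   # attempts by other speakers

far, frr = far_frr(genuine, impostor, threshold=0.5)
print(far, frr)
```

Raising the threshold generally lowers the false acceptance rate and raises the false rejection rate, so both values have to be reported for the same threshold.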

4 REVIEW OF RESEARCHES AND DEVELOPMENTS ON THE TOPIC

4.1 At national level

In Ukraine, this topic is researched at the Institute for Artificial Intelligence Problems [2], Kharkov National University of Radioelectronics [3], and the National Technical University of Ukraine “Kyiv Polytechnic Institute” [4].

4.2 At global level

Foreign systems:

Voice Key Service;
SPIRIT SV-system;
Speech Secure.

Voice Key Service is a voice biometric authentication system developed by the Russian company "Speech Technology Center" [5]. SPIRIT SV-system is an authentication system developed by the Russian company SPIRIT Corp [6]. Speech Secure is a voice identification system developed by the U.S. company Nuance Technology [7].

5 ANALYSIS OF UNIQUE INDIVIDUAL FEATURES

Feature extraction is the key front-end process in speaker identification systems. The performance of a speaker identification system depends strongly on the quality of the selected speech features. For a speech feature used in speaker identification to be effective, it should reflect the unique properties of the speaker’s vocal apparatus and contain little or no information about the linguistic content of the speech.

A one-dimensional vector of cepstral coefficients, together with a vector of its derivatives, can be used as a unique feature vector [8]. The cepstral coefficients are determined according to the scheme shown in Fig. 5.1:

Figure 5.1 – The general scheme of cepstral signal analysis (FFT – fast Fourier transform block, LOG – block computing the logarithm of the spectrum, IFFT – inverse fast Fourier transform block)

Reflection coefficients can also be used as a feature vector.
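The FFT, LOG and IFFT chain of Fig. 5.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the FFT so that the example needs no external library, the logarithm is taken of the magnitude spectrum (the real cepstrum), and the names dft and real_cepstrum and the test frame are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stands in for the FFT and IFFT blocks)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        result.append(acc / n if inverse else acc)
    return result


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum of one frame: IFFT(log|FFT(frame)|)."""
    spectrum = dft(frame)                                         # FFT block
    log_magnitude = [math.log(abs(s) + eps) for s in spectrum]    # LOG block
    return [c.real for c in dft(log_magnitude, inverse=True)]     # IFFT block


# Hypothetical test frame: a unit impulse plus an echo delayed by 8 samples.
frame = [0.0] * 64
frame[0] = 1.0
frame[8] = 0.5
cepstrum = real_cepstrum(frame)

# The echo delay shows up as the largest cepstral peak in the first half.
peak = max(range(1, 32), key=lambda q: cepstrum[q])
print(peak)
```

In a practical system the frame would first be windowed, and only the first few cepstral coefficients would normally be kept as features. Both choices are outside the scope of this sketch.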

The vocal tract can be modeled as an electrical transmission line, a waveguide, or an analogous series of cylindrical acoustic tubes [9]. At each junction, there can be an impedance mismatch or an analogous difference in cross-sectional areas between tubes. At each boundary, a portion of the wave is transmitted and the remainder is reflected (assuming lossless tubes). The reflection coefficients k_i give the fraction of the wave reflected at these discontinuities. If the acoustic tubes are of equal length, the time required for sound to propagate through each tube is equal (assuming planar wave propagation). Equal propagation times allow a simple z-transform to be used for digital filter simulation. For example, a series of five acoustic tubes of equal length with cross-sectional areas A₁, …, A₅ is shown in Fig. 5.2. This series of five tubes represents a fourth-order system that might fit a vocal tract minus the nasal cavity. The reflection coefficients are determined by the ratios of the adjacent cross-sectional areas with appropriate boundary conditions. For a pth-order system, the boundary conditions given in Eq. (5.2) correspond to a closed glottis (zero area) and a large area following the lips.

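\[ k_i = \frac{A_{i+1} - A_i}{A_{i+1} + A_i}, \quad i = 1, \ldots, p \tag{5.1} \]

\[ A_0 = 0, \quad A_{p+1} \gg A_p \tag{5.2} \]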

Figure 5.2 – Acoustic tube model of speech production

Narrow-bandwidth poles result in |k_i| ≈ 1. An inaccurate representation of such reflection coefficients (RCs) can cause gross spectral distortion. Taking the log of the area ratios results in more uniform spectral sensitivity. The log area ratios (LARs) are defined as the log of the ratio of adjacent cross-sectional areas [10]:

\[ g_i = \log \frac{A_{i+1}}{A_i}, \quad i = 1, \ldots, p \tag{5.3} \]

The cross-sectional areas of the tubes can also be used as a feature vector. The shape of the vocal tract changes with age but does not depend on a sore throat.
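The relation between tube areas, reflection coefficients and log area ratios can be illustrated with a short Python sketch. It assumes the sign convention k_i = (A_{i+1} − A_i) / (A_{i+1} + A_i), which is one common choice and may differ from the convention used in the work. The function names and the area values are hypothetical.

```python
import math


def reflection_coefficients(areas):
    """k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i) for each pair of adjacent tubes."""
    return [(b - a) / (b + a) for a, b in zip(areas, areas[1:])]


def log_area_ratios(areas):
    """g_i = log(A_{i+1} / A_i); all areas must be strictly positive."""
    if any(a <= 0 for a in areas):
        raise ValueError("areas must be positive to take the log of their ratios")
    return [math.log(b / a) for a, b in zip(areas, areas[1:])]


# Hypothetical cross-sectional areas of four adjacent tubes (arbitrary units).
areas = [1.0, 2.0, 4.0, 2.0]

k = reflection_coefficients(areas)
g = log_area_ratios(areas)
print(k)
print(g)

# Under the assumed convention, the LARs also follow from the reflection
# coefficients alone: g_i = log((1 + k_i) / (1 - k_i)).
g_from_k = [math.log((1 + ki) / (1 - ki)) for ki in k]
print(g_from_k)
```

As |k_i| approaches 1 the ratio (1 + k_i) / (1 − k_i) grows without bound, which shows why the logarithmic form spreads out the values near ±1 where the reflection coefficients themselves are most sensitive.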

6 CHOICE OF THE STRUCTURE OF COMPUTER ACCESS CONTROL SYSTEM USING AUTHENTICATION BY VOICE

Structure of computer access control system using authentication by voice is shown in Fig. 6.1.

Figure 6.1 – Structure of computer access control system using authentication by voice

The system consists of two basic subsystems: the voice signal input subsystem and the authentication subsystem. The first is located on the client side and captures the user’s voice message through the microphone. The message is written to a .wav file in PCM audio format (22050 Hz, 16 bit, mono). The signal then goes to the authentication subsystem, which is located on the server. The authentication subsystem consists of a database, a parameterization block, a learning block, a clustering block and a decision-making block. The parameterization block is responsible for feature extraction. The clustering block uses the fuzzy c-means algorithm. The decision-making block produces a decision: rejection or acceptance of the speaker’s claimed identity. The result then goes (depending on the specific task) to an execution unit or to an authorization subsystem.
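The fuzzy c-means step used by the clustering block can be sketched as follows. This is a generic textbook form of the algorithm written in plain Python, not the implementation developed in the work. The fuzzifier m = 2, the random initialization, the stopping tolerance and the sample feature vectors are all assumptions made for the illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial membership matrix; each row sums to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([value / total for value in row])

    centers = []
    for _ in range(max_iter):
        # Cluster centers: means of the points weighted by membership**m.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in memberships]
            weight_sum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / weight_sum
                for d in range(dim)
            ])
        # Memberships: larger for centers that are relatively closer.
        largest_change = 0.0
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                          for k in range(n_clusters))
                for j in range(n_clusters)
            ]
            largest_change = max(
                largest_change,
                max(abs(a - b) for a, b in zip(new_row, memberships[i])),
            )
            memberships[i] = new_row
        if largest_change < tol:
            break
    return centers, memberships


# Hypothetical 2-D feature vectors forming two well-separated groups.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(features, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Unlike hard clustering, each feature vector receives a degree of membership in every cluster, and these degrees can be passed on to the decision-making block instead of a single label.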

LIST OF LITERATURE

1. G.N. Zubov, M.V. Khitrov. State and prospects of voice biometrics, 2007 (in Russian). [Electronic resource]. URL: http://www.chip-news.ru/archive/chipnews/200710/Article_12.pdf
2. A.S. Alekseev, E.E. Fedorov. Quantitative analysis of feature systems and identification methods. Shtuchnyi Intelekt (Artificial Intelligence), Institute for Artificial Intelligence Problems, Donetsk, No. 3, 2005 (in Russian). [Electronic resource]. URL: http://www.iai.dn.ua/public/JournalAI_2005_3/Razdel7/02_Alekseev_Fedorov.pdf
3. Scientific library of KhNURE. [Electronic resource]. URL: http://lib.kture.kharkov.ua/ua/bibllist/2.php
4. Scientific electronic library "VEDA". [Electronic resource]. URL: http://www.lib.ua-ru.net/diss/cont/15579.html
5. Yu.N. Khitrova. Application of speech biometrics in access restriction systems (in Russian). [Electronic resource]. URL: http://www.e-expo.ru/docs/sp/cat/data/media/18_ru.pdf
6. V.A. Sviridenko, P.V. Martynovich. Speaker verification and identification systems from SPIRIT Corp (in Russian). [Electronic resource]. URL: http://www.dancom.ru/rus/AIA/Archive/RUII_SPIRIT_DOKLAD_R.pdf
7. Official website of the U.S. company Nuance Technology. [Electronic resource]. URL: www.nuance-tech.com
8. V.I. Galunov. Speaker verification and identification, St. Petersburg State University, 2007 (in Russian). [Electronic resource]. URL: http://www.auditech.ru/article/cntrid/click.php?action=download&id=21
9. A.A. Kucheryavy. On-board information systems: a course of lectures / Ed. by V.A. Mishin and G.I. Klyuev. 2nd ed., revised and enlarged. Ulyanovsk: UlGTU, 2004. 504 p. (in Russian).
10. T.V. Shariy. On the problem of speech signal parameterization in modern speech recognition systems. Visnyk Donetskoho natsionalnoho universytetu (Bulletin of Donetsk National University), Ser. A: Natural Sciences, No. 2, 2008 (in Russian). [Electronic resource]. URL: http://www.nbuv.gov.ua/portal/Natural/VDU/a/2008_2/Control%20systems/9_Shariy.pdf
11. J. Markel, A.H. Gray. Linear Prediction of Speech (Russian translation). Moscow: Svyaz, 1980.
12. David Chow, Waleed H. Abdulla. Robust Speaker Identification Based on Perceptual Log Area Ratio and Gaussian Mixture Models. Auckland, New Zealand, 2002.
13. L.R. Rabiner, R.W. Schafer. Digital Processing of Speech Signals (Russian translation). Moscow: Radio i Svyaz, 1981. 495 p.

At the time of writing this abstract, the master’s work is not yet completed. Final completion: December 2010. The complete text of the work and materials on the topic can be obtained from the author or her scientific supervisor after that date.
