ABSTRACT of the qualification master’s work “Development of computer access control system using authentication by voice”
1 RELEVANCE OF THE TOPIC
Information is one of the most valuable and sought-after commodities of our time, so protecting it is a central problem. Biometrics is a reliable and comprehensive technology for authenticating users. Among the various biometric systems, authentication by voice has the following advantages:
Authentication by voice is evolving rapidly and demand for it grows every year [1]. However, the choice of an optimal set of features that minimizes the false rejection rate and the false acceptance rate remains an unsolved problem.

2 PURPOSE AND TASKS
The purpose of the master's work is to minimize the false rejection rate and the false acceptance rate and to increase the speed of authentication in a computer access control system. To achieve this, the following tasks must be solved:
3 PLANNED SCIENTIFIC NOVELTY
The planned scientific novelty is the minimization of the false rejection rate and the false acceptance rate by choosing an effective combination of feature extraction and classification methods.

4 REVIEW OF RESEARCHES AND DEVELOPMENTS ON THE TOPIC
4.1 At national level
In Ukraine this topic is researched at the Institute for Artificial Intelligence Problems [2], Kharkov National University of Radioelectronics [3], and the National Technical University of Ukraine “Kyiv Polytechnic Institute” [4].

4.2 At global level
Foreign systems:
Voice Key Service is a voice biometric authentication system developed by the Russian company "Speech Technology Center" [5].
SPIRIT SV-system is an authentication system developed by the Russian company SPIRIT Corp [6].
Speech Secure is a voice identification system developed by the U.S. company Nuance Technology [7].

5 ANALYSIS OF UNIQUE INDIVIDUAL FEATURES
Feature extraction is the key front-end process in speaker identification systems. The performance of a speaker identification system depends heavily on the quality of the selected speech features. To be effective for speaker identification, a speech feature should reflect the unique properties of the speaker’s vocal apparatus and contain little or no information about the linguistic content of the speech.

A one-dimensional frequency vector of cepstral coefficients, together with a vector of its derivatives, can be used as a unique feature vector [8]. The cepstral coefficients are determined according to the scheme shown in Fig. 5.1.

Figure 5.1 – The general scheme of cepstral signal analysis (FFT – fast Fourier transform block, LOG – block taking the logarithm of the spectrum, IFFT – inverse fast Fourier transform block)

Reflection coefficients can also be used as a feature vector. The vocal tract can be modeled as an electrical transmission line, a waveguide, or an analogous series of cylindrical acoustic tubes [9]. At each junction there can be an impedance mismatch or, analogously, a difference in cross-sectional area between tubes. At each boundary, a portion of the wave is transmitted and the remainder is reflected (assuming lossless tubes). The reflection coefficients ki give the fraction of the wave reflected at these discontinuities. If the acoustic tubes are of equal length, the time required for sound to propagate through each tube is equal (assuming planar wave propagation). Equal propagation times allow a simple z-transform representation for digital filter simulation.
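The cepstral scheme of Fig. 5.1 (FFT, LOG, IFFT) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `real_cepstrum`, the small constant `eps` that guards against taking the log of zero, the synthetic test frame, and the choice of keeping 13 coefficients are not taken from the thesis.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum of one signal frame: FFT -> LOG -> IFFT."""
    spectrum = np.fft.fft(frame)                     # FFT block
    log_magnitude = np.log(np.abs(spectrum) + eps)   # LOG block (eps is an assumption)
    return np.fft.ifft(log_magnitude).real           # IFFT block

# Synthetic frame standing in for recorded speech (hypothetical data).
sample_rate = 22050
t = np.arange(512) / sample_rate
frame = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

# Keep the first few coefficients as a feature vector (13 is an arbitrary choice).
coeffs = real_cepstrum(frame)[:13]
```

In practice the signal would be split into short frames and the scheme applied to each one; differences between the coefficient vectors of successive frames then give a simple estimate of the derivatives mentioned above.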
As an example of the tube model, a series of five acoustic tubes of equal length with cross-sectional areas A1, …, Ap is shown in Fig. 5.2. This series of five tubes represents a fourth-order system that might fit a vocal tract minus the nasal cavity. The reflection coefficients are determined by the ratios of adjacent cross-sectional areas with appropriate boundary conditions. For a pth-order system, the boundary conditions given in Eq. (5.2) correspond to a closed glottis (zero area) and a large area following the lips.
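A minimal sketch of how reflection coefficients might be computed from tube areas, assuming the common convention k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i). Some texts use the opposite sign. The function name and the sample areas are illustrative and not taken from the thesis, and the boundary conditions at the glottis and lips are not modeled here.

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tubes.

    Assumed convention: k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i).
    """
    if any(a <= 0 for a in areas):
        raise ValueError("cross-sectional areas must be positive")
    return [(nxt - cur) / (nxt + cur) for cur, nxt in zip(areas, areas[1:])]

# Five equal-length tubes have four junctions, i.e. a fourth-order system.
example_areas = [1.0, 2.0, 2.0, 0.5, 4.0]  # made-up values, arbitrary units
k = reflection_coefficients(example_areas)
```

With positive, finite areas every coefficient lies strictly between -1 and 1, and two adjacent tubes of equal area give a coefficient of zero (no reflection).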
Figure 5.2 – Acoustic tube model of speech production

Narrow-bandwidth poles result in |ki| close to 1. An inaccurate representation of these reflection coefficients (RCs) can cause gross spectral distortion. Taking the log of the area ratios results in more uniform spectral sensitivity. The log area ratios (LARs) are defined as the log of the ratio of adjacent cross-sectional areas [10].
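Written out, and assuming the convention k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i) for the reflection coefficients (the sign convention and the symbol g_i are assumptions here, not taken from this text), the log area ratios take the following form:

```latex
g_i = \log\frac{A_{i+1}}{A_i}
    = \log\frac{1 + k_i}{1 - k_i}
    = 2\tanh^{-1} k_i , \qquad i = 1, \dots, p
```

The middle equality follows by substituting the assumed k_i: 1 + k_i = 2A_{i+1} / (A_{i+1} + A_i) and 1 - k_i = 2A_i / (A_{i+1} + A_i), so their ratio is A_{i+1} / A_i. Because the inverse hyperbolic tangent grows steeply as |k_i| approaches 1, the LAR scale stretches exactly the region where small errors in the reflection coefficients matter most.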
The cross-sectional areas of the tubes can also be used as a feature vector. The shape of the vocal tract changes with age but does not depend on conditions such as a sore throat.

6 CHOICE OF THE STRUCTURE OF COMPUTER ACCESS CONTROL SYSTEM USING AUTHENTICATION BY VOICE
The structure of the computer access control system using authentication by voice is shown in Fig. 6.1.

Figure 6.1 – Structure of computer access control system using authentication by voice (animation: file size – 50 480 bytes; dimensions – 771×453; 4 frames; delay between frames – 800 ms; delay between the last and the first frames – 1 500 ms; repeats continuously)

The system consists of two basic subsystems: the voice signal input subsystem and the authentication subsystem. The first is located on the client side and records the user's voice message through the microphone. The message is written to a .wav file in PCM audio format, 22 050 Hz, 16 bit, mono. The signal then goes to the authentication subsystem, which is located on the server.

The authentication subsystem consists of a database, a parameterization block, a learning block, a clustering block, and a decision-making block. The parameterization block is responsible for feature extraction. The clustering block uses the fuzzy c-means algorithm. The decision-making block produces the decision: rejection or acceptance of the claimed speaker identity. The result then goes, depending on the specific task, either to an execution unit or to an authorization subsystem.

LIST OF LITERATURE
At the time of writing this abstract, the master's work is not yet complete. Final completion: December 2010. The complete text of the work and materials on the topic can be obtained from the author or the author's supervisor after that date. © DonNTU 2010, Olga Kulibaba