Nautsch, Andreas (2019)
Speaker Recognition in Unconstrained Environments.
Technische Universität Darmstadt
Ph.D. Thesis, Primary publication
|
Text
Dissertation-anautsch-Fassung-20191028.pdf - Published Version Copyright Information: CC BY-SA 4.0 International - Creative Commons, Attribution ShareAlike. Download (8MB) | Preview |
Item Type: | Ph.D. Thesis | ||||
---|---|---|---|---|---|
Type of entry: | Primary publication | ||||
Title: | Speaker Recognition in Unconstrained Environments | ||||
Language: | English | ||||
Referees: | Mühlhäuser, Prof. Dr. Max ; Busch, Prof. Dr. Christoph ; Meuwly, Prof. Dr. Didier | ||||
Date: | 2019 | ||||
Place of Publication: | Darmstadt | ||||
Date of oral examination: | 10 October 2019 | ||||
Abstract: | Speaker recognition is applied in smart home devices, interactive voice response systems, call centers, online banking and payment solutions as well as in forensic scenarios. This dissertation is concerned with speaker recognition systems in unconstrained environments. Before this dissertation, research on making better decisions in unconstrained environments was insufficient. Aside from decision making, unconstrained environments imply two other subjects: security and privacy. Within the scope of this dissertation, these research subjects are regarded as both security against short-term replay attacks and privacy preservation within state-of-the-art biometric voice comparators in the light of a potential leak of biometric data. The aforementioned research subjects are united in this dissertation to sustain good decision making processes facing uncertainty from varying signal quality and to strengthen security as well as preserve privacy. Conventionally, biometric comparators are trained to classify between mated and non-mated reference,--,probe pairs under idealistic conditions but are expected to operate well in the real world. However, the more the voice signal quality degrades, the more erroneous decisions are made. The severity of their impact depends on the requirements of a biometric application. In this dissertation, quality estimates are proposed and employed for the purpose of making better decisions on average in a formalized way (quantitative method), while the specifications of decision requirements of a biometric application remain unknown. By using the Bayesian decision framework, the specification of application-depending decision requirements is formalized, outlining operating points: the decision thresholds. The assessed quality conditions combine ambient and biometric noise, both of which occurring in commercial as well as in forensic application scenarios. Dual-use (civil and governmental) technology is investigated. As it seems unfeasible to train systems for every possible signal degradation, a low amount of quality conditions is used. After examining the impact of degrading signal quality on biometric feature extraction, the extraction is assumed ideal in order to conduct a fair benchmark. This dissertation proposes and investigates methods for propagating information about quality to decision making. By employing quality estimates, a biometric system's output (comparison scores) is normalized in order to ensure that each score encodes the least-favorable decision trade-off in its value. Application development is segregated from requirement specification. Furthermore, class discrimination and score calibration performance is improved over all decision requirements for real world applications. In contrast to the ISOIEC 19795-1:2006 standard on biometric performance (error rates), this dissertation is based on biometric inference for probabilistic decision making (subject to prior probabilities and cost terms). This dissertation elaborates on the paradigm shift from requirements by error rates to requirements by beliefs in priors and costs. Binary decision error trade-off plots are proposed, interrelating error rates with prior and cost beliefs, i.e., formalized decision requirements. Verbal tags are introduced to summarize categories of least-favorable decisions: the plot's canvas follows from Bayesian decision theory. Empirical error rates are plotted, encoding categories of decision trade-offs by line styles. Performance is visualized in the latent decision subspace for evaluating empirical performance regarding changes in prior and cost based decision requirements. Security against short-term audio replay attacks (a collage of sound units such as phonemes and syllables) is strengthened. The unit-selection attack is posed by the ASVspoof 2015 challenge (English speech data), representing the most difficult to detect voice presentation attack of this challenge. In this dissertation, unit-selection attacks are created for German speech data, where support vector machine and Gaussian mixture model classifiers are trained to detect collage edges in speech representations based on wavelet and Fourier analyses. Competitive results are reached compared to the challenged submissions. Homomorphic encryption is proposed to preserve the privacy of biometric information in the case of database leakage. In this dissertation, log-likelihood ratio scores, representing biometric evidence objectively, are computed in the latent biometric subspace. Conventional comparators rely on the feature extraction to ideally represent biometric information, latent subspace comparators are trained to find ideal representations of the biometric information in voice reference and probe samples to be compared. Two protocols are proposed for the the two-covariance comparison model, a special case of probabilistic linear discriminant analysis. Log-likelihood ratio scores are computed in the encrypted domain based on encrypted representations of the biometric reference and probe. As a consequence, the biometric information conveyed in voice samples is, in contrast to many existing protection schemes, stored protected and without information loss. The first protocol preserves privacy of end-users, requiring one public/private key pair per biometric application. The latter protocol preserves privacy of end-users and comparator vendors with two key pairs. Comparators estimate the biometric evidence in the latent subspace, such that the subspace model requires data protection as well. In both protocols, log-likelihood ratio based decision making meets the requirements of the ISOIEC 24745:2011 biometric information protection standard in terms of unlinkability, irreversibility, and renewability properties of the protected voice data. |
||||
Alternative Abstract: |
|
||||
URN: | urn:nbn:de:tuda-tuprints-91993 | ||||
Classification DDC: | 000 Generalities, computers, information > 004 Computer science | ||||
Divisions: | 20 Department of Computer Science > Telecooperation LOEWE > LOEWE-Zentren > CRISP - Center for Research in Security and Privacy LOEWE > LOEWE-Zentren > CASED – Center for Advanced Security Research Darmstadt |
||||
Date Deposited: | 21 Nov 2019 12:50 | ||||
Last Modified: | 09 Mar 2022 13:43 | ||||
URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/9199 | ||||
PPN: | 45595206X | ||||
Export: |
View Item |