Robust Speaker Verification With Joint Sparse Coding Over Learned Dictionaries

This paper presents a novel paradigm for speaker verification (SV) that exploits sparse representation (SR) over a learned dictionary. The proposed approach is intended to overcome the shortcomings of existing SV systems based on SR over an exemplar dictionary. Supervectors, created by concatenating the mean vectors of adapted Gaussian mixture models, are used as speaker representations. Both simple and discriminative methods are explored for learning the dictionary in the supervector domain. The learned-dictionary approach is further extended to compensate for session/channel variability by using joint sparse coding over speaker and channel dictionaries. The proposed systems are evaluated on the NIST 2012 SRE data set and are contrasted with the state-of-the-art i-vector probabilistic linear discriminant analysis-based SV system. The proposed system is found to possess the following attributes: 1) significantly higher performance at very low false-alarm rates, which makes the system attractive for high-security applications; 2) higher robustness to short-duration test data; 3) competitive robustness to additive noise in the test data; and 4) much lower computational complexity. Even when compared with the fastest i-vector computation methods reported in the literature, the proposed system has comparable complexity. With these features, the proposed approach appears to be a promising candidate for practical voice biometric applications.
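The idea of joint sparse coding over speaker and channel dictionaries can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the sparse solver (a plain orthogonal matching pursuit), the cosine scoring of the speaker-side coefficients, the function names, and the toy orthonormal dictionaries, which are chosen only so that recovery is exact. Learned dictionaries would generally be neither orthonormal nor this small.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k atoms of D (k >= 1)."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # do not reselect atoms already chosen
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        coef[support] = sol
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    return coef


def joint_sparse_code(x, D_spk, D_chn, k):
    """Code x over the concatenated dictionary [D_spk, D_chn]; return both parts."""
    D = np.hstack([D_spk, D_chn])
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    a = omp(D, x, k)
    n_spk = D_spk.shape[1]
    return a[:n_spk], a[n_spk:]


def cosine_score(a, b):
    """Cosine similarity between two coefficient vectors (assumed scoring rule)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def toy_trial():
    """Build a toy target trial and impostor trial; return their two scores."""
    rng = np.random.default_rng(0)
    # Toy assumption: 30 orthonormal atoms in 64 dimensions, split 20 / 10.
    Q, _ = np.linalg.qr(rng.standard_normal((64, 30)))
    D_spk, D_chn = Q[:, :20], Q[:, 20:]

    def vec(n, idx, vals):
        v = np.zeros(n)
        v[idx] = vals
        return v

    s_target = vec(20, [2, 7, 11], [1.0, -0.8, 0.6])      # target speaker code
    s_impost = vec(20, [0, 5, 15], [0.95, 0.65, -0.45])   # impostor speaker code
    c_enroll = vec(10, [1, 4], [0.9, 0.5])                # enrollment channel code
    c_test = vec(10, [3, 8], [0.7, -0.4])                 # test channel code

    x_enroll = D_spk @ s_target + D_chn @ c_enroll
    x_target = D_spk @ s_target + D_chn @ c_test
    x_impost = D_spk @ s_impost + D_chn @ c_test

    # Only the speaker-side coefficients are compared; channel ones are discarded.
    a_enroll, _ = joint_sparse_code(x_enroll, D_spk, D_chn, k=5)
    a_target, _ = joint_sparse_code(x_target, D_spk, D_chn, k=5)
    a_impost, _ = joint_sparse_code(x_impost, D_spk, D_chn, k=5)
    return cosine_score(a_enroll, a_target), cosine_score(a_enroll, a_impost)


if __name__ == "__main__":
    print(toy_trial())
```

In this toy setting the channel atoms absorb the session-dependent part of each vector, so the enrollment and test codes of the same speaker agree even though the two recordings pass through different channels. Discarding the channel-side coefficients before scoring is one plausible way to realize the compensation described above.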