Expression-Preserving Face Frontalization Improves Visually Assisted Speech Processing


Keywords: Student's t-distribution; 3D; MIXTURE; Variational auto-encoders; ALIGNMENT; DATABASE; Face frontalization; Robust point registration; MODEL; Bayesian filtering; Lip reading; Audio-visual speech enhancement

Abstract: Face frontalization consists of synthesizing a frontal view from a profile one. This paper proposes a frontalization method that preserves non-rigid facial deformations, i.e., facial expressions. It is shown that expression-preserving frontalization boosts the performance of visually assisted speech processing. The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily viewed face and a face model. The method has two important merits: it can deal with non-Gaussian errors in the data, and it incorporates a dynamical face deformation model. For that purpose, we use the Student's t-distribution in combination with a Bayesian filter in order to account for both rigid head motions and time-varying facial deformations, e.g., those caused by speech production. The zero-mean normalized cross-correlation score is used to evaluate the ability of the method to preserve facial expressions. The method is thoroughly evaluated and compared with several state-of-the-art methods, based either on traditional geometric models or on deep learning. Moreover, we show that the method, when incorporated into speech processing pipelines, improves word recognition rates and speech intelligibility scores by a considerable margin.
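
The zero-mean normalized cross-correlation (ZNCC) score mentioned in the abstract can be illustrated with a minimal sketch. The code below implements the standard textbook definition of ZNCC between two equally sized arrays. Which image regions the paper compares, and how it aggregates scores over frames, is not stated in the abstract, so the function name `zncc`, the `eps` guard, and the choice to return 0.0 for constant inputs are assumptions made only for illustration.

```python
import numpy as np


def zncc(a, b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two same-shaped arrays.

    Returns a value in [-1, 1]; 1 means the arrays are identical up to a
    positive gain and an additive offset. This is the generic definition,
    not necessarily the exact evaluation protocol of the paper.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("inputs must have the same number of elements")

    # Remove the mean so the score ignores additive brightness offsets.
    a = a - a.mean()
    b = b - b.mean()

    # Normalize by the product of norms so the score ignores gain changes.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:
        # Assumption: a constant patch carries no structure, so score 0.
        return 0.0
    return float(np.dot(a, b) / denom)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.random((32, 32))          # hypothetical image patch
    brighter = 2.0 * patch + 3.0          # same patch, different gain/offset
    print(zncc(patch, brighter))
    print(zncc(patch, -patch))
```

Because the score is invariant to gain and offset, it compares the spatial structure of two patches (for example, a frontalized face region against a ground-truth frontal view) rather than their absolute intensities.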
