
Conference Paper (published)

A Data Driven Approach to Audiovisual Speech Mapping

Details

Citation

Abel A, Marxer R, Hussain A, Barker J, Watt R, Whitmer B & Derleth P (2016) A Data Driven Approach to Audiovisual Speech Mapping. In: Liu C, Hussain A, Luo B, Tan K, Zeng Y & Zhang Z (eds.) Advances in Brain Inspired Cognitive Systems. Lecture Notes in Computer Science, 10023. BICS 2016: International Conference on Brain Inspired Cognitive Systems, Beijing, China, 28.11.2016-30.11.2016. Cham, Switzerland: Springer, pp. 331-342. https://doi.org/10.1007/978-3-319-49685-6_30

Abstract
The concept of using visual information as part of audio speech processing has been of significant recent interest. This paper presents a data driven approach that considers estimating audio speech acoustics using only temporal visual information without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various configurations of MLP and datasets are used to identify optimal results, showing that given a sequence of prior visual frames an equivalent reasonably accurate audio frame estimation can be mapped.
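The pipeline outlined in the abstract (2D-DCT visual features, a window of prior visual frames, and an MLP that outputs a log filterbank audio frame) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the frame size, the number of retained DCT coefficients, the window length, the hidden layer size, the number of filterbank channels, and all function names are assumptions that the abstract does not specify, and the network weights are random rather than trained.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)    # rescale the DC row so the matrix is orthonormal
    return m

def dct2_features(frame, keep=4):
    """2D-DCT of one image; keep the low-frequency keep x keep corner (assumed)."""
    h, w = frame.shape
    coeffs = dct_matrix(h) @ frame @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def stack_prior_frames(features, n_prior):
    """Row j concatenates n_prior consecutive frames ending at frame j + n_prior - 1."""
    total = features.shape[0]
    rows = [features[t - n_prior + 1 : t + 1].ravel()
            for t in range(n_prior - 1, total)]
    return np.stack(rows)

rng = np.random.default_rng(0)

# Placeholder data standing in for cropped lip-region images (assumed 16 x 16).
frames = rng.random((20, 16, 16))
visual = np.stack([dct2_features(f) for f in frames])  # shape (20, 16)
inputs = stack_prior_frames(visual, n_prior=5)         # shape (16, 80)

# Untrained single-hidden-layer MLP; 23 output channels is an assumed
# number of log filterbank bands, not a value taken from the paper.
n_hidden, n_audio = 32, 23
w1 = rng.normal(scale=0.1, size=(inputs.shape[1], n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, n_audio))
b2 = np.zeros(n_audio)
estimate = np.tanh(inputs @ w1 + b1) @ w2 + b2         # shape (16, 23)
```

In a real system the weights would be fitted by regression against log filterbank frames extracted from the time-aligned audio; the sketch only shows how the shapes of the visual window and the audio estimate relate.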

Keywords
Audiovisual; Speech processing; Speech mapping; ANNs

Status: Published
Funders
Title of series: Lecture Notes in Computer Science
Number in series: 10023
Publication date: 31/12/2016
Publication date online: 30/11/2016
URL
Publisher: Springer
Place of publication: Cham, Switzerland
ISSN of series: 0302-9743
ISBN: 978-3-319-49685-6
Conference: BICS 2016: International Conference on Brain Inspired Cognitive Systems
Conference location: Beijing, China
Dates

People (1)

Professor Roger Watt

Emeritus Professor, Psychology
