Visual speech features representation for automatic lip-reading

Alaa Sagheer, Naoyuki Tsuruta, Rin Ichiro Taniguchi, Sakashi Maeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

A fundamental task in the pattern recognition field is to find a suitable representation for a feature. In this paper, we present a new visual speech feature representation approach that combines the Hypercolumn Model (HCM) with a hidden Markov model (HMM) to form a complete lip-reading system. In this system, we use HCM to extract visual speech features from the input image. The extracted features are then modeled by Gaussian distributions within the HMM. The proposed lip-reading system can work under varying lip positions and sizes. All images were captured in a natural environment without special lighting or lip markers. Experimental results compare favourably with those of two previously reported systems, one SOM-based and one DCT-based: HCM provides better performance than both.
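The recognition stage described in the abstract, where per-frame feature vectors are scored by HMMs with Gaussian emissions, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no model details, so the two-state left-to-right topology, the diagonal covariances, the 2-D feature vectors, and the names `make_model`, `log_likelihood`, and `classify` are all assumptions made for illustration. The HCM feature extractor is not shown; the toy `frames` simply stand in for its output.

```python
import math

NEG_INF = -math.inf

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_likelihood(frames, model):
    """Forward algorithm in the log domain: log P(frames | model)."""
    log_pi, log_a = model["log_pi"], model["log_a"]
    means, variances = model["means"], model["variances"]
    n = len(log_pi)
    # Initialisation: start probability times emission of the first frame.
    alpha = [log_pi[s] + log_gauss(frames[0], means[s], variances[s])
             for s in range(n)]
    # Recursion: sum over predecessor states, then emit the current frame.
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[r] + log_a[r][s] for r in range(n)])
            + log_gauss(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)

def classify(frames, models):
    """Pick the word whose HMM assigns the highest likelihood to the frames."""
    return max(models, key=lambda word: log_likelihood(frames, models[word]))

def make_model(means):
    """Hypothetical two-state left-to-right HMM with unit variances."""
    return {
        "log_pi": [0.0, NEG_INF],                      # always start in state 0
        "log_a": [[math.log(0.5), math.log(0.5)],      # stay or advance
                  [NEG_INF, 0.0]],                     # final state is absorbing
        "means": means,
        "variances": [[1.0, 1.0], [1.0, 1.0]],
    }

# Toy word models and a toy sequence of 2-D feature vectors (assumed values).
models = {
    "word_a": make_model([[0.0, 0.0], [1.0, 1.0]]),
    "word_b": make_model([[5.0, 5.0], [6.0, 6.0]]),
}
frames = [[0.1, -0.2], [0.9, 1.1], [1.0, 0.8]]
best = classify(frames, models)
```

In a real system the means, variances, and transition probabilities would be estimated from training sequences (for example with Baum-Welch) rather than set by hand as they are here.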

Original language: English
Title of host publication: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
Pages: II781-II784
DOIs: https://doi.org/10.1109/ICASSP.2005.1415521
Publication status: Published - 2005
Event: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Philadelphia, PA, United States
Duration: Mar 18 2005 to Mar 23 2005

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: II
ISSN (Print): 1520-6149

Other

Other: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Country: United States
City: Philadelphia, PA
Period: 3/18/05 to 3/23/05

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

  • Cite this

    Sagheer, A., Tsuruta, N., Taniguchi, R. I., & Maeda, S. (2005). Visual speech features representation for automatic lip-reading. In 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing (pp. II781-II784). [1415521] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. II). https://doi.org/10.1109/ICASSP.2005.1415521