Hyper column model vs. Fast DCT for feature extraction in visual arabic speech recognition

Alaa Sagheer, Naoyuki Tsuruta, Rin-Ichiro Taniguchi, Sakashi Maeda

Research output: Contribution to conference (Paper)

9 Citations (Scopus)

Abstract

Recently, the multimedia signal processing community has shown increasing interest in research on visual speech recognition. In this paper we present a novel visual speech recognition approach based on our Hyper Column Model (HCM). HCM is used for the feature extraction task. The extracted features are modeled by Gaussian distributions using a Hidden Markov Model (HMM). The proposed system, HCM combined with HMM, can be used for any visual recognition task. Here we use it to build a complete lip-reading system and evaluate its performance on an Arabic data set. To our knowledge, this is the first time that visual speech recognition has been applied to the Arabic language. For a fair evaluation, we compare our accuracy results with those obtained using the Fast Discrete Cosine Transform (FDCT) approach, in a separate experiment that uses the same data set and conditions as the HCM experiment. The comparison shows that HCM achieves higher recognition accuracy than FDCT for Arabic sentences and words. HCM not only provides higher accuracy but is also capable of shift-invariant recognition, whereas FDCT is not.
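The abstract's remark that DCT-based features are not shift invariant can be illustrated with a small sketch. This is not the authors' pipeline: it uses a plain, unnormalised DCT-II rather than a fast variant, a toy 8x8 image standing in for a mouth region, and hypothetical helper names (`dct_2d`, `low_freq_features`, `shift_right`, `make_blob`). Under those assumptions, it shows that the low-frequency coefficients typically kept as features change when the same pattern moves two pixels sideways.

```python
import math


def dct_1d(x):
    """Unnormalised DCT-II of a sequence (direct O(n^2) form, not a fast DCT)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dct_2d(image):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[v][u] holds the coefficient for vertical frequency u and
    # horizontal frequency v; transpose back to coeffs[u][v].
    return [list(r) for r in zip(*cols)]


def low_freq_features(image, keep=3):
    """Keep the top-left keep x keep block of coefficients as a feature vector."""
    coeffs = dct_2d(image)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def shift_right(image, pixels):
    """Circularly shift every row to the right by `pixels` columns."""
    return [row[-pixels:] + row[:-pixels] for row in image]


def make_blob(size=8):
    """Toy image (assumption): a bright 2x4 rectangle on a dark background."""
    img = [[0.0] * size for _ in range(size)]
    for r in (3, 4):
        for c in range(1, 5):
            img[r][c] = 1.0
    return img


image = make_blob()
shifted = shift_right(image, 2)  # same pattern, moved two pixels to the right

features = low_freq_features(image)
features_shifted = low_freq_features(shifted)

# Euclidean distance between the two feature vectors; a shift-invariant
# feature extractor would give a distance of (near) zero here.
distance = math.sqrt(
    sum((a - b) ** 2 for a, b in zip(features, features_shifted))
)
print("DC term unchanged:", abs(features[0] - features_shifted[0]) < 1e-9)
print("feature distance after shift:", round(distance, 3))
```

Only the DC term, which is the sum of all pixel values, survives the shift unchanged; the other retained coefficients move, so a classifier trained on them sees a different input for the same pattern at a different position.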

Original language: English
Pages: 761-766
Number of pages: 6
DOIs: https://doi.org/10.1109/ISSPIT.2005.1577194
Publication status: Published - Dec 1 2005
Event: 5th IEEE International Symposium on Signal Processing and Information Technology - Athens, Greece
Duration: Dec 18 2005 to Dec 21 2005

Other

Other: 5th IEEE International Symposium on Signal Processing and Information Technology
Country: Greece
City: Athens
Period: 12/18/05 to 12/21/05

Fingerprint

Speech recognition
Feature extraction
Discrete cosine transforms
Hidden Markov models
Multimedia signal processing
Gaussian distribution
Experiments

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Sagheer, A., Tsuruta, N., Taniguchi, R-I., & Maeda, S. (2005). Hyper column model vs. Fast DCT for feature extraction in visual arabic speech recognition. 761-766. Paper presented at 5th IEEE International Symposium on Signal Processing and Information Technology, Athens, Greece. https://doi.org/10.1109/ISSPIT.2005.1577194

@conference{0660796c0467479db90deebcbe37cf53,
title = "Hyper column model vs. Fast DCT for feature extraction in visual arabic speech recognition",
author = "Alaa Sagheer and Naoyuki Tsuruta and Rin-Ichiro Taniguchi and Sakashi Maeda",
year = "2005",
month = "12",
day = "1",
doi = "10.1109/ISSPIT.2005.1577194",
language = "English",
pages = "761--766",
note = "5th IEEE International Symposium on Signal Processing and Information Technology ; Conference date: 18-12-2005 Through 21-12-2005",

}

TY - CONF

T1 - Hyper column model vs. Fast DCT for feature extraction in visual arabic speech recognition

AU - Sagheer, Alaa

AU - Tsuruta, Naoyuki

AU - Taniguchi, Rin-Ichiro

AU - Maeda, Sakashi

PY - 2005/12/1

Y1 - 2005/12/1

UR - http://www.scopus.com/inward/record.url?scp=33847006913&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847006913&partnerID=8YFLogxK

U2 - 10.1109/ISSPIT.2005.1577194

DO - 10.1109/ISSPIT.2005.1577194

M3 - Paper

AN - SCOPUS:33847006913

SP - 761

EP - 766

ER -