Speaker normalization based on time-frequency warp with inter-frame consistency

Kei Yamada, Seiichi Uchida, Hiroaki Sakoe

Research output: Contribution to journal › Article

Abstract

A new algorithm for speaker-independent spoken word recognition is presented. The algorithm is based on the time-frequency warping technique, in which frequency-axis warping is performed in addition to time-axis warping in order to compensate for spectral differences between individual speakers. In the conventional algorithm, the frequency-axis warping is determined independently at each frame (i.e., at each time). Such warping is liable to yield excessive deformations of the time-frequency plane. To suppress these excessive deformations, inter-frame consistency of the frequency-axis warping is newly introduced as a constraint on the warping. The optimal warping is obtained by dynamic programming under this constraint. As an implementation technique, beam-search-based acceleration is also investigated. Experimental results indicate advantages of the present algorithm over the conventional algorithm.
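
The idea in the abstract can be illustrated with a small dynamic-programming sketch. This is not the authors' formulation, which the abstract does not specify. It assumes, purely for illustration, that frequency warping is a whole-bin shift of each input spectrum, that the time warp lets the reference index advance by 0, 1, or 2 per input frame, and that inter-frame consistency means the shift may change by at most `max_jump` bins between consecutive frames. All function and parameter names (`shift_spectrum`, `tf_warp_distance`, `max_shift`, `max_jump`, `beam`) are hypothetical.

```python
import math


def shift_spectrum(spec, shift):
    """Shift frequency bins by `shift` (assumed warp model), clamping at the band edges."""
    n = len(spec)
    return [spec[min(max(f - shift, 0), n - 1)] for f in range(n)]


def frame_distance(a, b):
    """Squared Euclidean distance between two spectra of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tf_warp_distance(inp, ref, max_shift=2, max_jump=1, beam=math.inf):
    """Time-frequency warped distance between two spectrogram-like sequences.

    A DP state is (j, k): reference frame j and frequency shift k chosen for
    the current input frame. Setting max_jump >= 2 * max_shift removes the
    consistency constraint, so each frame picks its shift independently.
    A finite `beam` prunes states whose cost exceeds the frame's best by
    more than `beam`, trading exactness for speed.
    """
    shifts = range(-max_shift, max_shift + 1)
    # The first input frame is aligned to the first reference frame.
    prev = {(0, k): frame_distance(shift_spectrum(inp[0], k), ref[0]) for k in shifts}
    for i in range(1, len(inp)):
        warped = {k: shift_spectrum(inp[i], k) for k in shifts}
        cur = {}
        for (pj, pk), cost in prev.items():
            # Time-axis constraint: reference index advances by 0, 1, or 2.
            for j in range(pj, min(pj + 2, len(ref) - 1) + 1):
                for k in shifts:
                    # Inter-frame consistency: the shift changes slowly.
                    if abs(k - pk) > max_jump:
                        continue
                    c = cost + frame_distance(warped[k], ref[j])
                    if c < cur.get((j, k), math.inf):
                        cur[(j, k)] = c
        best = min(cur.values())
        prev = {s: c for s, c in cur.items() if c <= best + beam}
    # The last input frame must be aligned to the last reference frame.
    finals = [c for (j, _), c in prev.items() if j == len(ref) - 1]
    return min(finals) if finals else math.inf
```

Calling `tf_warp_distance(inp, ref)` with two lists of equal-length spectra returns the accumulated distance along the best constrained warp. Passing a large `max_jump` gives a frame-independent baseline for comparison under the same assumptions.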

Original language: English
Pages (from-to): 197-202
Number of pages: 6
Journal: Research Reports on Information Science and Electrical Engineering of Kyushu University
Volume: 3
Issue number: 2
Publication status: Published - 1998
Externally published: Yes

Fingerprint

Dynamic programming

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
  • Hardware and Architecture
  • Engineering (miscellaneous)

Cite this

@article{d9e83a76f2f5401fa802356e55bd26d5,
title = "Speaker normalization based on time-frequency warp with inter-frame consistency",
abstract = "A new algorithm for speaker-independent spoken word recognition is presented. The algorithm is based on the time-frequency warping technique, in which frequency-axis warping is performed in addition to time-axis warping in order to compensate for spectral differences between individual speakers. In the conventional algorithm, the frequency-axis warping is determined independently at each frame (i.e., at each time). Such warping is liable to yield excessive deformations of the time-frequency plane. To suppress these excessive deformations, inter-frame consistency of the frequency-axis warping is newly introduced as a constraint on the warping. The optimal warping is obtained by dynamic programming under this constraint. As an implementation technique, beam-search-based acceleration is also investigated. Experimental results indicate advantages of the present algorithm over the conventional algorithm.",
author = "Kei Yamada and Seiichi Uchida and Hiroaki Sakoe",
year = "1998",
language = "English",
volume = "3",
pages = "197--202",
journal = "Research Reports on Information Science and Electrical Engineering of Kyushu University",
issn = "1342-3819",
publisher = "Kyushu University, Faculty of Science",
number = "2",
}

TY - JOUR

T1 - Speaker normalization based on time-frequency warp with inter-frame consistency

AU - Yamada, Kei

AU - Uchida, Seiichi

AU - Sakoe, Hiroaki

PY - 1998

Y1 - 1998

AB - A new algorithm for speaker-independent spoken word recognition is presented. The algorithm is based on the time-frequency warping technique, in which frequency-axis warping is performed in addition to time-axis warping in order to compensate for spectral differences between individual speakers. In the conventional algorithm, the frequency-axis warping is determined independently at each frame (i.e., at each time). Such warping is liable to yield excessive deformations of the time-frequency plane. To suppress these excessive deformations, inter-frame consistency of the frequency-axis warping is newly introduced as a constraint on the warping. The optimal warping is obtained by dynamic programming under this constraint. As an implementation technique, beam-search-based acceleration is also investigated. Experimental results indicate advantages of the present algorithm over the conventional algorithm.

UR - http://www.scopus.com/inward/record.url?scp=0032155250&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032155250&partnerID=8YFLogxK

M3 - Article

VL - 3

SP - 197

EP - 202

JO - Research Reports on Information Science and Electrical Engineering of Kyushu University

JF - Research Reports on Information Science and Electrical Engineering of Kyushu University

SN - 1342-3819

IS - 2

ER -