Speaker normalization based on time-frequency warp with inter-frame consistency

Kei Yamada, Seiichi Uchida, Hiroaki Sakoe

Research output: Article contributed to a journal

Abstract

A new algorithm for speaker-independent spoken word recognition is presented. The algorithm is based on the time-frequency warping technique, in which frequency-axis warping is performed in addition to time-axis warping in order to compensate for spectral differences between individual speakers. In the conventional algorithm, the frequency-axis warping is determined independently at each frame (i.e., at each time instant). Such warping tends to yield excessive deformations of the time-frequency plane. To suppress these deformations, inter-frame consistency of the frequency-axis warping is newly introduced as a constraint on the warping. The optimal warping is obtained by dynamic programming under this constraint. As an implementation technique, beam-search-based acceleration is also investigated. Experimental results indicate advantages of the present algorithm over the conventional algorithm.
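The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the warping functions, the local distance, the step pattern, or the pruning rule, so everything below is an assumption made for illustration. Frequency warping is simplified to an integer shift of spectral bins, the local cost is a squared Euclidean distance, each input frame advances the reference by 0, 1, or 2 frames, and the beam is a cost threshold relative to the best state in each frame. All names (`shift_bins`, `frame_cost`, `tf_warp_distance`, `max_jump`, `beam`) are hypothetical.

```python
import math

def shift_bins(frame, s):
    """Assumed frequency warp: shift spectral bins by s, repeating edge bins."""
    n = len(frame)
    return [frame[min(max(b - s, 0), n - 1)] for b in range(n)]

def frame_cost(x, r, s):
    """Squared Euclidean distance between the warped input frame and a reference frame."""
    return sum((a - b) ** 2 for a, b in zip(shift_bins(x, s), r))

def tf_warp_distance(inp, ref, shifts=(-1, 0, 1), max_jump=1, beam=None):
    """Dynamic programming over (input frame i, reference frame j, warp index k).

    max_jump limits how much the frequency shift may change between
    consecutive input frames (the inter-frame consistency constraint).
    A max_jump that covers the whole shift range leaves every frame free,
    which mimics independent per-frame warping.
    """
    I, J, K = len(inp), len(ref), len(shifts)
    prev = None
    for i in range(I):
        cur = [[math.inf] * K for _ in range(J)]
        for j in range(J):
            for k in range(K):
                if i == 0:
                    # The path must start at the first reference frame.
                    best = 0.0 if j == 0 else math.inf
                else:
                    best = math.inf
                    for jp in (j, j - 1, j - 2):  # assumed time-axis steps
                        if jp < 0:
                            continue
                        for kp in range(K):
                            if abs(shifts[k] - shifts[kp]) <= max_jump:
                                best = min(best, prev[jp][kp])
                if best < math.inf:
                    cur[j][k] = best + frame_cost(inp[i], ref[j], shifts[k])
        if beam is not None:
            # Beam pruning: drop states far above the best state of this frame.
            floor = min(min(row) for row in cur)
            cur = [[c if c <= floor + beam else math.inf for c in row]
                   for row in cur]
        prev = cur
    # The path must end at the last reference frame; any warp index is allowed.
    return min(prev[J - 1])
```

Calling `tf_warp_distance(inp, ref, max_jump=1)` applies the consistency constraint, while a `max_jump` at least as large as the shift range removes it, so the two settings can be compared on the same data. Passing a small `beam` value trades exactness for fewer surviving states per frame.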

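Original language: English
Pages (from-to): 197-202
Number of pages: 6
Journal: Research Reports on Information Science and Electrical Engineering of Kyushu University
Volume: 3
Issue number: 2
Publication status: Published - 1998
Externally published: Yes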

Fingerprint

Dynamic programming

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
  • Hardware and Architecture
  • Engineering (miscellaneous)

Cite this

Speaker normalization based on time-frequency warp with inter-frame consistency. / Yamada, Kei; Uchida, Seiichi; Sakoe, Hiroaki.

In: Research Reports on Information Science and Electrical Engineering of Kyushu University, Vol. 3, No. 2, 1998, p. 197-202.

@article{d9e83a76f2f5401fa802356e55bd26d5,
title = "Speaker normalization based on time-frequency warp with inter-frame consistency",
abstract = "A new algorithm for speaker-independent spoken word recognition is presented. The algorithm is based on the time-frequency warping technique, in which frequency-axis warping is performed in addition to time-axis warping in order to compensate for spectral differences between individual speakers. In the conventional algorithm, the frequency-axis warping is determined independently at each frame (i.e., at each time instant). Such warping tends to yield excessive deformations of the time-frequency plane. To suppress these deformations, inter-frame consistency of the frequency-axis warping is newly introduced as a constraint on the warping. The optimal warping is obtained by dynamic programming under this constraint. As an implementation technique, beam-search-based acceleration is also investigated. Experimental results indicate advantages of the present algorithm over the conventional algorithm.",
author = "Kei Yamada and Seiichi Uchida and Hiroaki Sakoe",
year = "1998",
language = "English",
volume = "3",
pages = "197--202",
journal = "Research Reports on Information Science and Electrical Engineering of Kyushu University",
issn = "1342-3819",
publisher = "Kyushu University, Faculty of Science",
number = "2",

}

TY - JOUR

T1 - Speaker normalization based on time-frequency warp with inter-frame consistency

AU - Yamada, Kei

AU - Uchida, Seiichi

AU - Sakoe, Hiroaki

PY - 1998

Y1 - 1998

N2 - A new algorithm for speaker-independent spoken word recognition is presented. The algorithm is based on the time-frequency warping technique, in which frequency-axis warping is performed in addition to time-axis warping in order to compensate for spectral differences between individual speakers. In the conventional algorithm, the frequency-axis warping is determined independently at each frame (i.e., at each time instant). Such warping tends to yield excessive deformations of the time-frequency plane. To suppress these deformations, inter-frame consistency of the frequency-axis warping is newly introduced as a constraint on the warping. The optimal warping is obtained by dynamic programming under this constraint. As an implementation technique, beam-search-based acceleration is also investigated. Experimental results indicate advantages of the present algorithm over the conventional algorithm.

UR - http://www.scopus.com/inward/record.url?scp=0032155250&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032155250&partnerID=8YFLogxK

M3 - Article

VL - 3

SP - 197

EP - 202

JO - Research Reports on Information Science and Electrical Engineering of Kyushu University

JF - Research Reports on Information Science and Electrical Engineering of Kyushu University

SN - 1342-3819

IS - 2

ER -