A method for estimating vocal-tract shape from a target speech spectrum

Research output: Contribution to journal (article)

2 Citations (Scopus)

Abstract

We present a method to simultaneously estimate the cross-sectional area and length of the vocal tract from a speech spectrum. An iterative procedure determines the vocal-tract shape by gradually optimizing the parameter values to produce the target speech spectrum. The vocal-tract shape is updated in each iteration using a sensitivity function representing the change in formant frequency caused by a slight perturbation of the vocal-tract shape. Our method effectively optimizes the vocal-tract shape when combined with the perturbation relationship between the speech spectrum parameters (i.e., cepstral parameters) and formants. The estimation accuracy is examined using area function data for 10 English vowels (Story and Titze, J. Phon., 26, 223-260, 1998). The resulting average errors are 0.36 cm² for the cross-sectional area and 0.21 cm for the vocal-tract length. This corresponds to a 17.6% and 1.24% error, respectively. The formant frequency recovered from the estimated vocal-tract shape has an error of less than 4% for each of the first four formants. We also determine that the fundamental frequency of the target speech spectrum has an influence on the estimation accuracy.
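
The idea of a sensitivity-driven iterative update can be illustrated with a deliberately simplified sketch. It is not the method of the paper: it assumes a lossless uniform tube (closed at the glottis, open at the lips), estimates only the tube length, and works directly on formant frequencies rather than cepstral parameters. The function names, the speed of sound, the initial length, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 35000.0  # cm/s; assumed value, not taken from the paper


def uniform_tube_formants(length_cm, n_formants=4):
    """Quarter-wavelength resonances of a lossless uniform tube."""
    n = np.arange(1, n_formants + 1)
    return (2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_cm)


def length_sensitivity(length_cm, n_formants=4):
    """dF_n/dL for the uniform tube; each formant scales as 1/L."""
    return -uniform_tube_formants(length_cm, n_formants) / length_cm


def estimate_length(target_formants, initial_length_cm=15.0,
                    n_iter=50, step=0.5):
    """Gradually adjust the length so the model formants match the target."""
    target = np.asarray(target_formants, dtype=float)
    length = float(initial_length_cm)
    for _ in range(n_iter):
        residual = target - uniform_tube_formants(length, target.size)
        sens = length_sensitivity(length, target.size)
        # Least-squares solution of sens * delta = residual for scalar delta.
        delta = (sens @ residual) / (sens @ sens)
        # Damped step, so the shape changes gradually across iterations.
        length += step * delta
    return length


if __name__ == "__main__":
    target = uniform_tube_formants(17.0)  # synthetic target, 17 cm tube
    print(f"estimated length: {estimate_length(target):.4f} cm")
```

In this toy setting the target is generated from a 17 cm tube, so the estimate should return to that length. A full implementation along the lines of the abstract would replace the single length parameter with a vector of section areas plus length, and the analytic derivative with a sensitivity function for each section.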

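Original language: English
Pages (from-to): 428-437
Number of pages: 10
Journal: Acoustical Science and Technology
Volume: 36
Issue number: 5
DOI: 10.1250/ast.36.428
Publication status: Published - 1 Jan 2015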

Fingerprint

estimating
perturbation
vowels
iteration
sensitivity
estimates

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics

Cite this

A method for estimating vocal-tract shape from a target speech spectrum. / Kaburagi, Tokihiko.

In: Acoustical Science and Technology, Vol. 36, No. 5, 01.01.2015, p. 428-437.

@article{1c95a88dc1874ccca1f3f468a18441ec,
title = "A method for estimating vocal-tract shape from a target speech spectrum",
abstract = "We present a method to simultaneously estimate the cross-sectional area and length of the vocal tract from a speech spectrum. An iterative procedure determines the vocal-tract shape by gradually optimizing the parameter values to produce the target speech spectrum. The vocal-tract shape is updated in each iteration using a sensitivity function representing the change in formant frequency caused by a slight perturbation of the vocal-tract shape. Our method effectively optimizes the vocal-tract shape when combined with the perturbation relationship between the speech spectrum parameters (i.e., cepstral parameters) and formants. The estimation accuracy is examined using area function data for 10 English vowels (Story and Titze, J. Phon., 26, 223-260, 1998). The resulting average errors are 0.36 cm$^{2}$ for the cross-sectional area and 0.21 cm for the vocal-tract length. This corresponds to a 17.6{\%} and 1.24{\%} error, respectively. The formant frequency recovered from the estimated vocal-tract shape has an error of less than 4{\%} for each of the first four formants. We also determine that the fundamental frequency of the target speech spectrum has an influence on the estimation accuracy.",
author = "Tokihiko Kaburagi",
year = "2015",
month = "1",
day = "1",
doi = "10.1250/ast.36.428",
language = "English",
volume = "36",
pages = "428--437",
journal = "Acoustical Science and Technology",
issn = "1346-3969",
publisher = "The Acoustical Society of Japan (ASJ)",
number = "5",

}

TY - JOUR

T1 - A method for estimating vocal-tract shape from a target speech spectrum

AU - Kaburagi, Tokihiko

PY - 2015/1/1

Y1 - 2015/1/1

AB - We present a method to simultaneously estimate the cross-sectional area and length of the vocal tract from a speech spectrum. An iterative procedure determines the vocal-tract shape by gradually optimizing the parameter values to produce the target speech spectrum. The vocal-tract shape is updated in each iteration using a sensitivity function representing the change in formant frequency caused by a slight perturbation of the vocal-tract shape. Our method effectively optimizes the vocal-tract shape when combined with the perturbation relationship between the speech spectrum parameters (i.e., cepstral parameters) and formants. The estimation accuracy is examined using area function data for 10 English vowels (Story and Titze, J. Phon., 26, 223-260, 1998). The resulting average errors are 0.36 cm² for the cross-sectional area and 0.21 cm for the vocal-tract length. This corresponds to a 17.6% and 1.24% error, respectively. The formant frequency recovered from the estimated vocal-tract shape has an error of less than 4% for each of the first four formants. We also determine that the fundamental frequency of the target speech spectrum has an influence on the estimation accuracy.

UR - http://www.scopus.com/inward/record.url?scp=84941073141&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941073141&partnerID=8YFLogxK

U2 - 10.1250/ast.36.428

DO - 10.1250/ast.36.428

M3 - Article

VL - 36

SP - 428

EP - 437

JO - Acoustical Science and Technology

JF - Acoustical Science and Technology

SN - 1346-3969

IS - 5

ER -