Precise control of articulatory parameters is difficult and prevents a physical model from generating natural sounding speech signals. To determine vocal-tract shape from speech, this paper presents an inversion method for simultaneously esti- mating the cross-sectional area and length of the vocal tract. In addition, we performed speech resynthesis from a time-series of estimated vocal-tract shapes. The vocal-tract shape is deter- mined through an iterative procedure that gradually optimizes the parameter values to produce the target speech spectrum. The vocal-tract shape is updated using a sensitivity function that represents the change in formant frequency caused by a small perturbation of the vocal-tract shape. When combined with a perturbation relationship of speech spectrum parameters (i.e., cepstrum parameters) and formants, our method effectively op- Timizes the vocal-tract shape. We quantitatively examined the accuracy using area function data for 10 isolated vowels. The results showed that the average area error was 0.43 cm2 and the average length error was 0.23 cm. This indicates that the vocal- Tract shape was determined with satisfactory accuracy. We also performed an estimation experiment for continuous speech and synthesized speech from the estimated vocal-tract shape.
|ジャーナル||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|出版ステータス||出版済み - 1月 1 2014|
|イベント||15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, シンガポール|
継続期間: 9月 14 2014 → 9月 18 2014
!!!All Science Journal Classification (ASJC) codes