Generation of the vocal tract spectrum from the underlying articulatory mechanism

Research output: Contribution to journal › Article

Abstract

A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.
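The four stages named in the abstract (phoneme timing, articulatory constraints, trajectory generation, spectrum estimation) can be sketched as a simple pipeline. This is a minimal illustration, not the authors' model: the function names, the target table, the first-order smoothing filter (standing in for the paper's dynamic articulatory model), and the linear articulatory-to-spectrum mapping are all assumptions made purely for demonstration.

```python
def phoneme_timing(phonemes, duration_per_phoneme=0.1, frame_rate=100):
    """Stage 1: assign each phoneme a frame index at which it is articulated."""
    frames_per = int(duration_per_phoneme * frame_rate)
    return [i * frames_per + frames_per // 2 for i in range(len(phonemes))]

def articulatory_constraints(phonemes, targets):
    """Stage 2: look up a context-independent articulatory target per phoneme."""
    return [targets[p] for p in phonemes]

def generate_trajectory(times, constraints, n_frames, smoothing=0.15):
    """Stage 3: smooth movement between targets; a first-order low-pass
    filter stands in here for the paper's dynamic articulatory model, so
    each target is only approached, never reached exactly -- which is how
    contextual variability arises from fixed, context-independent targets."""
    traj = []
    state = list(constraints[0])
    k = 0
    for t in range(n_frames):
        if k + 1 < len(times) and t >= times[k + 1]:
            k += 1  # switch to the next phoneme's target
        target = constraints[k]
        state = [s + smoothing * (g - s) for s, g in zip(state, target)]
        traj.append(list(state))
    return traj

def trajectory_to_spectrum(traj, n_bins=8):
    """Stage 4: map each articulatory frame to a spectral frame
    (a fixed linear mapping used purely for illustration)."""
    return [[sum(x * ((i + j + 1) % 3 + 1) for j, x in enumerate(frame))
             for i in range(n_bins)] for frame in traj]

# Usage: a hypothetical 2-D articulatory space (e.g., jaw and tongue height).
targets = {"a": (1.0, 0.2), "i": (0.1, 0.9), "u": (0.3, 0.7)}
phonemes = ["a", "i", "u"]
times = phoneme_timing(phonemes)
constraints = articulatory_constraints(phonemes, targets)
traj = generate_trajectory(times, constraints, n_frames=30)
spectra = trajectory_to_spectrum(traj)
```

Because the filter never fully settles, the articulatory position at each phoneme depends on where the previous phoneme left it, mirroring the abstract's point that contextual variability emerges from the dynamics rather than from context-dependent targets.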

Original language: English
Pages (from-to): 456-468
Number of pages: 13
Journal: Journal of the Acoustical Society of America
Volume: 121
Issue number: 1
DOI: 10.1121/1.2384847
Publication status: Published - Jan 22 2007


All Science Journal Classification (ASJC) codes

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics

Cite this

Generation of the vocal tract spectrum from the underlying articulatory mechanism. / Kaburagi, Tokihiko; Kim, Jiji.

In: Journal of the Acoustical Society of America, Vol. 121, No. 1, 22.01.2007, p. 456-468.

Research output: Contribution to journal › Article

@article{d5d915fae00648a0b4c7602f43e1fced,
title = "Generation of the vocal tract spectrum from the underlying articulatory mechanism",
abstract = "A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.",
author = "Tokihiko Kaburagi and Jiji Kim",
year = "2007",
month = "1",
day = "22",
doi = "10.1121/1.2384847",
language = "English",
volume = "121",
pages = "456--468",
journal = "Journal of the Acoustical Society of America",
issn = "0001-4966",
publisher = "Acoustical Society of America",
number = "1",

}

TY - JOUR

T1 - Generation of the vocal tract spectrum from the underlying articulatory mechanism

AU - Kaburagi, Tokihiko

AU - Kim, Jiji

PY - 2007/1/22

Y1 - 2007/1/22

N2 - A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.

AB - A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.

UR - http://www.scopus.com/inward/record.url?scp=33846190239&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846190239&partnerID=8YFLogxK

U2 - 10.1121/1.2384847

DO - 10.1121/1.2384847

M3 - Article

VL - 121

SP - 456

EP - 468

JO - Journal of the Acoustical Society of America

JF - Journal of the Acoustical Society of America

SN - 0001-4966

IS - 1

ER -