TY - JOUR
T1 - Articulatory-to-speech conversion using bi-directional long short-term memory
AU - Taguchi, Fumiaki
AU - Kaburagi, Tokihiko
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP16K00242. We thank Benjamin Knight, MSc., from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript.
Publisher Copyright:
© 2018 International Speech Communication Association. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Methods for synthesizing speech sounds from the motion of articulatory organs can be used to produce substitute speech for people who have undergone laryngectomy. To achieve this goal, feature parameters representing the spectral envelope of speech, which are directly related to the acoustic characteristics of the vocal tract, have been estimated from articulatory movements. Within this framework, speech can be synthesized by driving the filter obtained from a spectral envelope with noise signals. In the current study, we examined an alternative method that generates speech sounds directly from the motion pattern of articulatory organs, based on the implicit relationships between articulatory movements and the source signal of speech. These implicit relationships were estimated by considering that articulatory movements are involved in phonological representations of speech, which are also related to sound source information such as the temporal pattern of pitch and the voiced/unvoiced flag. We developed a method for simultaneously estimating the spectral envelope and sound source parameters from articulatory data obtained with an electromagnetic articulography (EMA) sensor. Furthermore, objective evaluation of the estimated speech parameters and subjective evaluation of the word error rate were performed to examine the effectiveness of our method.
AB - Methods for synthesizing speech sounds from the motion of articulatory organs can be used to produce substitute speech for people who have undergone laryngectomy. To achieve this goal, feature parameters representing the spectral envelope of speech, which are directly related to the acoustic characteristics of the vocal tract, have been estimated from articulatory movements. Within this framework, speech can be synthesized by driving the filter obtained from a spectral envelope with noise signals. In the current study, we examined an alternative method that generates speech sounds directly from the motion pattern of articulatory organs, based on the implicit relationships between articulatory movements and the source signal of speech. These implicit relationships were estimated by considering that articulatory movements are involved in phonological representations of speech, which are also related to sound source information such as the temporal pattern of pitch and the voiced/unvoiced flag. We developed a method for simultaneously estimating the spectral envelope and sound source parameters from articulatory data obtained with an electromagnetic articulography (EMA) sensor. Furthermore, objective evaluation of the estimated speech parameters and subjective evaluation of the word error rate were performed to examine the effectiveness of our method.
UR - http://www.scopus.com/inward/record.url?scp=85054956167&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054956167&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2018-999
DO - 10.21437/Interspeech.2018-999
M3 - Conference article
AN - SCOPUS:85054956167
SN - 2308-457X
VL - 2018-September
SP - 2499
EP - 2503
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Y2 - 2 September 2018 through 6 September 2018
ER -