Articulatory-to-speech conversion using bi-directional long short-term memory

Fumiaki Taguchi, Tokihiko Kaburagi

Research output: Contribution to journal › Conference article

Abstract

Methods for synthesizing speech sounds from the motion of articulatory organs can be used to produce substitute speech for people who have undergone laryngectomy. To achieve this goal, feature parameters representing the spectral envelope of speech, directly related to the acoustic characteristics of the vocal tract, have been estimated from articulatory movements. Within this framework, speech can be synthesized by driving the filter obtained from a spectral envelope with noise signals. In the current study, we examined an alternative method that generates speech sounds directly from the motion pattern of articulatory organs based on the implicit relationships between articulatory movements and the source signal of speech. These implicit relationships were estimated by considering that articulatory movements are involved in phonological representations of speech that are also related to sound source information such as the temporal pattern of pitch and the voiced/unvoiced flag. We developed a method for simultaneously estimating the spectral envelope and sound source parameters from articulatory data obtained with an electromagnetic articulography (EMA) sensor. Furthermore, objective evaluation of the estimated speech parameters and subjective evaluation of the word error rate were performed to examine the effectiveness of our method.
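The mapping the abstract describes — articulatory (EMA) frames in, spectral-envelope and source parameters out, via a bi-directional LSTM — can be sketched roughly as below. This is a minimal illustrative forward pass only: the feature dimensions, the single untrained layer, and the random weights are assumptions for demonstration, not the authors' configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_direction(x_seq, W, U, b, reverse=False):
    """One LSTM direction over a (T, d_in) sequence; returns hidden states (T, d_h)."""
    d_h = U.shape[1]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    hs = np.zeros((len(x_seq), d_h))
    steps = reversed(range(len(x_seq))) if reverse else range(len(x_seq))
    for t in steps:
        z = W @ x_seq[t] + U @ h + b          # all four gate pre-activations at once
        i = sigmoid(z[0 * d_h:1 * d_h])       # input gate
        f = sigmoid(z[1 * d_h:2 * d_h])       # forget gate
        o = sigmoid(z[2 * d_h:3 * d_h])       # output gate
        g = np.tanh(z[3 * d_h:])              # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        hs[t] = h                             # stored at the frame's own index
    return hs

rng = np.random.default_rng(0)
# Hypothetical sizes: 12 EMA channels in; 27 outputs (e.g. 25 spectral + F0 + V/UV flag).
d_in, d_h, d_out = 12, 16, 27

def init(shape):                              # small random weights (untrained)
    return rng.standard_normal(shape) * 0.1

Wf, Uf, bf = init((4 * d_h, d_in)), init((4 * d_h, d_h)), np.zeros(4 * d_h)
Wb, Ub, bb = init((4 * d_h, d_in)), init((4 * d_h, d_h)), np.zeros(4 * d_h)
W_out = init((d_out, 2 * d_h))

ema = rng.standard_normal((100, d_in))        # 100 frames of (synthetic) EMA features
# Concatenate forward and backward hidden states, then map to acoustic parameters.
H = np.concatenate([lstm_direction(ema, Wf, Uf, bf),
                    lstm_direction(ema, Wb, Ub, bb, reverse=True)], axis=1)
acoustic = H @ W_out.T                        # per-frame spectral + source parameters
print(acoustic.shape)                         # (100, 27)
```

In practice such a network would be trained on paired EMA/speech data with a regression loss; the point here is only the data flow: both time directions see the whole utterance, so each output frame can draw on past and future articulation.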

Original language: English
Pages (from-to): 2499-2503
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOI: 10.21437/Interspeech.2018-999
Publication status: Published - 1 Jan 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sep 2018 → 6 Sep 2018


All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Articulatory-to-speech conversion using bi-directional long short-term memory. / Taguchi, Fumiaki; Kaburagi, Tokihiko.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2018-September, 01.01.2018, p. 2499-2503.

Research output: Contribution to journal › Conference article

@article{cff30805c2aa48d2be2f1f8c5361d342,
title = "Articulatory-to-speech conversion using bi-directional long short-term memory",
abstract = "Methods for synthesizing speech sounds from the motion of articulatory organs can be used to produce substitute speech for people who have undergone laryngectomy. To achieve this goal, feature parameters representing the spectral envelope of speech, directly related to the acoustic characteristics of the vocal tract, have been estimated from articulatory movements. Within this framework, speech can be synthesized by driving the filter obtained from a spectral envelope with noise signals. In the current study, we examined an alternative method that generates speech sounds directly from the motion pattern of articulatory organs based on the implicit relationships between articulatory movements and the source signal of speech. These implicit relationships were estimated by considering that articulatory movements are involved in phonological representations of speech that are also related to sound source information such as the temporal pattern of pitch and the voiced/unvoiced flag. We developed a method for simultaneously estimating the spectral envelope and sound source parameters from articulatory data obtained with an electromagnetic articulography (EMA) sensor. Furthermore, objective evaluation of the estimated speech parameters and subjective evaluation of the word error rate were performed to examine the effectiveness of our method.",
author = "Fumiaki Taguchi and Tokihiko Kaburagi",
year = "2018",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2018-999",
language = "English",
volume = "2018-September",
pages = "2499--2503",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Articulatory-to-speech conversion using bi-directional long short-term memory

AU - Taguchi, Fumiaki

AU - Kaburagi, Tokihiko

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Methods for synthesizing speech sounds from the motion of articulatory organs can be used to produce substitute speech for people who have undergone laryngectomy. To achieve this goal, feature parameters representing the spectral envelope of speech, directly related to the acoustic characteristics of the vocal tract, have been estimated from articulatory movements. Within this framework, speech can be synthesized by driving the filter obtained from a spectral envelope with noise signals. In the current study, we examined an alternative method that generates speech sounds directly from the motion pattern of articulatory organs based on the implicit relationships between articulatory movements and the source signal of speech. These implicit relationships were estimated by considering that articulatory movements are involved in phonological representations of speech that are also related to sound source information such as the temporal pattern of pitch and the voiced/unvoiced flag. We developed a method for simultaneously estimating the spectral envelope and sound source parameters from articulatory data obtained with an electromagnetic articulography (EMA) sensor. Furthermore, objective evaluation of the estimated speech parameters and subjective evaluation of the word error rate were performed to examine the effectiveness of our method.

AB - Methods for synthesizing speech sounds from the motion of articulatory organs can be used to produce substitute speech for people who have undergone laryngectomy. To achieve this goal, feature parameters representing the spectral envelope of speech, directly related to the acoustic characteristics of the vocal tract, have been estimated from articulatory movements. Within this framework, speech can be synthesized by driving the filter obtained from a spectral envelope with noise signals. In the current study, we examined an alternative method that generates speech sounds directly from the motion pattern of articulatory organs based on the implicit relationships between articulatory movements and the source signal of speech. These implicit relationships were estimated by considering that articulatory movements are involved in phonological representations of speech that are also related to sound source information such as the temporal pattern of pitch and the voiced/unvoiced flag. We developed a method for simultaneously estimating the spectral envelope and sound source parameters from articulatory data obtained with an electromagnetic articulography (EMA) sensor. Furthermore, objective evaluation of the estimated speech parameters and subjective evaluation of the word error rate were performed to examine the effectiveness of our method.

UR - http://www.scopus.com/inward/record.url?scp=85054956167&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054956167&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-999

DO - 10.21437/Interspeech.2018-999

M3 - Conference article

VL - 2018-September

SP - 2499

EP - 2503

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -