Spontaneous speech recognition taking account of characteristics of speaker-dependent occurrence of filled-pauses

Yumi Shima, Mariko Koga, Masaru Yamashita, Katsuya Yamauchi, Shoichi Matsunaga

Research output: Contribution to book/report › Conference contribution

Abstract

One of the characteristics of spontaneous speech is the occurrence of many types of filled-pauses, which usually degrade speech recognition accuracy considerably. In this study, we first used a large corpus to investigate the occurrence frequency of filled-pauses in spontaneous speech. The results revealed that, on average, only four specific filled-pauses account for a cumulative occurrence frequency of 0.8, and that these frequent filled-pauses differed among speakers. On the basis of these results, we propose a speech recognition procedure that combines two recognition processes: the first uses a common lexicon and the second uses an individual (speaker-specific) lexicon. The filled-pause entries in the individual lexicon were determined from their occurrence frequencies, as observed in the preliminary results of the first recognition process. The proposed procedure yielded a statistically significant improvement in word accuracy (a 1.1% word-error reduction) and indicated that filled-pauses rarely used by a speaker hinder improvements in word accuracy. We also showed that an individual lexicon configured from a combination of the N-best results and word confidence-score limitations yielded a significant improvement in word accuracy (a 1.3% word-error reduction). Furthermore, we examined methods for reducing processing time by using multiple candidates and confidence-score limitations. By using the N-best results and the word confidence-score limitations, our procedure significantly reduced the total amount of processing (a 41% reduction in the number of segments recognized in the first recognition process).
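The abstract describes the selection of speaker-specific filled-pause entries from first-pass recognition output. The following is a minimal Python sketch of that selection step. It rests on several assumptions. First-pass output is given as N-best lists of (word, confidence) pairs. A filled pause is counted only when its confidence passes a threshold, and counts are pooled over all N-best hypotheses (the abstract does not say how they are combined). Entries are then added in descending frequency until their cumulative frequency reaches 0.8, the coverage level reported above. The names `select_filled_pauses` and `FILLED_PAUSES`, the filler inventory, and the 0.5 threshold are hypothetical. This is not the authors' implementation.

```python
from collections import Counter
from typing import Iterable

# Hypothetical placeholder inventory of filled-pause tokens (romanized fillers);
# the paper's actual filled-pause set is not given in the abstract.
FILLED_PAUSES = {"e", "eto", "ano", "ma", "un", "sono"}


def select_filled_pauses(
    nbest: Iterable[list[list[tuple[str, float]]]],
    conf_threshold: float = 0.5,   # hypothetical confidence-score limit
    coverage: float = 0.8,         # target cumulative occurrence frequency
) -> list[str]:
    """Choose filled-pause entries for one speaker's individual lexicon.

    nbest: per utterance, a list of N-best hypotheses; each hypothesis is a
    list of (word, confidence) pairs from the first recognition pass.
    """
    counts: Counter[str] = Counter()
    for hypotheses in nbest:
        for hyp in hypotheses:
            for word, conf in hyp:
                # Count a filled pause only if its confidence passes the limit.
                if word in FILLED_PAUSES and conf >= conf_threshold:
                    counts[word] += 1

    total = sum(counts.values())
    if total == 0:
        return []

    selected: list[str] = []
    running = 0
    # Add the most frequent filled pauses until cumulative frequency >= coverage.
    for word, n in counts.most_common():
        selected.append(word)
        running += n
        if running / total >= coverage:
            break
    return selected


# Usage example with made-up first-pass output for three utterances.
first_pass = [
    [[("eto", 0.9), ("kyou", 0.95)], [("e", 0.4), ("kyou", 0.9)]],
    [[("ano", 0.8), ("eto", 0.7)]],
    [[("eto", 0.85), ("ma", 0.6)]],
]

if __name__ == "__main__":
    print(select_filled_pauses(first_pass))  # ['eto', 'ano']
```

Here "e" is ignored because its confidence (0.4) falls below the limit. "eto" (3 of 5 counted tokens) plus "ano" (1 of 5) reach the 0.8 coverage, so "ma" is left out of the individual lexicon. Lowering `conf_threshold` admits more low-confidence fillers, which trades lexicon size against the risk of adding rarely used filled pauses.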

Original language: English
Host publication title: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
Pages: 3872-3876
Number of pages: 5
Publication status: Published - Dec 1 2010
Externally published: Yes
Event: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society - Sydney, NSW, Australia
Duration: Aug 23 2010 to Aug 27 2010

Publication series

Name: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
Volume: 5

Other

Other: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society
Country/Territory: Australia
City: Sydney, NSW
Period: 8/23/10 to 8/27/10

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
