Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech

Takuya Kishida, Yoshitaka Nakajima, Kazuo Ueda, Gerard B. Remijn

Research output: Contribution to journalArticlepeer-review

7 Citations (Scopus)

Abstract

Factor analysis (principal component analysis followed by varimax rotation) had shown that 3 common factors appear across 20 critical-band power fluctuations derived from spoken sentences of eight different languages [Ueda et al. (2010). Fechner Day 2010, Padua]. The present study investigated the contributions of such power-fluctuation factors to speech intelligibility. The method of factor analysis was modified to obtain factors suitable for resynthesizing speech sounds as 20-critical-band noise-vocoded speech. The resynthesized speech sounds were used for an intelligibility test. The modification of factor analysis ensured that the resynthesized speech sounds were not accompanied by a steady background noise caused by the data reduction procedure. Spoken sentences of British English, Japanese, and Mandarin Chinese were subjected to this modified analysis. Confirming the earlier analysis, indeed 3-4 factors were common to these languages. The number of power-fluctuation factors needed to make noise-vocoded speech intelligible was then examined. Critical-band power fluctuations of the Japanese spoken sentences were resynthesized from the obtained factors, resulting in noise-vocoded-speech stimuli, and the intelligibility of these speech stimuli was tested by 12 native Japanese speakers. Japanese mora (syllable-like phonological unit) identification performances were measured when the number of factors was 1-9. Statistically significant improvement in intelligibility was observed when the number of factors was increased stepwise up to 6. The 12 listeners identified 92.1% of the morae correctly on average in the 6-factor condition. The intelligibility improved sharply when the number of factors changed from 2 to 3. In this step, the cumulative contribution ratio of factors improved only by 10.6%, from 37.3 to 47.9%, but the average mora identification leaped from 6.9 to 69.2%. The results indicated that, if the number of factors is 3 or more, elementary linguistic information is preserved in such noise-vocoded speech.

Original languageEnglish
Article number517
JournalFrontiers in Psychology
Volume7
Issue numberAPR
DOIs
Publication statusPublished - 2016

All Science Journal Classification (ASJC) codes

  • Psychology(all)

Fingerprint Dive into the research topics of 'Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech'. Together they form a unique fingerprint.

Cite this