Multimodal analysis of public speaking performance by EFL learners: Applying deep learning to understanding how successful speakers use facial movement

Miharu Fuyuno, Rinko Komiya, Takeshi Saitoh

    Research output: Contribution to journal › Article › peer-review

    3 Citations (Scopus)

    Abstract

    Although multimodal corpus analysis is widely practiced in applied linguistics, few studies have investigated the English public speaking performance of EFL learners. The ability to speak effectively in public is fundamental in a globalizing society, yet public speaking in English is challenging for EFL learners, and objective analyses of factors such as eye contact and speech pauses remain scarce, even though such information is crucial for effective teaching. This study analyses the public speaking performance of EFL learners using data from a multimodal corpus. The data were collected at an annual speech contest at a Japanese high school, where speakers delivered English speeches to an audience and judges. The data consist of video and digital audio recordings of each performance, together with speech scripts and the judges' evaluation scores. Characteristics of speakers' facial movement patterns in relation to spoken content, as well as the correlation between facial movements and eye movements, were examined. Facial and eye movements were detected with motion tracking and deep learning methods. The results indicated that among highly evaluated speakers, facial direction changes were not synchronized with speech pauses. Furthermore, facial direction changes tended to be synchronized with content words in the spoken utterance rather than function words.
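
    To make the synchrony analysis described above concrete, the sketch below shows one way to measure how often detected facial direction changes fall within speech pauses. This is not the authors' code; the function name, the example data, and the 0.2 s tolerance are all illustrative assumptions.

        from typing import List, Tuple

        def pause_synchrony_rate(
            direction_changes: List[float],     # timestamps (s) of facial direction changes
            pauses: List[Tuple[float, float]],  # (start, end) speech pause intervals (s)
            tolerance: float = 0.2,             # slack (s) around each pause; assumed value
        ) -> float:
            """Fraction of facial direction changes that fall inside (or near) a pause."""
            if not direction_changes:
                return 0.0
            synced = sum(
                any(start - tolerance <= t <= end + tolerance for start, end in pauses)
                for t in direction_changes
            )
            return synced / len(direction_changes)

        # Hypothetical data: a speaker whose direction changes avoid pauses,
        # as the paper reports for highly evaluated speakers.
        changes = [1.4, 3.9, 7.2, 10.8]
        pauses = [(2.0, 2.6), (6.0, 6.5)]
        print(f"pause synchrony rate: {pause_synchrony_rate(changes, pauses):.2f}")

    A low rate for highly evaluated speakers would correspond to the paper's finding that facial direction changes and pauses were not synchronized; the same interval test could be applied to content-word time spans instead of pauses.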

    Original language: English
    Pages (from-to): 117-129
    Number of pages: 13
    Journal: Asian Journal of Applied Linguistics
    Volume: 5
    Issue number: 1
    Publication status: Published - 2018

    All Science Journal Classification (ASJC) codes

    • Language and Linguistics
    • Linguistics and Language
