Automatic acoustic synthesis of human-like laughter

J Acoust Soc Am. 2007 Jan;121(1):527-35. doi: 10.1121/1.2390679.

Abstract

A technique to synthesize laughter based on the time-domain behavior of real instances of human laughter is presented. In the speech synthesis community, interest in improving the expressive quality of synthetic speech has grown considerably. While the focus has been on the linguistic aspects, such as precise control of speech intonation to achieve desired expressiveness, inclusion of nonlinguistic cues could further enhance the expressive quality of synthetic speech. Laughter is one such cue, used for communicating, say, a happy or amusing context. It can be generated in many varieties and qualities: from a short exhalation to a long full-blown episode. Laughter is modeled at two levels: the overall episode level and the local call level. The first attempts to capture the overall temporal behavior in a parametric model based on the equations that govern the simple harmonic motion of a mass-spring system. By changing a set of easily available parameters, the authors are able to synthesize a variety of laughter. At the call level, the authors relied on a standard linear-prediction-based analysis-synthesis model. Results of subjective tests to assess the acceptability and naturalness of the synthetic laughter relative to real human laughter samples are presented.
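To make the two-level structure concrete, the following is a minimal sketch in Python (NumPy only). It is not the authors' implementation: the damped mass-spring trajectory stands in for their episode-level parametric model, a fixed two-pole resonator stands in for their linear-prediction call model, and every numeric parameter value is an illustrative assumption.

```python
# Sketch of a two-level laughter synthesizer, under the assumptions above.
# Episode level: a damped mass-spring (simple harmonic motion) trajectory
# sets the amplitude of each successive laugh call.
# Call level: each "ha" is a voiced burst shaped by a crude all-pole
# resonator, a stand-in for an LPC analysis-synthesis filter.

import numpy as np

FS = 16000  # sampling rate, Hz

def call_amplitudes(n_calls, omega=4.0, zeta=0.25, dt=0.35):
    """Sample a damped-oscillator trajectory at each call onset:
    exp(-zeta*omega*t) * (0.7 + 0.3*cos(omega_d*t)). The cosine ripple
    is kept partial so no call decays to complete silence."""
    t = np.arange(n_calls) * dt
    omega_d = omega * np.sqrt(1.0 - zeta ** 2)
    return np.exp(-zeta * omega * t) * (0.7 + 0.3 * np.cos(omega_d * t))

def synth_call(duration=0.15, f0=220.0, amp=1.0):
    """One call: a pulse train plus breath noise passed through a
    two-pole resonator near 700 Hz (rough first formant of /a/)."""
    n = int(duration * FS)
    t = np.arange(n) / FS
    source = np.sin(np.pi * f0 * t) ** 8 + 0.05 * np.random.randn(n)
    r = np.exp(-np.pi * 150.0 / FS)           # pole radius, ~150 Hz bandwidth
    a1 = -2.0 * r * np.cos(2 * np.pi * 700.0 / FS)
    a2 = r * r
    y = np.zeros(n)
    for i in range(2, n):                     # all-pole difference equation
        y[i] = source[i] - a1 * y[i - 1] - a2 * y[i - 2]
    y *= np.hanning(n)                        # rise-fall energy within the call
    return amp * y / (np.abs(y).max() + 1e-9)

def synth_episode(n_calls=8, gap=0.20):
    """Concatenate calls and inter-call pauses into one laugh episode."""
    amps = call_amplitudes(n_calls)
    pause = np.zeros(int(gap * FS))
    return np.concatenate([np.concatenate([synth_call(amp=a), pause])
                           for a in amps])

laugh = synth_episode()  # write to WAV with e.g. scipy.io.wavfile if desired
```

Varying omega, zeta, dt, and n_calls plays the role the abstract describes for the "easily available parameters": the same machinery yields anything from a short single exhalation (n_calls=1) to a long, slowly decaying episode.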

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Communication Devices for People with Disabilities*
  • Computers
  • Humans
  • Laughter*
  • Models, Biological*
  • Software
  • Speech Acoustics*
  • User-Computer Interface*