Basic Phonemes and Lip-Synching for Animation

Animating speech can be one of the most difficult tasks in animation. The process of matching your character's mouth movements to the phonemes of the audio track is commonly known as lip-synching. As a quick fix, you can simply animate the mouth opening and closing, which is an easy shortcut, especially when animating for the web. But if you want to add actual expression and realistic mouth movements, it helps to study how the shape of the mouth changes with each sound. There are dozens upon dozens of variations, but our sketches are renderings of the ten basic shapes of the Preston Blair phoneme series.

These ten basic phoneme shapes can match almost any sound of speech, in varying degrees of expression. With in-between frames carrying the mouth from one shape to the next, the result is remarkably accurate. You may want to keep this for reference.

  • A and I: For the A and I vowel sounds, the lips are generally pulled a bit wider, teeth open, tongue visible and flat against the floor of the mouth.
  • E: The E phoneme is similar to the A and I but the lips are stretched a bit wider, the corners uplifted more, and the mouth and teeth closed a bit more.
  • U: For the U sound, the lips are pursed outwards, drawn into a pucker but still somewhat open; the teeth open, and the tongue somewhat lifted.
  • O: Again the mouth is drawn to a pucker, but the lips don't purse outwards, and the mouth is rounder, the tongue flat against the floor of the mouth.
  • C, D, G, K, N, R, S, Th, Y, and Z: This configuration pretty much covers all the major hard consonants: lips mostly closed, stretched wide, teeth closed or nearly closed.
  • F and V: Mouth at about standard width, but teeth pressed down into the lower lip. At times there can be variations closer to the D/Th configuration.
  • L: The mouth is open and stretched apart much like the A/I configuration, but with the tongue lifted against the roof of the mouth behind the upper teeth.
  • M, B, and P: These sounds are made with the lips pressed together; it's the duration that matters. "M" is a long hold, "mmm"; "B" is a shorter hold and then a parting of the lips, almost a "buh" sound; "P" is a quick hold followed by a puff of air.
  • W and Q: These two sounds purse the mouth the most, almost closing it over the teeth, with just the bottoms of the upper teeth visible, sometimes not even that. Think of a "rosebud mouth".
  • Rest Position: Think of this as the "slack" position, when the mouth is at rest--only with the thread of drool distinctly absent.

When you're drawing or modeling your animation, listen to each word and its syllable combinations; you can usually break them down into variations of these ten phoneme sets. Note that my drawings aren't perfectly symmetrical, and that wasn't just shoddy sketching. No two people express themselves in an identical fashion, and each has individual facial quirks that make their speech and expressions asymmetrical.
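If you plan your mouth shapes on a computer, the breakdown described here can be roughed out as a simple lookup. The following is only a minimal sketch in Python, not part of the Preston Blair series: the names `SHAPE_GROUPS`, `word_to_shapes`, and `line_to_shapes` are made up for illustration, and it works from spelling rather than sound. English spelling is not phonetic, so treat the output as a first pass to correct by ear against the audio track. Letters that do not appear in the list above (H, J, T, X) are skipped rather than guessed at.

```python
# Hypothetical lookup built from the ten mouth-shape groups in the list above.
# The group labels are illustrative names, not an established standard.
HARD = "C/D/G/K/N/R/S/Th/Y/Z"
REST = "Rest"

SHAPE_GROUPS = {
    "A/I": ["a", "i"],
    "E": ["e"],
    "U": ["u"],
    "O": ["o"],
    HARD: ["c", "d", "g", "k", "n", "r", "s", "th", "y", "z"],
    "F/V": ["f", "v"],
    "L": ["l"],
    "M/B/P": ["m", "b", "p"],
    "W/Q": ["w", "q"],
}

# Invert the groups so each letter (or the digraph "th") points to its shape.
LETTER_TO_SHAPE = {
    letters: shape
    for shape, group in SHAPE_GROUPS.items()
    for letters in group
}


def word_to_shapes(word):
    """Return a rough sequence of mouth shapes for one word, by spelling."""
    word = word.lower()
    shapes = []
    i = 0
    while i < len(word):
        # Check the two-letter "th" before falling back to single letters.
        if word.startswith("th", i):
            key, step = "th", 2
        else:
            key, step = word[i], 1
        i += step
        shape = LETTER_TO_SHAPE.get(key)
        if shape is None:
            continue  # punctuation or a letter the list does not cover
        # Collapse repeats: a doubled letter holds one shape, not two.
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return shapes


def line_to_shapes(line):
    """Break a line of dialogue into shapes, resting between words."""
    result = []
    for word in line.split():
        result.extend(word_to_shapes(word))
        result.append(REST)
    return result


if __name__ == "__main__":
    print(line_to_shapes("Love me"))
```

A sequence like this only tells you which shapes to reach for and in what order. How long each one is held, and how the in-between frames move from one to the next, still comes from listening to the recording.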
