I'm not really asking how it's done; I'm sure I could build a circuit
that does it. I'm just wondering how something like this gets into an
otherwise professional-looking video. Listen to the spoken voice
throughout this video and you'll hear it: the volume is effectively
over-modulated by the voice envelope.
http://www.youtube.com/watch?v=lpydGpig0wI