Innate vocal sounds such as laughing, screaming or crying convey one's feelings to others. In many species, including humans, scaling the amplitude and duration of vocalizations is essential for effective social communication [1-3]. In mice, female scent triggers male mice to emit innate courtship ultrasonic vocalizations (USVs) [4,5]. However, whether mice flexibly scale their vocalizations and how neural circuits are structured to generate flexibility remain largely unknown. Here we identify mouse neurons from the lateral preoptic area (LPOA) that express oestrogen receptor 1 (LPOA^ESR1 neurons) and, when activated, elicit the complete repertoire of USV syllables emitted during natural courtship. Neural anatomy and functional data reveal a two-step, di-synaptic circuit motif in which primary long-range inhibitory LPOA^ESR1 neurons relieve a clamp of local periaqueductal grey (PAG) inhibition, enabling excitatory PAG USV-gating neurons to trigger vocalizations. We find that social context shapes a wide range of USV amplitudes and bout durations. This variability is absent when PAG neurons are stimulated directly; PAG-evoked vocalizations are time-locked to neural activity and stereotypically loud. By contrast, increasing the activity of LPOA^ESR1 neurons scales the amplitude of vocalizations, and delaying the recovery of the inhibition clamp prolongs USV bouts. Thus, the LPOA disinhibition motif contributes to flexible loudness and the duration and persistence of bouts, which are key aspects of effective vocal social communication.