Neurological disorders can lead to significant impairments in speech communication and, in severe cases, cause the complete loss of the ability to speak. Brain-Computer Interfaces have shown promise as an alternative communication modality by directly transforming neural activity of speech processes into a textual or audible representations. Previous studies investigating such speech neuroprostheses relied on electrocorticography (ECoG) or microelectrode arrays that acquire neural signals from superficial areas on the cortex. While both measurement methods have demonstrated successful speech decoding, they do not capture activity from deeper brain structures and this activity has therefore not been harnessed for speech-related BCIs. In this study, we bridge this gap by adapting a previously presented decoding pipeline for speech synthesis based on ECoG signals to implanted depth electrodes (sEEG). For this purpose, we propose a multi-input convolutional neural network that extracts speech-related activity separately for each electrode shaft and estimates spectral coefficients to reconstruct an audible waveform. We evaluate our approach on open-loop data from 5 patients who conducted a recitation task of Dutch utterances. We achieve correlations of up to 0.80 between original and reconstructed speech spectrograms, which are significantly above chance level for all patients (p < 0.001). Our results indicate that sEEG can yield similar speech decoding performance to prior ECoG studies and is a promising modality for speech BCIs.