Speech Transcription and Synthesis

Use pretrained models or third-party APIs for text-to-speech and speech-to-text

Audio Toolbox™ provides examples for small-vocabulary recognition and sound synthesis. Use pretrained models to perform general speech-to-text transcription and text-to-speech synthesis with speech2text and text2speech. To perform text-to-speech and speech-to-text through popular third-party APIs, download the Audio Toolbox extended functionality from File Exchange. Supported APIs include Google®, IBM® Watson, Microsoft® Azure, and Amazon®.
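
As a rough illustration of the pretrained-model workflow, the following minimal sketch transcribes a short recording. It assumes the R2022b-style calling form in which a client object created by speechClient is passed as the first argument to speech2text, and it assumes that the "wav2vec2.0" pretrained model is installed (it may need to be downloaded separately). The audio file name is an assumption: it is a sample recording that ships with Audio Toolbox, and any mono speech recording can take its place. Calling forms can differ between releases, so check the speech2text reference page for your version.

```matlab
% Read a speech recording and its sample rate.
% The file name is an assumed Audio Toolbox sample file; substitute your own.
[audioIn, fs] = audioread("Counting-16-44p1-mono-15secs.wav");

% Create a client for a locally run pretrained model.
% Assumes the "wav2vec2.0" model files are already installed.
transcriber = speechClient("wav2vec2.0");

% Transcribe the recording. The sample rate is passed along with the signal.
transcript = speech2text(transcriber, audioIn, fs);

disp(transcript)
```

Creating the client once and reusing it across calls avoids setting up the model again for each recording.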

You can interact with speech-to-text functionality graphically in the Signal Labeler app to quickly label regions of speech.


Signal Labeler - Label signal attributes, regions, and points of interest, and extract features


speech2text - Transcribe speech signal to text (Since R2022b)
text2speech - Synthesize speech from text (Since R2022b)
speechClient - Interface with pretrained model or third-party speech service (Since R2022b)