text2speech

Synthesize speech from text

Since R2022b

    Description

    [speech,fs] = text2speech(text) synthesizes a speech signal from the provided text using a HiFi-GAN/Tacotron2 pretrained model.

    Note

    Using the HiFi-GAN/Tacotron2 pretrained model requires Deep Learning Toolbox™ and Audio Toolbox™ Interface for SpeechBrain and Torchaudio Libraries. You can download this support package from the Add-On Explorer. For more information, see Get and Manage Add-Ons.

    [speech,fs] = text2speech(text,Client=clientObj) synthesizes a speech signal from the provided text using the specified pretrained deep learning model or third-party speech service.

    Note

    To use third-party speech services, you must download the extended Audio Toolbox functionality from File Exchange. The File Exchange submission includes a tutorial to get started with the third-party services.

    [speech,fs,rawOutput] = text2speech(___) also returns the unprocessed server output from the third-party speech service.

    Examples

    Call text2speech with a string to synthesize a speech signal using the HiFi-GAN/Tacotron2 pretrained model. This model requires Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries. If this support package is not installed, the function provides a link to the Add-On Explorer, where you can download and install the support package.

    [speech,fs] = text2speech("hello world");

    Listen to the synthesized speech.

    sound(speech,fs)

    Create a speechClient object that uses the HiFi-GAN/Tacotron2 pretrained model. Set ExecutionEnvironment to "gpu" to use the GPU when running the model.

    hifiganSpeechClient = speechClient("hifigan",ExecutionEnvironment="gpu");

    Call text2speech on a string of text with the HiFi-GAN/Tacotron2 speechClient object to synthesize the speech signal.

    [x,fs] = text2speech("hello world",Client=hifiganSpeechClient);

    Listen to the synthesized speech.

    sound(x,fs)
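
    Call text2speech with three output arguments to also return the unprocessed server output from a third-party speech service. This example is a sketch that assumes you have downloaded the extended Audio Toolbox functionality from File Exchange and configured access to the IBM speech service as described in the File Exchange tutorial.

    ibmSpeechClient = speechClient("IBM");
    [speech,fs,rawOutput] = text2speech("hello world",Client=ibmSpeechClient);

    Listen to the synthesized speech and display the raw server response.

    sound(speech,fs)
    rawOutput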

    Input Arguments

    text

    Text to synthesize into speech, specified as a string or character array.

    Example: "Hello world"

    Data Types: char | string

    clientObj

    Client object, specified as an object returned by speechClient. The object is an interface to a pretrained model or to a third-party speech service. By default, text2speech uses a HiFi-GAN/Tacotron2 client object.

    You cannot use text2speech with a speechClient object that interfaces with the wav2vec 2.0 or Emformer pretrained models.

    Using the HiFi-GAN/Tacotron2 model requires Deep Learning Toolbox and Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries. If this support package is not installed, calling speechClient with "hifigan" provides a link to the Add-On Explorer, where you can download and install the support package.

    To use the third-party speech services, you must download the extended Audio Toolbox functionality from File Exchange. The File Exchange submission includes a tutorial to get started with the third-party services.

    Example: speechClient("IBM")

    Output Arguments

    speech

    Synthesized speech signal, returned as a column vector (single channel).

    Data Types: double

    fs

    Sample rate of the speech signal in Hz, returned as a positive double. The sample rate depends on the third-party service and the server options set through clientObj. See the documentation for the specific speech service for more information.

    Data Types: double

    rawOutput

    Unprocessed server output, returned as a matlab.net.http.ResponseMessage object containing the HTTP response from the third-party speech service. If the third-party speech service is Amazon®, text2speech returns the server output as a structure.
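
    For example, when rawOutput is a matlab.net.http.ResponseMessage object, you can inspect the HTTP status code and message body of the response. This is a brief sketch; the exact contents depend on the third-party speech service.

    % Assumes rawOutput was returned by a call such as
    % [speech,fs,rawOutput] = text2speech(txt,Client=clientObj).
    rawOutput.StatusCode  % HTTP status code of the server response
    rawOutput.Body        % message body returned by the server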

    Limitations

    The HiFi-GAN/Tacotron2 model cannot synthesize speech signals longer than approximately 10 seconds.

    Version History

    Introduced in R2022b