openl3Preprocess

Preprocess audio for OpenL3 feature extraction

Since R2021a

Syntax

features = openl3Preprocess(audioIn,fs)

features = openl3Preprocess(audioIn,fs,Name=Value)

[features,cf,ts] = openl3Preprocess(___)

Description

features = openl3Preprocess(audioIn,fs) generates spectrograms from audioIn that can be fed to the OpenL3 pretrained network.

features = openl3Preprocess(audioIn,fs,Name=Value) specifies options using one or more name-value arguments. For example, features = openl3Preprocess(audioIn,fs,OverlapPercentage=75) applies a 75% overlap between consecutive frames used to generate the spectrograms.

[features,cf,ts] = openl3Preprocess(___) also returns the center frequencies of the bands and the time locations of the windows in the generated spectrograms.

Examples

Extract OpenL3 Embeddings from Audio Signal

Use openl3Preprocess to generate spectrograms from an audio signal, and then pass the spectrograms to the OpenL3 pretrained network to extract embeddings.

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

To extract spectrograms from the audio, call the openl3Preprocess function with the audio and sample rate. Use 50% overlap and set the spectrum type to linear. The openl3Preprocess function returns an array of 30 spectrograms produced using an FFT length of 512.

features = openl3Preprocess(audioIn,fs,OverlapPercentage=50,SpectrumType="linear");
[posFFTbinsOvLap50,numHopsOvLap50,~,numSpectOvLap50] = size(features)

posFFTbinsOvLap50 = 257

numHopsOvLap50 = 197

numSpectOvLap50 = 30

Call openl3Preprocess again, this time using the default overlap of 90%. The openl3Preprocess function now returns an array of 146 spectrograms.

features = openl3Preprocess(audioIn,fs,SpectrumType="linear");
[posFFTbinsOvLap90,numHopsOvLap90,~,numSpectOvLap90] = size(features)

posFFTbinsOvLap90 = 257

numHopsOvLap90 = 197

numSpectOvLap90 = 146

Visualize one of the spectrograms at random.

randSpect = randi(numSpectOvLap90);
viewRandSpect = features(:,:,:,randSpect);
N = size(viewRandSpect,2); 
binsToHz = (0:N-1)*fs/N;
nyquistBin = round(N/2);
semilogx(binsToHz(1:nyquistBin),mag2db(abs(viewRandSpect(1:nyquistBin))))
xlabel("Frequency (Hz)")
ylabel("Power (dB)");
title([num2str(randSpect),"th Spectrogram"])
axis tight
grid on

Figure: Line plot titled "19 th Spectrogram" with Frequency (Hz) on the x-axis and Power (dB) on the y-axis.

Create an OpenL3 network using the same SpectrumType.

net = audioPretrainedNetwork("openl3",SpectrumType="linear");

Extract and visualize the audio embeddings.

embeddings = predict(net,features);
surf(embeddings,EdgeColor="none")
view([90,-90])
axis([1 numSpectOvLap90 1 numSpectOvLap90])
xlabel("Embedding Length")
ylabel("Spectrum Number")
title("OpenL3 Feature Embeddings")
axis tight

Figure: Surface plot titled "OpenL3 Feature Embeddings" with axes labeled Embedding Length and Spectrum Number.

Visualize Spectrogram for OpenL3 Input

Read in an audio signal.

[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");

Use audioViewer to visualize and listen to the audio.

audioViewer(audioIn,fs)

Use openl3Preprocess to generate spectrograms that can be fed to the OpenL3 pretrained network. Specify additional outputs to get the center frequencies of the bands and the locations of the windows in time.

[spectrograms,cf,ts] = openl3Preprocess(audioIn,fs);

Choose a random spectrogram from the input to visualize. Use the center frequency and time location information to label the axes.

spectIdx = randi(size(spectrograms,4));
randSpect = spectrograms(:,:,1,spectIdx)';
surf(cf/1000,ts(:,spectIdx),randSpect,EdgeColor="none")
view([90 -90])
xlabel("Frequency (kHz)")
ylabel("Time (s)")
axis tight

Figure: Surface plot of the selected spectrogram with axes labeled Frequency (kHz) and Time (s).

Input Arguments

`audioIn` — Input signal
column vector | matrix

Input signal, specified as a column vector or matrix. If you specify a matrix, openl3Preprocess treats the columns of the matrix as individual audio channels.

Data Types: single | double
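
As an illustration of the multichannel behavior described above, the following minimal sketch compares the number of spectrograms returned for a mono signal and for a two-channel signal built from it. The synthetic noise input, the 16 kHz sample rate, and the variable names are assumptions chosen for this sketch only.

```matlab
% Sketch: compare mono and two-channel input (synthetic data, assumed values).
fs = 16e3;                        % assumed sample rate for this illustration
mono = randn(5*fs,1);             % 5 seconds of noise in one column (one channel)
stereo = [mono,flipud(mono)];     % two columns, treated as two channels

featMono = openl3Preprocess(mono,fs);
featStereo = openl3Preprocess(stereo,fs);

% The fourth dimension holds the spectrograms.
numSpectMono = size(featMono,4)
numSpectStereo = size(featStereo,4)
```

Because the number of spectrograms depends on the number of channels, numSpectStereo is expected to be larger than numSpectMono, while the first three dimensions stay the same.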

`fs` — Sample rate (Hz)
positive scalar

Sample rate of the input signal in Hz, specified as a positive scalar.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: openl3Preprocess(audioIn,fs,'SpectrumType','mel256')

`OverlapPercentage` — Percentage overlap between consecutive spectrograms
`90` (default) | scalar in the range [0,100)

Percentage overlap between consecutive spectrograms, specified as a scalar in the range [0,100).

Data Types: single | double
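
The effect of OverlapPercentage on the spacing between spectrograms can be sketched with simple arithmetic. This sketch assumes that each spectrogram covers a 1-second segment of audio, as in the original OpenL3 design; this page does not state the segment duration, so treat segmentDuration as an assumption.

```matlab
% Sketch: time advance between consecutive spectrograms.
% Assumption (not stated on this page): each spectrogram spans 1 second.
segmentDuration = 1;                  % seconds, assumed
overlapPercentage = [50 90];          % values used in the example on this page

% Advance between the starts of consecutive spectrograms, in seconds.
hopDuration = segmentDuration*(1 - overlapPercentage/100)

% Relative increase in the number of spectrograms when moving from
% 50% overlap to 90% overlap (approximate, ignores edge effects).
approxRatio = hopDuration(1)/hopDuration(2)
```

Under this assumption, raising the overlap from 50% to 90% yields about five times as many spectrograms, which is in line with the 30 and 146 spectrograms reported in the example.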

`SpectrumType` — Spectrum type
`'mel128'` (default) | `'mel256'` | `'linear'`

Spectrum type generated from audio and used as input to the neural network, specified as one of these:

'mel128' –– Generates mel spectrograms using 128 mel bands.
'mel256' –– Generates mel spectrograms using 256 mel bands.
'linear' –– Generates positive one-sided spectrograms using an FFT length of 512.

Data Types: char | string

Output Arguments

`features` — Spectrograms that can be fed to OpenL3 pretrained network
N-by-M-by-1-by-K array

Spectrograms generated from audioIn, returned as an N-by-M-by-1-by-K array.

The dimensions of features depend on the value of SpectrumType:

'mel128' –– The dimensions are 128-by-199-by-1-by-K, where 128 is the number of mel bands and 199 is the number of time hops.
'mel256' –– The dimensions are 256-by-199-by-1-by-K, where 256 is the number of mel bands and 199 is the number of time hops.
'linear' –– The dimensions are 257-by-197-by-1-by-K, where 257 is the number of bins in the one-sided spectrum (FFT length of 512) and 197 is the number of time hops.

K represents the number of spectrograms and depends on the length of audioIn, the number of channels in audioIn, as well as OverlapPercentage.

Data Types: single
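
To check the dimensions listed above, a short sketch such as the following can loop over the three spectrum types and print the size of features. The synthetic noise input and the 16 kHz sample rate are assumptions made for this illustration.

```matlab
% Sketch: print the size of features for each spectrum type.
fs = 16e3;                         % assumed sample rate for this illustration
audioIn = randn(3*fs,1);           % 3 seconds of synthetic noise

spectrumTypes = ["mel128","mel256","linear"];
numBands = zeros(1,numel(spectrumTypes));
for ii = 1:numel(spectrumTypes)
    features = openl3Preprocess(audioIn,fs,SpectrumType=spectrumTypes(ii));
    [numBands(ii),numHops,~,numSpect] = size(features);
    fprintf("%s: %d-by-%d-by-1-by-%d\n", ...
        spectrumTypes(ii),numBands(ii),numHops,numSpect)
end
```

The first dimension is expected to be 128, 256, and 257 for the three types, respectively. The last dimension, K, depends on the input length and overlap.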

`cf` — Center frequencies of spectrogram
row vector

Center frequencies of the spectrogram bands in Hz, returned as a row vector. The length of the vector depends on SpectrumType:

'mel128' –– 128 elements
'mel256' –– 256 elements
'linear' –– 257 elements

`ts` — Time location of each window
M-by-K matrix

Time location of the center of each analysis window in seconds, returned as an M-by-K matrix, where M is the number of time hops and K is the number of spectrograms in features. For multichannel inputs, the time stamps are stacked along the second dimension.
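
One possible use of ts is to select the spectrograms that cover a particular instant in the audio. The following is a minimal sketch for a mono file. The query time tQuery and the helper variable names are hypothetical and chosen only for illustration.

```matlab
% Sketch: pick the spectrograms whose time span contains a query time.
[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");
[features,cf,ts] = openl3Preprocess(audioIn,fs);

% First and last window centers of each spectrogram, in seconds.
firstCenter = ts(1,:);
lastCenter = ts(end,:);

tQuery = 2.5;                          % seconds, arbitrary value for this sketch
idx = find(firstCenter <= tQuery & lastCenter >= tQuery);

% Keep only the spectrograms that contain the query time.
selected = features(:,:,:,idx);
```

Each column of ts corresponds to one spectrogram in features, so the same index selects matching entries in both arrays.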

References

[1] Cramer, Jason, et al. "Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings." In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 3852–3856. https://doi.org/10.1109/ICASSP.2019.8682475.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
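
A minimal sketch of GPU usage, assuming Parallel Computing Toolbox and a supported GPU are available, is to pass the audio as a gpuArray and gather the result back to the CPU. The guard with canUseGPU is an addition for this sketch so that the code is skipped on machines without a usable GPU.

```matlab
% Sketch: run the preprocessing on a GPU when one is available.
features = [];
if canUseGPU
    [audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

    % Move the audio to GPU memory before preprocessing.
    audioInGPU = gpuArray(audioIn);
    features = openl3Preprocess(audioInGPU,fs);

    % Bring the spectrograms back to the CPU workspace.
    % gather returns its input unchanged if the data is not on the GPU.
    features = gather(features);
end
```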

Version History

Introduced in R2021a

R2024b: Additional outputs for center frequencies of bands and locations of windows in time

Call openl3Preprocess with additional output arguments to get the center frequencies of the bands and the time locations of the windows in the generated spectrograms.
