
VGGish

VGGish embeddings extraction network

Since R2022a


Libraries:
Audio Toolbox / Deep Learning

Description

The VGGish block uses a pretrained convolutional neural network, trained on the AudioSet data set [1], to extract feature embeddings from audio signals. For background on the network architecture, see [2].

Ports

Input


Mel spectrograms, specified as a 96-by-64 matrix or a 96-by-64-by-1-by-N array, where:

  • 96: Number of 25 ms frames in each mel spectrogram

  • 64: Number of mel bands, spanning 125 Hz to 7.5 kHz

  • N: Number of mel spectrograms
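To make the 64 mel bands more concrete, the following Python sketch computes band edges spaced evenly on a mel scale between 125 Hz and 7.5 kHz. It assumes the HTK-style mel formula m = 2595 log10(1 + f/700) and triangular filters; the filter bank that the VGGish Preprocess block actually uses may differ in detail, and `mel_band_edges` is a hypothetical helper, not an Audio Toolbox function.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel scale (assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(num_bands=64, f_low=125.0, f_high=7500.0):
    """Return num_bands + 2 frequencies in Hz, evenly spaced on the mel scale.

    With triangular filters, band k rises from edges[k], peaks at
    edges[k + 1], and falls back to zero at edges[k + 2].
    """
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), num_bands + 2)
    return mel_to_hz(mels)

edges = mel_band_edges()
centers = edges[1:-1]  # one peak frequency per mel band
print(len(centers))    # 64
```

Because the spacing is uniform in mel rather than in hertz, the low-frequency bands are much narrower than the high-frequency ones.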

You can use the VGGish Preprocess block to generate mel spectrograms in this format. Each spectrogram has dimensions 96-by-64.

Data Types: single | double
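The following Python/NumPy sketch shows one way to stack N individual 96-by-64 mel spectrograms into the 96-by-64-by-1-by-N layout described above, checking shape and data type along the way. NumPy arrays stand in for the Simulink signal here (float32 for single, float64 for double), and `stack_mel_spectrograms` is a hypothetical helper, not part of Audio Toolbox.

```python
import numpy as np

NUM_FRAMES, NUM_BANDS = 96, 64

def stack_mel_spectrograms(spectrograms):
    """Stack 96x64 spectrograms into a 96x64x1xN array.

    A single 96x64 array is also accepted and treated as N = 1.
    """
    if isinstance(spectrograms, np.ndarray) and spectrograms.ndim == 2:
        spectrograms = [spectrograms]
    for s in spectrograms:
        if s.shape != (NUM_FRAMES, NUM_BANDS):
            raise ValueError(f"expected ({NUM_FRAMES}, {NUM_BANDS}), got {s.shape}")
        if s.dtype not in (np.float32, np.float64):
            raise TypeError(f"expected single or double data, got {s.dtype}")
    stacked = np.stack(spectrograms, axis=-1)  # 96 x 64 x N
    return stacked[:, :, np.newaxis, :]        # 96 x 64 x 1 x N

rng = np.random.default_rng(0)
specs = [rng.random((96, 64), dtype=np.float32) for _ in range(3)]
x = stack_mel_spectrograms(specs)
print(x.shape)  # (96, 64, 1, 3)
```

The singleton third dimension corresponds to the "1" in 96-by-64-by-1-by-N; it carries no data but keeps the array in the layout the block expects.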

Output


VGGish feature embeddings, returned as an N-by-128 matrix, where N is the number of mel spectrograms in the input. Each row is the 128-element embedding of one mel spectrogram and is a compact representation of that segment of audio.

Data Types: single
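A minimal Python sketch of the shape contract between the two ports: a 96-by-64-by-1-by-N input yields an N-by-128 single-precision output, one row per spectrogram. The `placeholder_embed` function is hypothetical and is NOT the pretrained VGGish network; it uses a fixed random linear projection only so that the input and output shapes and the output data type can be checked.

```python
import numpy as np

EMBEDDING_DIM = 128

def placeholder_embed(x, seed=0):
    """Map a 96x64 or 96x64x1xN array to an N x 128 float32 array.

    Hypothetical stand-in for the pretrained network: a fixed random
    linear projection. Only the shapes and output dtype mirror the block.
    """
    x = np.asarray(x)
    if x.ndim == 2:  # a single 96x64 matrix means N = 1
        x = x[:, :, np.newaxis, np.newaxis]
    if x.ndim != 4 or x.shape[:3] != (96, 64, 1):
        raise ValueError(f"unexpected input shape {x.shape}")
    n = x.shape[3]
    # Move the spectrogram index to the front and flatten each one: N x 6144
    flat = np.moveaxis(x, -1, 0).reshape(n, -1)
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((flat.shape[1], EMBEDDING_DIM))
    # One 128-element row per input spectrogram, cast to single precision
    return (flat @ weights).astype(np.float32)

rng = np.random.default_rng(1)
x = rng.random((96, 64, 1, 5))
emb = placeholder_embed(x)
print(emb.shape, emb.dtype)  # (5, 128) float32
```

Note that the output is single precision even when the input is double, matching the data types listed for the two ports.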

Parameters


Size of the mini-batches to use for prediction, specified as a positive integer. Larger mini-batch sizes require more memory but can lead to faster predictions.
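The following Python sketch illustrates the trade-off: N spectrograms are processed in groups of at most `mini_batch_size`, so ceil(N / mini_batch_size) forward passes are needed, and each pass holds at most that many spectrograms at once. `minibatch_slices` is a hypothetical helper; how the block schedules its batches internally is not described on this page.

```python
import math

def minibatch_slices(n, mini_batch_size):
    """Yield slices that split n items into consecutive mini-batches."""
    if isinstance(mini_batch_size, bool) or not isinstance(mini_batch_size, int) \
            or mini_batch_size <= 0:
        raise ValueError("mini_batch_size must be a positive integer")
    for start in range(0, n, mini_batch_size):
        # The last batch may be smaller than mini_batch_size
        yield slice(start, min(start + mini_batch_size, n))

n = 10
batches = list(minibatch_slices(n, 4))
print([(s.start, s.stop) for s in batches])  # [(0, 4), (4, 8), (8, 10)]
print(len(batches) == math.ceil(n / 4))      # True
```

With a NumPy array `x` of shape (96, 64, 1, N), each slice selects one mini-batch along the last axis as `x[..., s]`.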

Block Characteristics

Data Types: double | single

Direct Feedthrough: no

Multidimensional Signals: no

Variable-Size Signals: no

Zero-Crossing Detection: no

References

[1] Gemmeke, Jort F., Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 776–80. New Orleans, LA: IEEE, 2017. https://doi.org/10.1109/ICASSP.2017.7952261.

[2] Hershey, Shawn, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, et al. “CNN Architectures for Large-Scale Audio Classification.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 131–35. New Orleans, LA: IEEE, 2017. https://doi.org/10.1109/ICASSP.2017.7952132.


Version History

Introduced in R2022a