yamnetPreprocess
Syntax
Description
Examples
Download YAMNet
Download and unzip the Audio Toolbox™ model for YAMNet.
Type yamnet at the Command Window. If the Audio Toolbox model for YAMNet is not installed, then the function provides a link to the location of the network weights. To download the model, click the link. Unzip the file to a location on the MATLAB path.
Alternatively, execute the following commands to download and unzip the YAMNet model to your temporary directory.
downloadFolder = fullfile(tempdir,'YAMNetDownload');
loc = websave(downloadFolder,'https://ssd.mathworks.com/supportfiles/audio/yamnet.zip');
YAMNetLocation = tempdir;
unzip(loc,YAMNetLocation)
addpath(fullfile(YAMNetLocation,'yamnet'))
Check that the installation is successful by typing yamnet at the Command Window. If the network is installed, then the function returns a SeriesNetwork (Deep Learning Toolbox) object.
yamnet
ans = 
  SeriesNetwork with properties:

         Layers: [86×1 nnet.cnn.layer.Layer]
     InputNames: {'input_1'}
    OutputNames: {'Sound'}
Load Pretrained YAMNet
Load a pretrained YAMNet convolutional neural network and examine the layers and classes.
Use yamnet to load the pretrained YAMNet network. The output net is a SeriesNetwork (Deep Learning Toolbox) object.
net = yamnet
net = 
  SeriesNetwork with properties:

         Layers: [86×1 nnet.cnn.layer.Layer]
     InputNames: {'input_1'}
    OutputNames: {'Sound'}
View the network architecture using the Layers property. The network has 86 layers. There are 28 layers with learnable weights: 27 convolutional layers and 1 fully connected layer.
net.Layers
ans = 
  86x1 Layer array with layers:

     1   'input_1'                    Image Input              96×64×1 images
     2   'conv2d'                     Convolution              32 3×3×1 convolutions with stride [2 2] and padding 'same'
     3   'b'                          Batch Normalization      Batch normalization with 32 channels
     4   'activation'                 ReLU                     ReLU
     5   'depthwise_conv2d'           Grouped Convolution      32 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
     6   'L11'                        Batch Normalization      Batch normalization with 32 channels
     7   'activation_1'               ReLU                     ReLU
     8   'conv2d_1'                   Convolution              64 1×1×32 convolutions with stride [1 1] and padding 'same'
     9   'L12'                        Batch Normalization      Batch normalization with 64 channels
    10   'activation_2'               ReLU                     ReLU
    11   'depthwise_conv2d_1'         Grouped Convolution      64 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    12   'L21'                        Batch Normalization      Batch normalization with 64 channels
    13   'activation_3'               ReLU                     ReLU
    14   'conv2d_2'                   Convolution              128 1×1×64 convolutions with stride [1 1] and padding 'same'
    15   'L22'                        Batch Normalization      Batch normalization with 128 channels
    16   'activation_4'               ReLU                     ReLU
    17   'depthwise_conv2d_2'         Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    18   'L31'                        Batch Normalization      Batch normalization with 128 channels
    19   'activation_5'               ReLU                     ReLU
    20   'conv2d_3'                   Convolution              128 1×1×128 convolutions with stride [1 1] and padding 'same'
    21   'L32'                        Batch Normalization      Batch normalization with 128 channels
    22   'activation_6'               ReLU                     ReLU
    23   'depthwise_conv2d_3'         Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    24   'L41'                        Batch Normalization      Batch normalization with 128 channels
    25   'activation_7'               ReLU                     ReLU
    26   'conv2d_4'                   Convolution              256 1×1×128 convolutions with stride [1 1] and padding 'same'
    27   'L42'                        Batch Normalization      Batch normalization with 256 channels
    28   'activation_8'               ReLU                     ReLU
    29   'depthwise_conv2d_4'         Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    30   'L51'                        Batch Normalization      Batch normalization with 256 channels
    31   'activation_9'               ReLU                     ReLU
    32   'conv2d_5'                   Convolution              256 1×1×256 convolutions with stride [1 1] and padding 'same'
    33   'L52'                        Batch Normalization      Batch normalization with 256 channels
    34   'activation_10'              ReLU                     ReLU
    35   'depthwise_conv2d_5'         Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    36   'L61'                        Batch Normalization      Batch normalization with 256 channels
    37   'activation_11'              ReLU                     ReLU
    38   'conv2d_6'                   Convolution              512 1×1×256 convolutions with stride [1 1] and padding 'same'
    39   'L62'                        Batch Normalization      Batch normalization with 512 channels
    40   'activation_12'              ReLU                     ReLU
    41   'depthwise_conv2d_6'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    42   'L71'                        Batch Normalization      Batch normalization with 512 channels
    43   'activation_13'              ReLU                     ReLU
    44   'conv2d_7'                   Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    45   'L72'                        Batch Normalization      Batch normalization with 512 channels
    46   'activation_14'              ReLU                     ReLU
    47   'depthwise_conv2d_7'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    48   'L81'                        Batch Normalization      Batch normalization with 512 channels
    49   'activation_15'              ReLU                     ReLU
    50   'conv2d_8'                   Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    51   'L82'                        Batch Normalization      Batch normalization with 512 channels
    52   'activation_16'              ReLU                     ReLU
    53   'depthwise_conv2d_8'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    54   'L91'                        Batch Normalization      Batch normalization with 512 channels
    55   'activation_17'              ReLU                     ReLU
    56   'conv2d_9'                   Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    57   'L92'                        Batch Normalization      Batch normalization with 512 channels
    58   'activation_18'              ReLU                     ReLU
    59   'depthwise_conv2d_9'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    60   'L101'                       Batch Normalization      Batch normalization with 512 channels
    61   'activation_19'              ReLU                     ReLU
    62   'conv2d_10'                  Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    63   'L102'                       Batch Normalization      Batch normalization with 512 channels
    64   'activation_20'              ReLU                     ReLU
    65   'depthwise_conv2d_10'        Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    66   'L111'                       Batch Normalization      Batch normalization with 512 channels
    67   'activation_21'              ReLU                     ReLU
    68   'conv2d_11'                  Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    69   'L112'                       Batch Normalization      Batch normalization with 512 channels
    70   'activation_22'              ReLU                     ReLU
    71   'depthwise_conv2d_11'        Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    72   'L121'                       Batch Normalization      Batch normalization with 512 channels
    73   'activation_23'              ReLU                     ReLU
    74   'conv2d_12'                  Convolution              1024 1×1×512 convolutions with stride [1 1] and padding 'same'
    75   'L122'                       Batch Normalization      Batch normalization with 1024 channels
    76   'activation_24'              ReLU                     ReLU
    77   'depthwise_conv2d_12'        Grouped Convolution      1024 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    78   'L131'                       Batch Normalization      Batch normalization with 1024 channels
    79   'activation_25'              ReLU                     ReLU
    80   'conv2d_13'                  Convolution              1024 1×1×1024 convolutions with stride [1 1] and padding 'same'
    81   'L132'                       Batch Normalization      Batch normalization with 1024 channels
    82   'activation_26'              ReLU                     ReLU
    83   'global_average_pooling2d'   Global Average Pooling   Global average pooling
    84   'dense'                      Fully Connected          521 fully connected layer
    85   'softmax'                    Softmax                  softmax
    86   'Sound'                      Classification Output    crossentropyex with 'Speech' and 520 other classes
To view the names of the classes learned by the network, view the Classes property of the classification output layer (the final layer). View the first 10 classes by specifying the first 10 elements.
net.Layers(end).Classes(1:10)
ans = 10×1 categorical
Speech
Child speech, kid speaking
Conversation
Narration, monologue
Babbling
Speech synthesizer
Shout
Bellow
Whoop
Yell
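The same Classes property can be searched by name. The following is a minimal sketch, assuming the YAMNet model is installed as described above. The variable names classNames, numClasses, and speechLike are illustrative and not part of the original example.

```matlab
% Load the network (requires the downloaded YAMNet model on the MATLAB path).
net = yamnet;

% Convert the categorical class list to a string array for text search.
classNames = string(net.Layers(end).Classes);

% Count the classes; the layer description above reports 521.
numClasses = numel(classNames)

% List every class whose name contains "speech", ignoring case.
speechLike = classNames(contains(classNames,"speech",'IgnoreCase',true))
```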
Use analyzeNetwork (Deep Learning Toolbox) to visually explore the network.
analyzeNetwork(net)
YAMNet was released with a corresponding sound class ontology, which you can explore using the yamnetGraph object.
ygraph = yamnetGraph;
p = plot(ygraph);
layout(p,'layered')
The ontology graph plots all 521 possible sound classes. Plot a subgraph of the classes under "Respiratory sounds".
allRespiratorySounds = dfsearch(ygraph,"Respiratory sounds");
ygraphRespiratory = subgraph(ygraph,allRespiratorySounds);
plot(ygraphRespiratory)
Preprocess Audio and Classify Sounds with YAMNet
Read in an audio signal.
[audioIn,fs] = audioread('SpeechDFT-16-8-mono-5secs.wav');
Plot and listen to the audio signal.
T = 1/fs;
t = 0:T:(length(audioIn)*T) - T;
plot(t,audioIn);
grid on
xlabel('Time (s)')
ylabel('Amplitude')
soundsc(audioIn,fs)
Use yamnetPreprocess to extract mel spectrograms from the audio signal. Visualize an arbitrary spectrogram from the array.
melSpectYam = yamnetPreprocess(audioIn,fs);
arbSpect = melSpectYam(:,:,1,randi(size(melSpectYam,4)));
surf(arbSpect,'EdgeColor','none')
view([90,-90])
axis([1 size(arbSpect,1) 1 size(arbSpect,2)])
xlabel('Mel Band')
ylabel('Frame')
title('Mel Spectrogram for YAMNet')
axis tight
Create a YAMNet neural network (this requires Deep Learning Toolbox). Call classify with your YAMNet network and the preprocessed mel spectrogram images.
net = yamnet;
classes = classify(net,melSpectYam);
Classify the audio signal as the most frequently occurring sound.
mySound = mode(classes)
mySound = categorical
Speech
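Taking the mode discards how confident the network was in each spectrogram. As an alternative (not part of the original example), the sketch below averages the per-spectrogram scores returned by classify and reports the three highest-scoring classes. It assumes the same audio file and an installed YAMNet model. The names meanScores, topScores, and top3 are illustrative.

```matlab
% Read the audio and extract mel spectrograms, as in the example above.
[audioIn,fs] = audioread('SpeechDFT-16-8-mono-5secs.wav');
melSpectYam = yamnetPreprocess(audioIn,fs);

% Request the score matrix as the second output of classify.
% Each row holds the class scores for one mel spectrogram.
net = yamnet;
[~,scores] = classify(net,melSpectYam);

% Average the scores over all spectrograms, then pick the three largest.
meanScores = mean(scores,1);
[topScores,idx] = maxk(meanScores,3);

% Map the column indices back to class names.
classNames = net.Layers(end).Classes;
top3 = classNames(idx)
```

Averaging scores is one common way to pool segment-level predictions; whether it works better than the mode depends on the recording.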
Input Arguments
audioIn - Input signal
column vector | matrix
Input signal, specified as a column vector or matrix. If you specify a matrix, yamnetPreprocess treats the columns of the matrix as individual audio channels.
Data Types: single | double
fs - Sample rate (Hz)
positive scalar
Sample rate of the input signal in Hz, specified as a positive scalar.
Data Types: single | double
OverlapPercentage - Overlap percentage between consecutive mel spectrograms
50 (default) | scalar in the range [0,100)
Percentage overlap between consecutive mel spectrograms, specified as a scalar in the range [0,100).
Data Types: single | double
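The following sketch shows how the overlap setting changes the number of spectrograms returned. It assumes OverlapPercentage is passed as a name-value argument after fs, and it uses synthetic noise in place of a real recording. The variables overlaps and K are illustrative.

```matlab
% Five seconds of synthetic noise stands in for a real recording.
fs = 16000;
audioIn = randn(5*fs,1);

% Extract spectrograms at several overlap settings and record how many
% spectrograms (size along the fourth dimension) each setting produces.
overlaps = [0 50 75];
K = zeros(size(overlaps));
for i = 1:numel(overlaps)
    features = yamnetPreprocess(audioIn,fs,'OverlapPercentage',overlaps(i));
    K(i) = size(features,4);
end

% Show the overlap percentage next to the resulting spectrogram count.
disp(table(overlaps(:),K(:),'VariableNames',{'OverlapPercentage','NumSpectrograms'}))
```

A higher overlap yields more spectrograms for the same signal, which increases classification cost but gives finer time resolution.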
Output Arguments
features - Mel spectrograms that can be fed to the YAMNet pretrained network
96-by-64-by-1-by-K array
Mel spectrograms generated from audioIn, returned as a 96-by-64-by-1-by-K array, where:
- 96: Represents the number of 25 ms frames in each mel spectrogram
- 64: Represents the number of mel bands spanning 125 Hz to 7.5 kHz
- K: Represents the number of mel spectrograms and depends on the length of audioIn, the number of channels in audioIn, as well as OverlapPercentage
Note
Each 96-by-64-by-1 patch represents a single mel spectrogram image. For multichannel inputs, mel spectrograms are stacked along the fourth dimension.
Data Types: single
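The way K depends on signal length, channel count, and OverlapPercentage can be approximated with frame arithmetic. The sketch below is an estimate only: the 16 kHz working sample rate and the 10 ms hop between frames are assumptions taken from the published YAMNet model, not from this page, and yamnetPreprocess may pad or trim the signal differently. All variable names are hypothetical.

```matlab
% Framing constants. Only the 25 ms frame length and the 96 frames per
% spectrogram are stated on this page; the rest are assumptions.
fsTarget       = 16000;   % assumed working sample rate (Hz)
winLen         = 0.025;   % 25 ms analysis frame
hopLen         = 0.010;   % assumed 10 ms hop between frames
framesPerSpect = 96;      % frames in each mel spectrogram

% Hypothetical input: 5 seconds, one channel, default overlap.
durationSec       = 5;
numChannels       = 1;
overlapPercentage = 50;

% Number of analysis frames that fit in the signal.
numSamples = durationSec*fsTarget;
numFrames  = 1 + floor((numSamples - round(winLen*fsTarget))/round(hopLen*fsTarget));

% Hop between consecutive spectrograms, in frames.
spectHop = max(1,round(framesPerSpect*(1 - overlapPercentage/100)));

% Estimated spectrogram count, with channels stacked along dimension 4.
Kest = numChannels*max(0,1 + floor((numFrames - framesPerSpect)/spectHop))
% Kest is 9 for these values
```

Compare the estimate against size(features,4) for your own signal before relying on it.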
References
[1] Gemmeke, Jort F., et al. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 776–80. DOI.org (Crossref), doi:10.1109/ICASSP.2017.7952261.
[2] Hershey, Shawn, et al. “CNN Architectures for Large-Scale Audio Classification.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 131–35. DOI.org (Crossref), doi:10.1109/ICASSP.2017.7952132.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
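A minimal sketch of GPU use, assuming Parallel Computing Toolbox and a supported GPU are available. It falls back to the CPU otherwise. The synthetic input is illustrative.

```matlab
% Two seconds of synthetic single-precision noise as a stand-in signal.
fs = 16000;
audioIn = randn(2*fs,1,'single');

% Move the signal to the GPU only when a supported device is present.
if canUseGPU
    audioIn = gpuArray(audioIn);
end

% Extract mel spectrograms; with a gpuArray input the work runs on the GPU.
features = yamnetPreprocess(audioIn,fs);

% Bring the result back to host memory (no effect for CPU arrays).
features = gather(features);
size(features)
```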
Version History
Introduced in R2021a