Audio Processing Using Deep Learning

Extend deep learning workflows with audio and speech processing applications

Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™. For signal processing applications, see Signal Processing Using Deep Learning. For applications in wireless communications, see Wireless Communications Using Deep Learning.


Audio Labeler - Define and visualize ground-truth labels


ivectorSystem - Create i-vector system
crepe - CREPE neural network
crepePreprocess - Preprocess audio for CREPE deep learning network
crepePostprocess - Postprocess output of CREPE deep learning network
pitchnn - Estimate pitch with deep learning neural network
openl3 - OpenL3 neural network
openl3Preprocess - Preprocess audio for OpenL3 feature extraction
openl3Features - Extract OpenL3 features
audioDatastore - Datastore for collection of audio files
audioDataAugmenter - Augment audio data
audioFeatureExtractor - Streamline audio feature extraction
vggishPreprocess - Preprocess audio for VGGish feature extraction
vggishFeatures - Extract VGGish features
vggish - VGGish neural network
yamnet - YAMNet neural network
yamnetPreprocess - Preprocess audio for YAMNet classification
yamnetGraph - Graph of YAMNet AudioSet ontology
classifySound - Classify sounds in audio signal


Introduction to Deep Learning for Audio Applications (Audio Toolbox)

Learn common tools and workflows to apply deep learning to audio applications.

Classify Sound Using Deep Learning (Audio Toolbox)

Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.

Transfer Learning with Pretrained Audio Networks (Audio Toolbox)

Use transfer learning to retrain YAMNet, a pretrained convolutional neural network (CNN), to classify a new set of audio signals.

Speaker Identification Using Custom SincNet Layer and Deep Learning (Audio Toolbox)

Perform speaker identification using a custom deep learning layer that implements a mel-scale filter bank.

Dereverberate Speech Using Deep Learning Networks (Audio Toolbox)

Train a deep learning model that removes reverberation from speech.

Speech Command Recognition in Simulink (Audio Toolbox)

Detect the presence of speech commands in audio using a Simulink® model.

