extract

Extract audio features

Since R2019b

Description

features = extract(aFE,audioIn) extracts the features enabled in the audioFeatureExtractor object aFE from the audio input audioIn and returns them as an array.

Examples

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor to extract the centroid of the Bark spectrum, the kurtosis of the Bark spectrum, and the pitch of an audio signal.

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralKurtosis",true, ...
    "pitch",true)
aFE = 
  audioFeatureExtractor with properties:

   Properties
                     Window: [1024x1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'
        FeatureVectorLength: 3

   Enabled Features
     spectralCentroid, spectralKurtosis, pitch

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     mfccDeltaDelta, gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease
     spectralEntropy, spectralFlatness, spectralFlux, spectralRolloffPoint, spectralSkewness, spectralSlope
     spectralSpread, harmonicRatio, zerocrossrate, shortTimeEnergy


   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

Call extract to extract the features from the audio signal. Normalize the features by their mean and standard deviation.

features = extract(aFE,audioIn);
features = (features - mean(features,1))./std(features,[],1);
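
The normalization step can be checked without an audio file. The following is a minimal sketch, assuming a synthetic matrix as a hypothetical stand-in for the L-by-M output of extract (the sizes, scales, and offsets are illustrative only). After normalization, each feature column should have a mean near 0 and a standard deviation near 1.

```matlab
% Hypothetical stand-in for the L-by-M matrix returned by extract:
% 200 feature vectors (rows), 3 features (columns) on very different scales.
rng(0)                                   % fix the seed so the sketch is repeatable
features = randn(200,3).*[1000 5 150] + [3000 10 200];

% Same normalization as above: subtract each column's mean,
% then divide by each column's standard deviation.
mu = mean(features,1);
sigma = std(features,[],1);
normalized = (features - mu)./sigma;

% Each column should now have mean close to 0 and standard deviation close to 1.
colMean = mean(normalized,1);
colStd = std(normalized,[],1);
disp(max(abs(colMean)))
disp(max(abs(colStd - 1)))
```

Normalizing along the first dimension puts features with different units, such as a centroid in Hz and a unitless kurtosis, on a common scale so that they can share one plot.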

Plot the normalized features over time.

idx = info(aFE);
duration = size(audioIn,1)/fs;

subplot(2,1,1)
t = linspace(0,duration,size(audioIn,1));
plot(t,audioIn)

subplot(2,1,2)
t = linspace(0,duration,size(features,1));
plot(t,features(:,idx.spectralCentroid), ...
     t,features(:,idx.spectralKurtosis), ...
     t,features(:,idx.pitch));
legend("Spectral Centroid","Spectral Kurtosis", "Pitch")
xlabel("Time (s)")

[Figure: two axes. The top axes shows the audio signal. The bottom axes, with x-axis label Time (s), shows three lines: Spectral Centroid, Spectral Kurtosis, and Pitch.]
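
The struct returned by info is what ties feature names to columns of the output. The following is a minimal sketch, assuming a one-second white-noise signal as a hypothetical stand-in for real audio; it enables mfcc, which occupies several columns, alongside pitch, and uses the indices from info to pull each feature out of the matrix.

```matlab
% Hypothetical input: 1 second of mono white noise at 44.1 kHz.
fs = 44100;
rng(0)
audioIn = randn(fs,1);

% Enable one multi-column feature (mfcc) and one single-column feature (pitch).
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true, ...
    "pitch",true);

% info returns a struct whose fields hold the column indices of each enabled feature.
idx = info(aFE);

features = extract(aFE,audioIn);

% Index by name instead of hard-coding column numbers.
mfccFeatures = features(:,idx.mfcc);
pitchFeature = features(:,idx.pitch);

% Together, the index fields cover every column of the output.
numCols = numel(idx.mfcc) + numel(idx.pitch);
disp(numCols == size(features,2))
```

Indexing through info keeps downstream code valid if the set of enabled features changes later.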

Input Arguments

audioIn –– Input audio, specified as a column vector or as a matrix in which each column is an independent channel.

Data Types: single | double

Output Arguments

features –– Extracted audio features, returned as an L-by-M-by-N array, where:

  • L –– Number of feature vectors (hops)

  • M –– Number of features extracted per analysis window

  • N –– Number of channels

Data Types: single | double
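
The output dimensions can be illustrated with a multichannel input. The following is a minimal sketch, assuming one second of stereo white noise as a hypothetical input. The expression for expectedL is an assumption, not a statement from this page: it supposes that only complete analysis windows are processed, with no padding of the final partial window.

```matlab
% Hypothetical input: 1 second of two-channel white noise at 44.1 kHz.
fs = 44100;
rng(0)
audioIn = randn(fs,2);

aFE = audioFeatureExtractor("SampleRate",fs, ...
    "spectralCentroid",true, ...
    "spectralKurtosis",true, ...
    "pitch",true);

features = extract(aFE,audioIn);
[L,M,N] = size(features);

% Assumption: the hop count follows the usual framing formula with no padding.
winLength = numel(aFE.Window);
hopLength = winLength - aFE.OverlapLength;
expectedL = floor((size(audioIn,1) - winLength)/hopLength) + 1;

fprintf("L = %d (expected %d under the stated assumption)\n",L,expectedL)
fprintf("M = %d (FeatureVectorLength = %d)\n",M,aFE.FeatureVectorLength)
fprintf("N = %d (input channels = %d)\n",N,size(audioIn,2))
```

Each page features(:,:,k) holds the feature matrix for channel k, so per-channel processing can index the third dimension directly.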

Version History

Introduced in R2019b