cepstralFeatureExtractor
(Removed) Extract cepstral features from audio segment
The cepstralFeatureExtractor System object™ has been removed. For more information, see Version History.
Description
The cepstralFeatureExtractor System object extracts cepstral features from an audio segment. Cepstral features are commonly used to characterize speech and music signals.
To extract cepstral features:
1. Create the cepstralFeatureExtractor object and set its properties.
2. Call the object with arguments, as if it were a function.
To learn more about how System objects work, see What Are System Objects?
Creation
Description
cepFeatures = cepstralFeatureExtractor creates a System object, cepFeatures, that calculates cepstral features independently across each input channel. Columns of the input are treated as individual channels.
cepFeatures = cepstralFeatureExtractor(Name,Value) sets each property Name to the specified Value. Unspecified properties have default values.
Example: cepFeatures = cepstralFeatureExtractor('InputDomain','Frequency','SampleRate',fs,'LogEnergy','Replace') accepts a signal in the frequency domain, sampled at fs Hz. The first element of the coefficients vector is replaced by the log energy value.
Properties
Unless otherwise indicated, properties are nontunable, which means you cannot change their values after calling the object. Objects lock when you call them, and the release function unlocks them.
If a property is tunable, you can change its value at any time.
For more information on changing property values, see System Design in MATLAB Using System Objects.
FilterBank
— Type of filter bank
'Mel' (default) | 'Gammatone'
Type of filter bank, specified as either 'Mel' or 'Gammatone'. When FilterBank is set to 'Mel', the object computes the mel frequency cepstral coefficients (MFCC). When FilterBank is set to 'Gammatone', the object computes the gammatone cepstral coefficients (GTCC).
Data Types: char | string
InputDomain
— Domain of input signal
'Time' (default) | 'Frequency'
Domain of the input signal, specified as either 'Time' or 'Frequency'.
Data Types: char | string
NumCoeffs
— Number of coefficients to return
13 (default) | positive integer
Number of coefficients to return, specified as an integer in the range [2, v], where v is the number of valid passbands. The number of valid passbands depends on the type of filter bank:
Mel –– The number of valid passbands is defined as sum(BandEdges <= floor(SampleRate/2))-2.
Gammatone –– The number of valid passbands is defined as ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
Data Types: single | double
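As a rough illustration of the mel case, the following sketch evaluates the passband formula above with base MATLAB only. The band edges and sample rate are hypothetical values chosen for the example; they are not the object's defaults.

```matlab
% Hypothetical band edges in Hz (illustrative only, not the object's defaults).
bandEdges = [0 200 400 800 1600 3200 6400 12800];
sampleRate = 16000;

% Count the band edges at or below the Nyquist frequency, then subtract 2:
% each triangular filter needs an edge on either side of its center.
numValidBands = sum(bandEdges <= floor(sampleRate/2)) - 2;

% NumCoeffs would then have to lie in the range [2, numValidBands].
fprintf('Valid passbands: %d\n', numValidBands);
```

With these assumed values, seven edges lie at or below 8000 Hz, so the formula gives five valid passbands.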
Rectification
— Nonlinear rectification type
'Log' (default) | 'Cubic-Root'
Nonlinear rectification type, specified as 'Log' or 'Cubic-Root'.
Data Types: char | string
FFTLength
— FFT length
[] (default) | positive integer
FFT length, specified as a positive integer. The default, [], means that the FFT length is equal to the number of rows in the input signal.
Dependencies
To enable this property, set InputDomain to 'Time'.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
LogEnergy
— Specify how the log energy is shown
'Append' (default) | 'Replace' | 'Ignore'
Specify how the log energy is shown in the coefficients vector output, specified as:
'Append' –– The object prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
'Replace' –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The object does not calculate or return the log energy.
Data Types: char | string
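The three LogEnergy settings differ only in how the output vector is assembled. The sketch below mimics that assembly with placeholder numbers in base MATLAB; the variable names and values are assumptions for illustration and do not come from the object's implementation.

```matlab
numCoeffs = 13;
c = (1:numCoeffs).';   % placeholder cepstral coefficients (hypothetical values)
logE = -2.5;           % placeholder log energy (hypothetical value)

% 'Append': the log energy is prepended, so the length is 1 + NumCoeffs.
appended = [logE; c];

% 'Replace': the log energy takes the place of the first coefficient.
replaced = [logE; c(2:end)];

% 'Ignore': only the cepstral coefficients are returned.
ignored = c;

fprintf('%d %d %d\n', numel(appended), numel(replaced), numel(ignored));
```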
SampleRate
— Input sample rate (Hz)
16000 (default) | positive scalar
Input sample rate in Hz, specified as a real positive scalar.
Data Types: single | double
BandEdges
— Band edges of mel filter bank (Hz)
row vector
Band edges of the filter bank in Hz, specified as a nonnegative, monotonically increasing row vector in the range [0, ∞). The maximum band edge frequency can be any finite number. The number of band edges must be in the range [4, 80].
The default band edges are spaced linearly for the first ten and then logarithmically after. The default band edges are set as recommended by [1].
Dependencies
To enable this property, set FilterBank to 'Mel'.
Data Types: single | double
FrequencyRange
— Frequency range of gammatone filter bank (Hz)
[50 8000] (default) | two-element row vector
Frequency range of the filter bank in Hz, specified as a positive, monotonically increasing two-element row vector. The maximum frequency can be any finite number. The center frequencies of the filter bank are equally spaced between hz2erb(FrequencyRange(1)) and hz2erb(FrequencyRange(2)) on the ERB scale.
Dependencies
To enable this property, set FilterBank to 'Gammatone'.
Data Types: single | double
FilterBankDesignDomain
— Domain for mel filter bank design
'Hz' (default) | 'Bin'
Domain for filter bank design, specified as either 'Hz' or 'Bin'. The filter bank is designed as overlapped triangles with band edges specified by the BandEdges property.
The BandEdges property is specified in Hz. When you set the design domain to:
'Hz' –– Filter bank triangles are drawn in Hz and are mapped onto bins.
Here is an example that plots the filter bank in bins when the FilterBankDesignDomain is set to 'Hz':
[audioFile, fs] = audioread('NoisySpeech-16-22p5-mono-5secs.wav');
duration = round(0.02*fs); % 20 ms audio segment
audioSegment = audioFile(5500:5500+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)
cepFeatures =
cepstralFeatureExtractor with properties:
Properties
InputDomain: 'Time'
NumCoeffs: 13
FFTLength: []
LogEnergy: 'Append'
SampleRate: 22500
Advanced Properties
BandEdges: [1×42 double]
FilterBankDesignDomain: 'Hz'
FilterBankNormalization: 'Bandwidth'
Pass the audio segment as an input to the cepstral feature extractor algorithm to lock the object.
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);
Use the getFilters function to get the filter bank. Plot the filter bank.
[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))
For details, see [1].
'Bin' –– The band edge frequencies in 'Hz' are converted to bins. The filter bank triangles are drawn symmetrically in bins.
Change the FilterBankDesignDomain property to 'Bin':
release(cepFeatures);
cepFeatures.FilterBankDesignDomain = 'Bin';
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);
[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))
For details, see [2].
Dependencies
To enable this property, set FilterBank to 'Mel'.
Data Types: char | string
FilterBankNormalization
— Normalize filter bank
'Bandwidth' (default) | 'Area' | 'None'
Normalization technique used on the weights of the filter bank, specified as:
'Bandwidth' –– The weights of each bandpass filter are normalized by the corresponding bandwidth of the filter.
'Area' –– The weights of each bandpass filter are normalized by the corresponding area of the bandpass filter.
'None' –– The weights of the filter are not normalized.
Data Types: char | string
Usage
Description
[coeffs,delta,deltaDelta] = cepFeatures(audioIn) returns the cepstral coefficients, the log energy, the delta, and the delta-delta.
The log energy value is prepended to the coefficients vector or replaces its first element, depending on whether you set the LogEnergy property to 'Append' or 'Replace'. For details, see coeffs.
Input Arguments
audioIn
— Input signal
column vector | matrix
Input signal, specified as a column vector or matrix. If InputDomain is set to 'Time', specify audioIn as a real-valued frame of audio data. If InputDomain is set to 'Frequency', specify audioIn as a real- or complex-valued discrete Fourier transform. If specified as a matrix, the columns are treated as independent audio channels.
Data Types: single | double
Complex Number Support: Yes
Output Arguments
coeffs
— Cepstral coefficients
column vector | matrix
Cepstral coefficients, returned as a column vector or a matrix. If the coefficients matrix is an N-by-M matrix, N is determined by the values you specify in the NumCoeffs and LogEnergy properties, and M equals the number of input audio channels.
When the LogEnergy property is set to:
'Append' –– The object prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs. This is the default setting of the LogEnergy property.
'Replace' –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The object does not calculate or return the log energy.
Data Types: single | double
delta
— Change in coefficients
column vector | matrix
Change in coefficients over consecutive calls to the algorithm, returned as a vector or a matrix. The delta array is the same size and data type as the coeffs array.
In this example, cepFeatures is a cepstral feature extractor that accepts an audio input signal sampled at 12 kHz. Stream in three segments of the audio signal on three consecutive calls to the object algorithm. Return the cepstral coefficients of the filter bank and the corresponding delta values.
cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1] = cepFeatures(audioIn);
[coeff2,delta2] = cepFeatures(audioIn);
[coeff3,delta3] = cepFeatures(audioIn);
delta2 is computed as coeff2-coeff1, while delta3 is computed as coeff3-coeff2. The initial array, delta1, is an array of zeros.
Data Types: single | double
deltaDelta
— Change in delta values
column vector | matrix
Change in delta values over consecutive calls to the algorithm, returned as a vector or a matrix. The deltaDelta array is the same size and data type as the coeffs and delta arrays.
In this example, consecutive calls to the cepstral feature extractor algorithm return the deltaDelta values in addition to the coefficients and the delta values.
cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1,deltaDelta1] = cepFeatures(audioIn);
[coeff2,delta2,deltaDelta2] = cepFeatures(audioIn);
[coeff3,delta3,deltaDelta3] = cepFeatures(audioIn);
deltaDelta2 is computed as delta2-delta1, while deltaDelta3 is computed as delta3-delta2. The initial array, deltaDelta1, is an array of zeros.
Data Types: single | double
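The delta and deltaDelta definitions above amount to a first difference and a second difference carried across calls. The following base MATLAB sketch reproduces that bookkeeping on hypothetical coefficient vectors. The variable names (frames, prevCoeffs, prevDelta) are assumptions for illustration, and the loop stands in for consecutive calls to the object rather than reproducing its internals.

```matlab
% Hypothetical coefficient vectors for three consecutive calls, one column per call.
frames = [1 4 9; 2 3 5];
numCalls = size(frames, 2);

deltas = zeros(size(frames));
deltaDeltas = zeros(size(frames));
prevCoeffs = [];   % state carried between calls
prevDelta = [];

for k = 1:numCalls
    c = frames(:, k);
    if isempty(prevCoeffs)
        % First call: no history yet, so both outputs are zeros.
        d = zeros(size(c));
        dd = zeros(size(c));
    else
        d = c - prevCoeffs;    % delta: current minus previous coefficients
        dd = d - prevDelta;    % deltaDelta: current minus previous delta
    end
    deltas(:, k) = d;
    deltaDeltas(:, k) = dd;
    prevCoeffs = c;
    prevDelta = d;
end

disp(deltas);
disp(deltaDeltas);
```

Because the first delta is all zeros, the second deltaDelta equals the second delta, which matches the behavior shown in the example on this page.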
Object Functions
To use an object function, specify the System object as the first input argument. For example, to release system resources of a System object named obj, use this syntax:
release(obj)
Specific to cepstralFeatureExtractor
getFilters | Get auditory filter bank |
Examples
Get MFCC Data for Speech Segment
Extract the mel frequency cepstral coefficients and the log energy values of segments in a speech file. Return delta, the difference between the current and the previous cepstral coefficients, and deltaDelta, the difference between the current and the previous delta values. The log energy value that the object computes is either prepended to the coefficients vector or replaces its first element, depending on whether you set the LogEnergy property of the cepstralFeatureExtractor object to 'Append' or 'Replace'.
Read an audio signal from the 'Counting-16-44p1-mono-15secs.wav' file. Extract a 40 ms segment from the audio data. Create a cepstralFeatureExtractor object. The cepstral coefficients computed by the default object are the mel frequency coefficients. In addition, the object computes the log energy, delta, and delta-delta values of the audio segment.
[audioFile, fs] = audioread('Counting-16-44p1-mono-15secs.wav');
duration = round(0.04*fs); % 40 ms
audioSegment = audioFile(40000:40000+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)
cepFeatures =
cepstralFeatureExtractor with properties:
Properties
FilterBank: 'Mel'
InputDomain: 'Time'
NumCoeffs: 13
Rectification: 'Log'
FFTLength: []
LogEnergy: 'Append'
SampleRate: 44100
Show all properties
The LogEnergy property is set to 'Append'. The first element in the coefficients vector is the log energy value and the remaining elements are the 13 cepstral coefficients computed by the object. The number of cepstral coefficients is determined by the value you specify in the NumCoeffs property.
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment)
coeffs = 14×1
5.2999
-4.9406
3.6130
0.4397
-0.2280
-1.1068
0.6679
0.6367
-0.3869
0.6127
⋮
delta = 14×1
0
0
0
0
0
0
0
0
0
0
⋮
deltaDelta = 14×1
0
0
0
0
0
0
0
0
0
0
⋮
The initial values for the delta and deltaDelta arrays are always zero. Consider another 40 ms audio segment in the file and extract the cepstral features from this segment.
audioSegmentTwo = audioFile(5820:5820+duration-1);
[coeffsTwo,deltaTwo,deltaDeltaTwo] = cepFeatures(audioSegmentTwo)
coeffsTwo = 14×1
-0.1582
-15.9507
2.4295
0.2835
0.4345
0.4382
0.6040
0.4168
0.1846
0.2636
⋮
deltaTwo = 14×1
-5.4581
-11.0101
-1.1836
-0.1561
0.6625
1.5449
-0.0639
-0.2199
0.5715
-0.3491
⋮
deltaDeltaTwo = 14×1
-5.4581
-11.0101
-1.1836
-0.1561
0.6625
1.5449
-0.0639
-0.2199
0.5715
-0.3491
⋮
Verify that the difference between the coeffsTwo and coeffs vectors equals deltaTwo.
isequal(coeffsTwo-coeffs,deltaTwo)
ans = logical
1
Verify that the difference between the deltaTwo and delta vectors equals deltaDeltaTwo.
isequal(deltaTwo-delta,deltaDeltaTwo)
ans = logical
1
Algorithms
Auditory Cepstrum Coefficients
Auditory cepstrum coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.
The motivating idea of cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
Two popular implementations of the filter bank are the mel filter bank and the gammatone filter bank.
The default mel filter bank linearly spaces the first 10 triangular filters and logarithmically spaces the remaining filters.
The default gammatone filter bank is composed of gammatone filters spaced linearly on the ERB scale between 50 and 8000 Hz. The filter bank is designed by gammatoneFilterBank.
Log Energy
If the input (x) is a time-domain signal, the log energy is computed using the following equation:
If the input (x) is a frequency-domain signal, the log energy is computed using the following equation:
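A minimal sketch of the two log energy computations in base MATLAB follows. It assumes the log of the sum of squared samples for time-domain input and the log of the sum of squared spectral magnitudes divided by the FFT length for frequency-domain input. That form is an assumption, chosen so that the two results agree by Parseval's theorem when the FFT length equals the segment length; the signal values are hypothetical.

```matlab
% Hypothetical time-domain segment (column vector).
x = [0.5; -0.25; 0.125; 1];

% Time-domain input: log of the total energy of the samples.
logETime = log(sum(x.^2));

% Frequency-domain input: full two-sided DFT of the same segment.
fftLength = numel(x);
X = fft(x, fftLength);

% Assumed form: squared magnitudes summed and scaled by the FFT length.
logEFreq = log(sum(abs(X).^2) / fftLength);

fprintf('time: %.6f  frequency: %.6f\n', logETime, logEFreq);
```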
References
[1] Auditory Toolbox. https://engineering.purdue.edu/~malcolm/interval/1998-010/AuditoryToolboxTechReport.pdf
[2] ETSI ES 201 108 V1.1.3 (2003-09). https://www.etsi.org/deliver/etsi_es/201100_201199/201108/01.01.03_60/es_201108v010103p.pdf
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
System Objects in MATLAB Code Generation (MATLAB Coder)
Version History
Introduced in R2018a
R2024a: Removed
The cepstralFeatureExtractor object has been removed. Use the mfcc and gtcc functions to compute the same features for batch signals. For streaming applications, improve performance by designing the filter bank once with designAuditoryFilterBank, and then apply the filter bank and extract the same features with cepstralCoefficients and audioDelta in the streaming loop. If you are extracting multiple audio features, use the audioFeatureExtractor object.
cepstralFeatureExtractor Configuration | Recommended Replacement |
---|---|
| Use the [audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
[coeffs,delta,deltaDelta] = mfcc(audioIn,fs); Alternatively,
use a combination of |
| Use the [audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
[coeffs,delta,deltaDelta] = gtcc(audioIn,fs); Alternatively,
use a combination of |
| No replacement |
| Use the |
R2022b: Warns
The cepstralFeatureExtractor object issues a warning that it will be removed in a future release.
R2020b: To be removed
The cepstralFeatureExtractor object runs without warning, but it will be removed in a future release.
See Also
mfcc | gtcc | gammatoneFilterBank | cepstralCoefficients | audioFeatureExtractor