identify

Identify label

Since R2021a

    Description

    tableOut = identify(ivs,data) identifies the label corresponding to the data.

    tableOut = identify(ivs,data,scorer) specifies the scorer used to perform identification.

    tableOut = identify(___,NumCandidates=N) specifies the number of candidates to return in tableOut.
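As an illustration of these call forms (the variable names `ivs` and `data` here are placeholders, assuming a system that has already been trained, calibrated, and enrolled):

```matlab
% Assumes ivs is a trained, enrolled ivectorSystem and data is a column
% vector of audio samples (or a feature matrix, depending on the
% system's InputType).
tableOut = identify(ivs,data);                  % default scorer
tableOut = identify(ivs,data,"css");            % cosine similarity scoring
tableOut = identify(ivs,data,NumCandidates=2);  % return only the top two candidates
```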

    Examples

    Use the Census Database (also known as AN4 Database) from the CMU Robust Speech Recognition Group [1]. The data set contains recordings of male and female subjects speaking words and numbers. The helper function in this example downloads the data set for you, converts the raw files to FLAC, and returns two audioDatastore objects containing the training set and test set. By default, the data set is reduced so that the example runs quickly. You can use the full data set by setting ReduceDataset to false.

    [adsTrain,adsTest] = HelperAN4Download(ReduceDataset=true);

    Split the test data set into enroll and test sets. Use two utterances from each speaker for enrollment and the remaining utterances for the test set. Generally, the more utterances you use for enrollment, the better the performance of the system. However, most practical applications are limited to a small set of enrollment utterances.

    [adsEnroll,adsTest] = splitEachLabel(adsTest,2);

    Inspect the distribution of speakers in the training, test, and enroll sets. The speakers in the training set do not overlap with the speakers in the test and enroll sets.

    summary(adsTrain.Labels)
         fejs      13 
         fmjd      13 
         fsrb      13 
         ftmj      13 
         fwxs      12 
         mcen      13 
         mrcb      13 
         msjm      13 
         msjr      13 
         msmn       9 
    
    summary(adsEnroll.Labels)
         fvap      2 
         marh      2 
    
    summary(adsTest.Labels)
         fvap      11 
         marh      11 
    

    Create an i-vector system that accepts feature input.

    fs = 16e3;
    iv = ivectorSystem(SampleRate=fs,InputType="features");

    Create an audioFeatureExtractor object to extract the gammatone cepstral coefficients (GTCC), the delta GTCC, the delta-delta GTCC, and the pitch from 50 ms periodic Hann windows with 45 ms overlap.

    afe = audioFeatureExtractor(gtcc=true,gtccDelta=true,gtccDeltaDelta=true,pitch=true,SampleRate=fs);
    afe.Window = hann(round(0.05*fs),"periodic");
    afe.OverlapLength = round(0.045*fs);
    afe
    afe = 
      audioFeatureExtractor with properties:
    
       Properties
                         Window: [800×1 double]
                  OverlapLength: 720
                     SampleRate: 16000
                      FFTLength: []
        SpectralDescriptorInput: 'linearSpectrum'
            FeatureVectorLength: 40
    
       Enabled Features
         gtcc, gtccDelta, gtccDeltaDelta, pitch
    
       Disabled Features
         linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
         mfccDeltaDelta, spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness
         spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread
         harmonicRatio, zerocrossrate, shortTimeEnergy
    
    
       To extract a feature, set the corresponding property to true.
       For example, obj.mfcc = true, adds mfcc to the list of enabled features.
    
    

    Create transformed datastores by adding feature extraction to the read function of adsTrain and adsEnroll.

    trainLabels = adsTrain.Labels;
    adsTrain = transform(adsTrain,@(x)extract(afe,x));
    enrollLabels = adsEnroll.Labels;
    adsEnroll = transform(adsEnroll,@(x)extract(afe,x));

    Train both the extractor and classifier using the training set.

    trainExtractor(iv,adsTrain, ...
        UBMNumComponents=64, ...
        UBMNumIterations=5, ...
        TVSRank=32, ...
        TVSNumIterations=3);
    Calculating standardization factors ....done.
    Training universal background model ........done.
    Training total variability space ......done.
    i-vector extractor training complete.
    
    trainClassifier(iv,adsTrain,trainLabels, ...
        NumEigenvectors=16, ...
        PLDANumDimensions=16, ...
        PLDANumIterations=5);
    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model ........done.
    i-vector classifier training complete.
    

    To calibrate the system so that scores can be interpreted as a measure of confidence in a positive decision, use calibrate.

    calibrate(iv,adsTrain,trainLabels)
    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.
    

    Enroll the speakers from the enrollment set.

    enroll(iv,adsEnroll,enrollLabels)
    Extracting i-vectors ...done.
    Enrolling i-vectors .....done.
    Enrollment complete.
    

    Evaluate the file-level prediction accuracy on the test set.

    numCorrect = 0;
    reset(adsTest)
    for index = 1:numel(adsTest.Files)
        features = extract(afe,read(adsTest));
        
        results = identify(iv,features);
        
        trueLabel = adsTest.Labels(index);
        predictedLabel = results.Label(1);
        isPredictionCorrect = trueLabel==predictedLabel;
        
        numCorrect = numCorrect + isPredictionCorrect;
    end
    display("File Accuracy: " + round(100*numCorrect/numel(adsTest.Files),2) + " (%)")
        "File Accuracy: 100 (%)"
    

    References

    [1] "CMU Sphinx Group - Audio Databases." http://www.speech.cs.cmu.edu/databases/an4/. Accessed 19 Dec. 2019.

    Input Arguments

    ivs — i-vector system
    ivectorSystem object

    i-vector system, specified as an object of type ivectorSystem.

    data — Data to identify
    column vector | matrix

    Data to identify, specified as a column vector representing a single-channel (mono) audio signal or a matrix of audio features.

    • If InputType is set to "audio" when the i-vector system is created, data must be a column vector with underlying type single or double.

    • If InputType is set to "features" when the i-vector system is created, data must be a matrix with underlying type single or double. The matrix must consist of audio features where the number of features (columns) is locked the first time trainExtractor is called and the number of hops (rows) is variable-sized.

    Data Types: single | double

    scorer — Scoring algorithm
    "plda" | "css"

    Scoring algorithm used by the i-vector system, specified as "plda", which corresponds to probabilistic linear discriminant analysis (PLDA), or "css", which corresponds to cosine similarity score (CSS).

    To use "plda", you must train the PLDA model using trainClassifier. If the PLDA model has been trained, then scorer defaults to "plda". Otherwise, the scorer defaults to "css".

    Data Types: char | string

    NumCandidates — Number of candidates
    positive integer

    Number of candidates to return in tableOut, specified as a positive integer scalar.

    Note

    If you request a number of candidates greater than the number of labels enrolled in the i-vector system, then all candidates are returned. If unspecified, the number of candidates defaults to the number of enrolled labels.

    Data Types: single | double
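To illustrate the behavior described in the note (variable names assumed), requesting more candidates than there are enrolled labels simply returns every enrolled label:

```matlab
% With only two labels enrolled in ivs, NumCandidates=5 still
% returns a table with two rows, one per enrolled label.
results = identify(ivs,data,NumCandidates=5);
height(results)   % equals the number of enrolled labels
```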

    Output Arguments

    tableOut — Candidate labels and scores
    table

    Candidate labels and corresponding scores, returned as a table. The number of rows of tableOut is equal to N, the number of candidates. The candidates are sorted in order of decreasing confidence.

    Data Types: table

    Version History

    Introduced in R2021a
