updateMetricsAndFit
Update performance metrics in incremental drift-aware learning model given new data and train model
Since R2022b
Description
Mdl = updateMetricsAndFit(Mdl,X,Y) returns an incremental drift-aware learning model Mdl, which is the input incremental drift-aware learning model Mdl with the following modifications:
- updateMetricsAndFit measures the model performance on the incoming predictor and response data, X and Y respectively. When the input model is warm (Mdl.IsWarm is true), updateMetricsAndFit overwrites previously computed metrics, stored in the Metrics property, with the new values. Otherwise, updateMetricsAndFit stores NaN values in Metrics instead.
- updateMetricsAndFit fits the modified model to the incoming data by performing incremental drift-aware learning.
The input and output models have the same data type.
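Mdl = updateMetricsAndFit(Mdl,X,Y,Name=Value) specifies additional options using one or more name-value arguments. For example, ObservationsIn="columns",Weights=W specifies that the columns of the predictor matrix correspond to observations and that the vector W contains observation weights.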
Examples
Create the random concept data and concept drift generator using the helper functions, HelperSineGenerator and HelperConceptDriftGenerator, respectively.
concept1 = HelperSineGenerator(ClassificationFunction=1,IrrelevantFeatures=true,TableOutput=false);
concept2 = HelperSineGenerator(ClassificationFunction=3,IrrelevantFeatures=true,TableOutput=false);
driftGenerator = HelperConceptDriftGenerator(concept1,concept2,15000,1000);
When ClassificationFunction is 1, HelperSineGenerator labels all points that satisfy x1 < sin(x2) as 1; otherwise, the function labels them as 0. When ClassificationFunction is 3, the labeling is reversed: HelperSineGenerator labels all points that satisfy x1 >= sin(x2) as 1 and all other points as 0 [2]. The software returns the data in matrices for use in incremental learners.
HelperConceptDriftGenerator establishes the concept drift. The object uses a sigmoid function 1./(1+exp(-4*(numobservations-position)./width)) to decide the probability of choosing the first stream when generating data [3]. In this case, the position argument is 15000 and the width argument is 1000. As the number of observations exceeds the position value minus half of the width, the probability of sampling from the first stream when generating data decreases. The sigmoid function allows a smooth transition from one stream to the other. Larger width values indicate a larger transition period where both streams are approximately equally likely to be selected.
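To get a feel for the width of the transition, the sigmoid expression quoted above can be evaluated directly. The following is a minimal sketch that uses only that expression with position = 15000 and width = 1000; the variable names are illustrative and are not part of the helper class. Evaluated as written, the expression equals 0.5 at the position value and moves from roughly 0.12 to roughly 0.88 between position - width/2 and position + width/2, so its complement, 1 - p, falls over the same range.

```matlab
% Sigmoid expression from the text, with the example's position and width
position = 15000;
width = 1000;
sigmoid = @(n) 1./(1+exp(-4*(n-position)./width));

% Evaluate at the start, middle, and end of the transition window
n = [position-width/2, position, position+width/2];
p = sigmoid(n);

% Display the values and their complements
disp([n; p; 1-p])
```

Doubling the width while keeping the position fixed stretches the same change in probability over twice as many observations.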
Initiate an incremental drift-aware model for classification as follows:
- Create an incremental naive Bayes classification model for binary classification.
- Initiate an incremental concept drift detector that uses the Hoeffding's Bounds Drift Detection Method with moving average (HDDMA).
- Using the incremental naive Bayes model and the concept drift detector, initiate an incremental drift-aware model. Specify the training period as 5000 observations.
BaseLearner = incrementalClassificationNaiveBayes(MaxNumClasses=2,Metrics="classiferror");
dd = incrementalConceptDriftDetector("hddma");
idal = incrementalDriftAwareLearner(BaseLearner,DriftDetector=dd,TrainingPeriod=5000);
Specify the number of observations in each chunk and the number of iterations for creating a stream of data.
numObsPerChunk = 10;
numIterations = 4000;
Preallocate the variables for tracking the drift status and drift time, and storing the classification error.
dstatus = zeros(numIterations,1);
statusname = strings(numIterations,1);
driftTimes = [];
ce = array2table(zeros(numIterations,2),VariableNames=["Cumulative" "Window"]);
Simulate a data stream with incoming chunks of 10 observations each and perform incremental drift-aware learning. At each iteration:
- Simulate predictor data and labels, and update driftGenerator using the helper function hgenerate.
- Call updateMetricsAndFit to update the performance metrics and fit the incremental drift-aware model to the incoming data.
- Track and record the drift status and the classification error for visualization purposes.
rng(12); % For reproducibility
for j = 1:numIterations
    % Generate data
    [driftGenerator,X,Y] = hgenerate(driftGenerator,numObsPerChunk);

    % Update performance metrics and fit
    idal = updateMetricsAndFit(idal,X,Y);

    % Record drift status and classification error
    statusname(j) = string(idal.DriftStatus);
    ce{j,:} = idal.Metrics{"ClassificationError",:};
    if idal.DriftDetected
        dstatus(j) = 2;
    elseif idal.WarningDetected
        dstatus(j) = 1;
    else
        dstatus(j) = 0;
    end
    if idal.DriftDetected
        driftTimes(end+1) = j;
    end
end
Plot the cumulative and per window classification error. Mark the warmup and training periods, and where the drift was introduced.
h = plot(ce.Variables);
xlim([0 numIterations])
ylim([0 0.22])
ylabel("Classification Error")
xlabel("Iteration")
xline(idal.MetricsWarmupPeriod/numObsPerChunk,"g-.","Warmup Period",LineWidth=1.5)
xline(idal.MetricsWarmupPeriod/numObsPerChunk+driftTimes,"g-.","Warmup Period",LineWidth=1.5)
xline(idal.TrainingPeriod/numObsPerChunk,"b-.","Training Period",LabelVerticalAlignment="middle",LineWidth=1.5)
xline(driftTimes,"m--","Drift",LabelVerticalAlignment="middle",LineWidth=1.5)
legend(h,ce.Properties.VariableNames)
legend(h,Location="best")

The updateMetricsAndFit function first evaluates the performance of the model by calling updateMetrics on incoming data, and then fits the model to data by calling fit:
The updateMetrics function evaluates the performance of the model as it processes incoming observations. The function writes specified metrics, measured cumulatively and within a specified window of processed observations, to the Metrics model property.
The fit function fits the model by updating the base learner and monitoring for drift given an incoming batch of data. When you call fit, the software performs the following procedure:
- Trains the model up to NumTrainingObservations observations.
- After training, the software starts tracking the model loss to see if any concept drift has occurred and updates the drift status accordingly.
- When the drift status is Warning, the software trains a temporary model to replace the BaseLearner in preparation for an imminent drift.
- When the drift status is Drift, the temporary model replaces the BaseLearner.
- When the drift status is Stable, the software discards the temporary model.
For more information, see the Algorithms section.
Plot the drift status versus the iteration number.
gscatter(1:numIterations,dstatus,statusname,"gmr","o",5,"on","Iteration","Drift Status","filled")

Input Arguments
Mdl
Incremental drift-aware learning model fit to streaming data, specified as an incrementalDriftAwareLearner model object. You can create Mdl using the incrementalDriftAwareLearner function. For more details, see the object reference page.
X
Chunk of predictor data to which the model is fit, specified as a floating-point matrix of n observations and Mdl.BaseLearner.NumPredictors predictor variables.
When Mdl.BaseLearner accepts the ObservationsIn name-value argument, the value of ObservationsIn determines the orientation of the variables and observations. The default ObservationsIn value is "rows", which indicates that observations in the predictor data are oriented along the rows of X.
The length of the observation responses (or labels) Y and the number of observations in X must be equal; Y(j) is the response (or label) of observation j (row or column) in X.
Note
- If Mdl.BaseLearner.NumPredictors = 0, updateMetricsAndFit infers the number of predictors from X, and sets the corresponding property of the output model. Otherwise, if the number of predictor variables in the streaming data changes from Mdl.BaseLearner.NumPredictors, updateMetricsAndFit issues an error.
- updateMetricsAndFit supports only floating-point input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.
Data Types: single | double
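The encoding step described in the note can be sketched as follows. The variables color and x1 are hypothetical sample data invented for illustration; dummyvar is the Statistics and Machine Learning Toolbox function named above.

```matlab
% Hypothetical chunk with one categorical and one numeric predictor
color = categorical(["red"; "green"; "blue"; "green"]);
x1 = [0.5; 1.2; -0.3; 2.0];

% Convert the categorical variable to a matrix of dummy variables,
% one column per category
D = dummyvar(color);

% Concatenate the numeric predictor and the dummy variables into the
% floating-point matrix that updateMetricsAndFit expects
X = [x1 D];
```

Each row of D contains a single 1 that marks the category of that observation, so X here has one numeric column plus one column per category.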
Y
Chunk of responses (or labels) to which the model is fit, specified as one of the following:
- Floating-point vector of n elements for regression models, where n is the number of rows in X.
- Categorical, character, or string array, logical vector, or cell array of character vectors for classification models. If Y is a character array, it must have one class label per row. Otherwise, Y must be a vector with n elements.
The length of Y and the number of observations in X must be equal; Y(j) is the response (or label) of observation j (row or column) in X.
For classification problems:
- When Mdl.BaseLearner.ClassNames is nonempty, the following conditions apply:
  - If Y contains a label that is not a member of Mdl.BaseLearner.ClassNames, updateMetricsAndFit issues an error.
  - The data type of Y and Mdl.BaseLearner.ClassNames must be the same.
- When Mdl.BaseLearner.ClassNames is empty, updateMetricsAndFit infers Mdl.BaseLearner.ClassNames from data.
Data Types: single | double | categorical | char | string | logical | cell
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: ObservationsIn="columns",Weights=W specifies that the columns of the predictor matrix correspond to observations, and the vector W contains observation weights to apply during incremental learning.
ObservationsIn
Predictor data observation dimension, specified as "columns" or "rows".
updateMetricsAndFit supports ObservationsIn only if Mdl.BaseLearner supports the ObservationsIn name-value argument.
Example: ObservationsIn="columns"
Data Types: char | string
Weights
Chunk of observation weights, specified as a floating-point vector of positive values. updateMetricsAndFit weighs the observations in X with the corresponding values in Weights. The size of Weights must equal n, which is the number of observations in X.
By default, Weights is ones(n,1).
Example: Weights=w
Data Types: double | single
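The two name-value arguments can be combined in one call. The following is a minimal sketch, assuming Statistics and Machine Learning Toolbox R2022b or later and a base learner that accepts ObservationsIn (incrementalClassificationLinear is used here on that assumption). The data and the weight vector w are synthetic and hypothetical.

```matlab
rng(1) % For reproducibility

% Hypothetical chunk: 50 observations, 3 predictors, binary labels
n = 50;
X = randn(n,3);
Y = double(X(:,1) > 0);

% Observation weights, one positive value per observation
w = ones(n,1);
w(Y == 1) = 2; % Illustrative choice: weigh class 1 more heavily

% Base learner assumed to support ObservationsIn
BaseLearner = incrementalClassificationLinear(ClassNames=[0 1]);
Mdl = incrementalDriftAwareLearner(BaseLearner);

% Pass the predictors with observations along the columns
Mdl = updateMetricsAndFit(Mdl,X',Y,ObservationsIn="columns",Weights=w);
```

Because the observations are along the columns of X', the number of columns of X' and the lengths of Y and w must all equal n.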
Output Arguments
Mdl
Updated incremental drift-aware learning model, returned as an incremental learning model object of the same data type as the input model Mdl, incrementalDriftAwareLearner.
Algorithms
Incremental learning, or online learning, is a branch of machine learning concerned with processing incoming data from a data stream, possibly given little to no knowledge of the distribution of the predictor variables, aspects of the prediction or objective function (including tuning parameter values), or whether the observations are labeled. Incremental learning differs from traditional machine learning, where enough labeled data is available to fit to a model, perform cross-validation to tune hyperparameters, and infer the predictor distribution. For more details, see Incremental Learning Overview.
Unlike other incremental learning functionality offered by Statistics and Machine Learning Toolbox™, the incrementalDriftAwareLearner model object combines incremental learning and concept drift detection.
After creating an incrementalDriftAwareLearner object, use updateMetrics to update model performance metrics and fit to fit the base model to an incoming chunk of data, check for potential drift in the model performance (concept drift), and update or reset the incremental drift-aware learner, if necessary. You can also use updateMetricsAndFit. The fit function implements the Reactive Drift Detection Method (RDDM) [1] as follows:
- After Mdl.BaseLearner.EstimationPeriod (if necessary) and MetricsWarmupPeriod, the function trains the incremental drift-aware model up to NumTrainingObservations observations until it reaches TrainingPeriod. (If the TrainingPeriod value is smaller than the Mdl.BaseLearner.MetricsWarmupPeriod value, then incrementalDriftAwareLearner sets the TrainingPeriod value as Mdl.BaseLearner.MetricsWarmupPeriod.)
- When NumTrainingObservations > TrainingPeriod, the software starts tracking the model loss. The software computes the per observation loss using the perObservationLoss function. While computing the per observation loss, the software uses the "classiferror" loss metric for classification models and "squarederror" for regression models. The function then appends the loss values computed using the last chunk of data to the existing buffer loss values.
- Next, the software checks to see if any concept drift occurred by using the detectdrift function and updates DriftStatus accordingly.
Based on the drift status, fit performs the following procedure:
- DriftStatus is 'Warning': The software first increases the consecutive 'Warning' status count by 1.
  - If the consecutive 'Warning' status count is less than the WarningCountLimit value and the PreviousDriftStatus value is 'Stable', then the software trains a temporary incremental learner (if one does not exist) and sets it (or the existing one) to BaseLearner. Then the software resets the temporary incremental learner using the learner's reset function.
  - If the consecutive 'Warning' status count is less than the WarningCountLimit value and the PreviousDriftStatus value is 'Warning', then the software trains the existing temporary incremental model using the latest chunk of data.
  - If the consecutive 'Warning' status count is more than the WarningCountLimit value, then the software sets the DriftStatus value to 'Drift'.
- DriftStatus is 'Drift': The software performs the following steps.
  - Sets the consecutive 'Warning' status count to 0.
  - Resets DriftDetector using the reset function.
  - Empties the buffer loss values and appends the loss values for the latest chunk of data to buffer loss values.
  - If the temporary incremental model is not empty, then the software sets the current BaseLearner value to the temporary incremental model and empties the temporary incremental model.
  - If the temporary incremental model is empty, then the software resets the BaseLearner value by using the learner's reset function.
- DriftStatus is 'Stable': The software first increases the consecutive 'Stable' status count by 1.
  - If the consecutive 'Stable' status count is less than the StableCountLimit and the PreviousDriftStatus value is 'Warning', then the software sets the number of warnings to zero and empties the temporary model.
  - If the consecutive 'Stable' status count is more than the StableCountLimit value, then the software resets the DriftDetector using the reset function. Then the software tests all of the saved loss values in the buffer for concept drift by using the detectdrift function.
Once DriftStatus is set to 'Drift', and the BaseLearner and DriftDetector are reset, the software waits until Mdl.BaseLearner.EstimationPeriod + Mdl.BaseLearner.MetricsWarmupPeriod before it starts computing the performance metrics.
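The warning-count bookkeeping in the procedure above can be illustrated with a small simulation. This is a simplified sketch, not the toolbox implementation: the sequence raw of detector outputs and the variable names are hypothetical, the stable-count logic is reduced to clearing the warning count, and no learners are trained or reset.

```matlab
% Hypothetical per-chunk detector outputs (not produced by detectdrift here)
raw = ["Stable" "Warning" "Warning" "Warning" "Warning" "Stable"];
warningCountLimit = 3; % Plays the role of WarningCountLimit

status = raw;  % Status after applying the warning-count rule
warnCount = 0; % Consecutive 'Warning' status count

for k = 1:numel(raw)
    switch raw(k)
        case "Warning"
            warnCount = warnCount + 1;
            if warnCount > warningCountLimit
                % Too many consecutive warnings: escalate to drift
                status(k) = "Drift";
                warnCount = 0;
            end
        case "Drift"
            % A drift resets the consecutive warning count
            warnCount = 0;
        case "Stable"
            % Simplification: a stable status clears the warning count
            warnCount = 0;
    end
end

disp(status)
```

With a limit of 3, the fourth consecutive warning is escalated to a drift, after which the count starts again from zero.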
- The updateMetrics and updateMetricsAndFit functions track model performance metrics (Metrics) from new data when the incremental model is warm (Mdl.BaseLearner.IsWarm property). An incremental model becomes warm after fit or updateMetricsAndFit fits the incremental model to MetricsWarmupPeriod observations, which is the metrics warm-up period.
  If Mdl.BaseLearner.EstimationPeriod > 0, the functions estimate hyperparameters before fitting the model to data. Therefore, the functions must process an additional EstimationPeriod observations before the model starts the metrics warm-up period.
- The Metrics property of the incremental model stores two forms of each performance metric as variables (columns) of a table, Cumulative and Window, with individual metrics in rows. When the incremental model is warm, updateMetrics and updateMetricsAndFit update the metrics at the following frequencies:
  - Cumulative: The functions compute cumulative metrics since the start of model performance tracking. The functions update metrics every time you call the functions, and base the calculation on the entire supplied data set until a model reset.
  - Window: The functions compute metrics based on all observations within a window determined by the MetricsWindowSize name-value argument. MetricsWindowSize also determines the frequency at which the software updates Window metrics. For example, if MetricsWindowSize is 20, the functions compute metrics based on the last 20 observations in the supplied data (X((end - 20 + 1):end,:) and Y((end - 20 + 1):end)).
    Incremental functions that track performance metrics within a window use the following process:
    - Store MetricsWindowSize values for each specified metric, and store the same number of observation weights.
    - Populate elements of the metrics values with the model performance based on batches of incoming observations, and store the corresponding observation weights.
    - When the window of observations is filled, overwrite Mdl.Metrics.Window with the weighted average performance in the metrics window. If the window is overfilled when the function processes a batch of observations, the latest incoming MetricsWindowSize observations are stored, and the earliest observations are removed from the window. For example, suppose MetricsWindowSize is 20, there are 10 stored values from a previously processed batch, and 15 values are incoming. To compose the length 20 window, the functions use the measurements from the 15 incoming observations and the latest 5 measurements from the previous batch.
- The software omits an observation with a NaN score when computing the Cumulative and Window performance metric values.
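The window example above (window size 20, 10 stored values, 15 incoming values) can be reproduced with plain array operations. This is an illustrative sketch of the bookkeeping only; the loss values, weights, and variable names are hypothetical and do not come from the toolbox implementation.

```matlab
metricsWindowSize = 20;

% Hypothetical per-observation measurements and unit weights
stored = (1:10)';         % 10 values from a previously processed batch
incoming = (11:25)';      % 15 values from the incoming batch
storedWeights = ones(10,1);
incomingWeights = ones(15,1);

% Append the incoming values, then keep only the latest 20
values = [stored; incoming];
weights = [storedWeights; incomingWeights];
first = max(1, numel(values) - metricsWindowSize + 1);
window = values(first:end);
windowWeights = weights(first:end);

% Weighted average performance over the filled window
windowMetric = sum(windowWeights.*window)/sum(windowWeights);
```

The resulting window holds the last 5 stored values followed by all 15 incoming values, which matches the composition described in the text.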
References
[1] Barros, Roberto S. M., et al. "RDDM: Reactive Drift Detection Method." Expert Systems with Applications, vol. 90, Dec. 2017, pp. 344-55. https://doi.org/10.1016/j.eswa.2017.08.023.
[2] Bifet, Albert, et al. "New Ensemble Methods for Evolving Data Streams." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2009, p. 139. https://doi.org/10.1145/1557019.1557041.
[3] Gama, João, et al. "Learning with Drift Detection." Advances in Artificial Intelligence – SBIA 2004, edited by Ana L. C. Bazzan and Sofiane Labidi, vol. 3171, Springer Berlin Heidelberg, 2004, pp. 286–95. https://doi.org/10.1007/978-3-540-28645-5_29.
Version History
Introduced in R2022b
See Also
predict | perObservationLoss | fit | incrementalDriftAwareLearner | updateMetrics | loss