fit
Syntax
Description
The incremental fit function fits an incremental principal component analysis (PCA) object (incrementalPCA) to streaming data.
IncrementalMdl = fit(IncrementalMdl,X) returns an incremental PCA model IncrementalMdl, which represents the input incremental PCA model IncrementalMdl fit using the predictor data X. Specifically, the incremental fit function fits the model to the incoming data and stores the updated PCA properties in the output model IncrementalMdl.
IncrementalMdl = fit(IncrementalMdl,X,Weights=weights) also sets the observation weights weights.
[IncrementalMdl,Xtransformed] = fit(IncrementalMdl,X) additionally returns the principal component scores Xtransformed.
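The three calling forms above can be sketched on a small synthetic chunk. This is a minimal illustration, not part of the original examples: the data X, the weight vector w, and the choice to double the weight of the first 50 rows are all hypothetical, and the model is created with StandardizeData=true only because that constructor call appears elsewhere on this page.

```matlab
rng(0)                                   % reproducible synthetic data
X = randn(200,5);                        % hypothetical chunk: 200 observations, 5 variables
w = ones(200,1);                         % one positive weight per row of X
w(1:50) = 2;                             % hypothetical choice: emphasize the first 50 rows

Mdl = incrementalPCA(StandardizeData=true);

Mdl = fit(Mdl,X);                        % basic update
Mdl = fit(Mdl,X,Weights=w);              % weighted update
[Mdl,Xtransformed] = fit(Mdl,X);         % update and also return the scores
```

Each call overwrites Mdl with the updated model, which is the pattern the examples on this page use inside their fitting loops.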
Examples
Perform principal component analysis (PCA) on an initial data chunk, and then create an incremental PCA model that incorporates the results of the analysis. Fit the incremental model to streaming data and analyze how the model evolves during training.
Load and Preprocess Data
Load the human activity data set.
load humanactivity
For details on the human activity data set, enter Description at the command line.
The data set includes observations containing 60 variables. To simulate streaming data, split the data set into an initial chunk of 1000 observations and a second chunk of 10,000 observations.
Xinitial = feat(1:1000,:);
Xstream = feat(1001:11000,:);
Perform Initial PCA
Perform PCA on the initial data chunk by using the pca function. Specify to center the data and keep 10 principal components. Return the principal component coefficients (coeff), principal component variances (latent), and estimated means of the variables (mu).
[coeff,~,latent,~,~,mu]=pca(Xinitial,Centered=true,NumComponents=10);
Create Incremental PCA Model
Create a model for incremental PCA that incorporates the PCA results from the initial data chunk.
IncrementalMdl = incrementalPCA(Coefficients=coeff,Latent=latent, ...
    Means=mu,NumObservations=1000);
details(IncrementalMdl)
  incrementalPCA with properties:
                     IsWarm: 1
    NumTrainingObservations: 0
               WarmupPeriod: 0
                         Mu: [0.7764 0.4931 -0.3407 0.1108 0.0707 0.0485 0.3931 -1.1100 0.0646 0.1703 -1.1020 0.0283 0.0836 -1.0797 0.0139 0.9328 1.2892 1.6731 2.0729 2.5181 2.9511 0.0128 0.0062 0.0039 0.0027 0.0020 0.0016 0.9322 1.3111 … ] (1×60 double)
                      Sigma: []
          ExplainedVariance: [10×1 double]
           EstimationPeriod: 0
                     Latent: [10×1 double]
               Coefficients: [60×10 double]
            VariableWeights: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
              NumComponents: 10
              NumPredictors: 60
  Methods, Superclasses
IncrementalMdl is an incrementalPCA model object. All its properties are read-only. Because Coefficients and Latent are specified, the model is warm, meaning that the fit function returns transformed observations.
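As a quick check of the warm state, the following sketch rebuilds the same model under the name Mdl (a name chosen here for illustration) and requests scores for the next 100 observations. Based on the description of Xtransformed in Output Arguments, the result should have one row per observation and one column per component; the snippet does not assume specific score values.

```matlab
load humanactivity                       % provides the predictor matrix feat
Xinitial = feat(1:1000,:);
[coeff,~,latent,~,~,mu] = pca(Xinitial,Centered=true,NumComponents=10);
Mdl = incrementalPCA(Coefficients=coeff,Latent=latent, ...
    Means=mu,NumObservations=1000);

chunk = feat(1001:1100,:);               % next 100 observations of the stream
[~,Xtransformed] = fit(Mdl,chunk);       % updated model is discarded; only scores are kept

disp(Mdl.IsWarm)                         % warm because Coefficients and Latent were supplied
disp(size(Xtransformed))                 % rows are observations, columns are components
```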
Fit Incremental Model
Fit the incremental model IncrementalMdl to the data by using the fit function. To simulate a data stream, fit the model in chunks of 100 observations at a time. At each iteration:
- Process 100 observations. 
- Overwrite the previous incremental model with a new one fitted to the incoming observations. 
- Store topEV, the explained variance value of the component with the highest variance, to see how it evolves during incremental fitting.
n = numel(Xstream(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
topEV = zeros(nchunk,1);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    IncrementalMdl = fit(IncrementalMdl,Xstream(ibegin:iend,:));
    topEV(j) = IncrementalMdl.ExplainedVariance(1);
end
IncrementalMdl is an incrementalPCA model object fitted to all the data in the stream. The fit function fits the model to the data chunk and updates the model properties.
Analyze Incremental Model During Training
Plot the explained variance value of the component with the highest variance to see how it evolves during training.
figure
plot(topEV,".-")
ylabel("topEV")
xlabel("Iteration")
xlim([0 nchunk])

The highest explained variance value is 33% after the first iteration, and rapidly rises to 80% after five iterations. The value then gradually approaches 97%.
Create a model for incremental principal component analysis (PCA) and specify to standardize the data.
IncrementalMdl = incrementalPCA(StandardizeData=true);
details(IncrementalMdl)
  incrementalPCA with properties:
                     IsWarm: 0
    NumTrainingObservations: 0
               WarmupPeriod: 1000
                         Mu: []
                      Sigma: []
          ExplainedVariance: [0×1 double]
           EstimationPeriod: 1000
                     Latent: [0×1 double]
               Coefficients: []
            VariableWeights: [1×0 double]
              NumComponents: 0
              NumPredictors: 0
  Methods, Superclasses
IncrementalMdl is an incrementalPCA model object. All its properties are read-only. By default, the software sets the hyperparameter estimation period and the warm-up period to 1000 observations. The model must be warm before the incremental fit function outputs transformed data.
Load and Preprocess Data
Load the NYCHousing2015 sample data set.
load NYCHousing2015
The data set includes 10 variables with information on the sales of properties in New York City in 2015.
Preprocess the data set. Remove the categorical variables BOROUGH, NEIGHBORHOOD, and BUILDINGCLASSCATEGORY. Convert the datetime array (SALEDATE) to month numbers, and change zeros in LANDSQUAREFEET, GROSSSQUAREFEET, SALEPRICE, and YEARBUILT to NaNs.
NYCHousing2015 = removevars(NYCHousing2015,["BOROUGH", ...
    "NEIGHBORHOOD","BUILDINGCLASSCATEGORY"]);
NYCHousing2015.SALEDATE = month(NYCHousing2015.SALEDATE);
NYCHousing2015.LANDSQUAREFEET(NYCHousing2015.LANDSQUAREFEET == 0) = NaN;
NYCHousing2015.GROSSSQUAREFEET(NYCHousing2015.GROSSSQUAREFEET == 0) = NaN;
NYCHousing2015.SALEPRICE(NYCHousing2015.SALEPRICE == 0) = NaN;
NYCHousing2015.YEARBUILT(NYCHousing2015.YEARBUILT == 0) = NaN;
The fit function of incrementalPCA does not use observations that contain a missing value. Remove these observations from the data set. 
NYCHousing2015=rmmissing(NYCHousing2015);
The incrementalPCA functions do not accept data in table format. Convert the data set to array format and keep only the first 5000 observations.
streamingData = table2array(NYCHousing2015(1:end,:));
streamingData = streamingData(1:5000,:);
Fit Incremental Models
Fit the incremental model IncrementalMdl to the data using the fit function. To simulate a data stream, fit the model in chunks of 100 observations at a time. At each iteration:
- Process 100 observations. 
- Overwrite the previous incremental model with a new one fitted to the incoming observations. 
- Store isWarm, the IsWarm property of IncrementalMdl, to see how it evolves during incremental fitting.
- Store topEV, the explained variance value of the component with the highest variance, to see how it evolves during incremental fitting.
- Store meanXtr, the mean of the transformed data output by the fit function, to see how it evolves during incremental fitting.
n = numel(streamingData(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
meanXtr = zeros(nchunk,1);
isWarm = zeros(nchunk,1);
topEV = zeros(nchunk,1);   % preallocate so values from an earlier run are not reused

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    [IncrementalMdl,Xtr] = fit(IncrementalMdl,streamingData(ibegin:iend,:));
    isWarm(j) = IncrementalMdl.IsWarm;
    topEV(j) = IncrementalMdl.ExplainedVariance(1);
    meanXtr(j) = mean(Xtr(:));
end
IncrementalMdl is an incrementalPCA model object fitted to all the data in the stream. fit fits the model to the data chunk and outputs the transformed input data. 
Analyze Incremental Model During Training
To see how the IsWarm indicator, the explained variance value of the component with the highest variance, and the mean of the transformed input data per chunk evolve during training, plot them on separate tiles.
figure
tiledlayout(3,1);
nexttile
plot(isWarm,".-")
ylabel("IsWarm")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(topEV,".-")
ylabel("Top EV")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(meanXtr,".-")
ylabel("Mean of Transformed Data")
xlabel("Iteration")
xlim([0 nchunk])

Because EstimationPeriod = 1000, fit processes 1000 observations to determine hyperparameters before updating the PCA properties of IncrementalMdl. After the estimation period, the top explained variance value initially fluctuates between 58% and 85%, and then gradually approaches 50%. Because WarmupPeriod = 1000, fit processes an additional 1000 observations after the estimation period before IncrementalMdl becomes warm and outputs transformed data. The mean of the transformed data fluctuates between –0.3 and 0.08.
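The timing described above reduces to simple arithmetic. The sketch below uses the period and chunk sizes from this example; chunksUntilWarm is a hypothetical helper variable, and the result assumes that every observation in the stream is usable (no rows dropped for missing values).

```matlab
% Values taken from the example above
estimationPeriod = 1000;   % IncrementalMdl.EstimationPeriod
warmupPeriod = 1000;       % IncrementalMdl.WarmupPeriod
numObsPerChunk = 100;      % observations passed to fit per iteration

% Observations that fit must process before the model is warm
obsUntilWarm = estimationPeriod + warmupPeriod;

% Iteration around which the IsWarm indicator is expected to switch to 1
chunksUntilWarm = ceil(obsUntilWarm/numObsPerChunk);
fprintf("Expected warm after %d observations (iteration %d)\n", ...
    obsUntilWarm,chunksUntilWarm)
```

If the isWarm vector from the fitting loop is still in the workspace, find(isWarm,1) gives the observed iteration for comparison.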
Input Arguments
Incremental PCA model, specified as an incrementalPCA model object. You can create IncrementalMdl by calling incrementalPCA directly.
Chunk of predictor data, specified as a floating-point matrix of n observations and IncrementalMdl.NumPredictors variables. The rows of X correspond to observations, and the columns correspond to variables. The software ignores observations that contain at least one missing value.
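Because fit ignores incomplete rows on its own, no cleanup is required, but it can be useful to know how many observations in a chunk actually contribute. The following sketch counts them on a small hypothetical matrix; the variable names are illustrative only.

```matlab
% Hypothetical chunk with missing values in rows 2 and 4
X = [1.0 2.0 3.0
     NaN 1.5 2.5
     0.5 0.7 0.9
     2.0 NaN NaN];

hasMissing = any(ismissing(X),2);   % true for rows with at least one missing value
numUsable = sum(~hasMissing);       % number of complete rows in the chunk
Xcomplete = rmmissing(X);           % the same complete rows, with incomplete rows removed

fprintf("%d of %d observations are complete\n",numUsable,size(X,1))
```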
Note
- If IncrementalMdl.NumPredictors = 0, fit infers the number of predictors from X, and sets the corresponding property of the output model. Otherwise, if the number of predictor variables in the streaming data changes from IncrementalMdl.NumPredictors, fit issues an error.
- fit supports only numeric input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.
Data Types: single | double
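The encoding step described in the note can be sketched as follows. The numeric columns and the categorical variable color are hypothetical data made up for illustration.

```matlab
% Hypothetical predictors: two numeric columns and one categorical variable
numericPart = [1.2 3.4
               5.6 7.8
               9.0 1.2
               3.3 4.4];
color = categorical(["red";"blue";"red";"green"]);

D = dummyvar(color);        % one indicator column per category of color
X = [numericPart D];        % numeric matrix that can be passed to fit

disp(categories(color)')    % column order of D follows the category order
disp(size(X))
```

Each row of D contains exactly one 1, so the concatenated matrix has two numeric columns followed by three indicator columns.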
Chunk of observation weights, specified as a floating-point vector of positive values. fit weights the observations in X with the corresponding values in weights. The size of weights must equal n, the number of observations in X.
By default, weights is ones(n,1).
Data Types: single | double
Output Arguments
Updated incremental PCA model, returned as an incrementalPCA model object.
Principal component scores, returned as a floating-point matrix. The rows of Xtransformed correspond to observations, and the columns correspond to components. If IncrementalMdl is not warm (IsWarm=false), all values of Xtransformed are returned as NaN. The data type of Xtransformed is the same as X.
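Downstream code usually needs to guard against the all-NaN scores returned before the model is warm. A minimal sketch, assuming a model with the default 1000-observation estimation and warm-up periods shown in the second example and a hypothetical 100-row chunk:

```matlab
rng(1)                                        % reproducible synthetic data
X = randn(100,4);                             % hypothetical chunk, far shorter than the default periods

Mdl = incrementalPCA(StandardizeData=true);   % not warm at creation
[Mdl,Xtransformed] = fit(Mdl,X);

if Mdl.IsWarm
    scoresToUse = Xtransformed;               % scores are meaningful once the model is warm
else
    scoresToUse = [];                         % skip the chunk: scores are all NaN at this stage
end
```

Checking Mdl.IsWarm is cheaper than scanning Xtransformed for NaN values, and it matches how the examples on this page track the warm state.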
Version History
Introduced in R2024a
See Also
incrementalPCA | pca | reset | transform