
anova

Analysis of variance for between-subject effects in a repeated measures model

Description


anovatbl = anova(rm) returns the analysis of variance results for the repeated measures model rm.


anovatbl = anova(rm,'WithinModel',WM) returns the analysis of variance results computed on the response or responses specified by the within-subject model WM.

Examples


Load the sample data.

load fisheriris

The column vector species contains the species of each of 150 iris flowers: setosa, versicolor, or virginica. The double matrix meas contains four measurements for each flower, in centimeters: sepal length, sepal width, petal length, and petal width.

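Store the data in a table array, and define the within-subject design.

t = table(species,meas(:,1),meas(:,2),meas(:,3),meas(:,4),...
    'VariableNames',{'species','meas1','meas2','meas3','meas4'});
% Within-subject design: one row per repeated measurement (meas1 to meas4)
Meas = table([1 2 3 4]','VariableNames',{'Measurements'});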

Fit a repeated measures model where the measurements are the responses and the species is the predictor variable.

rm = fitrm(t,'meas1-meas4~species','WithinDesign',Meas);

Perform analysis of variance.

anova(rm)
ans=3×7 table
     Within     Between     SumSq     DF     MeanSq       F         pValue   
    ________    ________    ______    ___    _______    ______    ___________

    Constant    constant    7201.7      1     7201.7     19650    2.0735e-158
    Constant    species     309.61      2      154.8    422.39     1.1517e-61
    Constant    Error       53.875    147    0.36649                         

There are 150 observations and 3 species. The degrees of freedom for species is 3 - 1 = 2, and for error it is 150 - 3 = 147. The small p-value of 1.1517e-61 indicates that the average measurement differs significantly among species.
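The MeanSq and F columns follow from SumSq and DF. Each MeanSq is SumSq/DF, and each F-statistic is that term's MeanSq divided by the Error MeanSq. The Python sketch below reproduces this arithmetic from the printed table values, so it matches only to the displayed precision. It omits the p-value, which would also require the upper tail of the F-distribution.

```python
# Values copied from the anova(rm) output for the fisheriris example
n_obs, n_groups = 150, 3
ss = {"constant": 7201.7, "species": 309.61, "Error": 53.875}
df = {"constant": 1, "species": n_groups - 1, "Error": n_obs - n_groups}

# Mean square for every row: sum of squares divided by degrees of freedom
mean_sq = {term: ss[term] / df[term] for term in ss}

# Each F-statistic is the term's mean square over the error mean square
f_stat = {term: mean_sq[term] / mean_sq["Error"] for term in ("constant", "species")}
```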

Load the sample panel data.

load('panelData.mat');

The dataset array panelData contains yearly observations on eight cities for six years. The first variable, Growth, measures economic growth (the response variable). The second and third variables are the city and year indicators, respectively. The last variable, Employ, measures employment (the predictor variable). This is simulated data.

Store the growth, city, and year data in a table array.

t = table(panelData.Growth,panelData.City,panelData.Year,...
	'VariableNames',{'Growth','City','Year'});

Convert the data to the wide format required for repeated measures analysis, with one row per city and one column of growth values per year.

t = unstack(t,'Growth','Year','NewDataVariableNames',...
	{'year1','year2','year3','year4','year5','year6'});
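The call to unstack reshapes the long table, which has one row per city-year pair, into a wide table with one row per city. The Python sketch below illustrates that long-to-wide pivot on hypothetical records. The function name unstack is a stand-in written for illustration and is not the MATLAB function. The sketch assumes each (city, year) pair appears at most once.

```python
from collections import defaultdict

def unstack(records, key, column, value):
    """Pivot long records into wide rows: one row per key, one entry per distinct `column` value."""
    cells = defaultdict(dict)
    for rec in records:
        cells[rec[key]][rec[column]] = rec[value]
    columns = sorted({rec[column] for rec in records})
    # Missing (key, column) combinations become None
    rows = {k: [cells[k].get(c) for c in columns] for k in sorted(cells)}
    return columns, rows

# Hypothetical long-format data: two cities observed in two years
long_data = [
    {"City": 1, "Year": 1, "Growth": 2.0},
    {"City": 1, "Year": 2, "Growth": 2.5},
    {"City": 2, "Year": 1, "Growth": 1.0},
    {"City": 2, "Year": 2, "Growth": 1.5},
]
columns, wide = unstack(long_data, "City", "Year", "Growth")
```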

Add the mean employment level over the years as a predictor variable to the table t.

t(:,8) = table(grpstats(panelData.Employ,panelData.City));
t.Properties.VariableNames{'Var8'} = 'meanEmploy';

Define the within-subjects variable.

Year = [1 2 3 4 5 6]';

Fit a repeated measures model, where the growth figures over the 6 years are the responses and the mean employment is the predictor variable.

rm = fitrm(t,'year1-year6 ~ meanEmploy','WithinDesign',Year);

Perform analysis of variance.

anovatbl = anova(rm,'WithinModel',Year)
anovatbl=3×7 table
     Within       Between        SumSq       DF      MeanSq         F         pValue  
    _________    __________    __________    __    __________    ________    _________

    Contrast1    constant          588.17    1         588.17    0.038495      0.85093
    Contrast1    meanEmploy    3.7064e+05    1     3.7064e+05      24.258    0.0026428
    Contrast1    Error              91675    6          15279                         
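Year is a numeric 6-by-1 matrix, so it appears to act as the r-by-nc contrast matrix C described under Input Arguments, with r = 6 and nc = 1. This reading is consistent with the single Within row labeled Contrast1. Under that reading, the response analyzed for each city is Y*Year, that is, 1*year1 + 2*year2 + ... + 6*year6. The Python sketch below computes that matrix product on hypothetical growth figures. The helper contrast_responses is illustrative only.

```python
def contrast_responses(Y, C):
    """Return the matrix product Y*C (rows: subjects, columns: derived responses)."""
    r, nc = len(C), len(C[0])
    return [[sum(row[j] * C[j][k] for j in range(r)) for k in range(nc)] for row in Y]

# Year = [1 2 3 4 5 6]' used as a 6-by-1 contrast matrix
Year = [[1], [2], [3], [4], [5], [6]]

# Hypothetical growth figures for two cities over six years
Y = [[1, 1, 1, 1, 1, 1],
     [0, 0, 0, 0, 0, 2]]
responses = contrast_responses(Y, Year)
```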

Load the sample data.

load('longitudinalData.mat');

The matrix Y contains response data for 16 individuals. The response is the blood level of a drug measured at five time points (time = 0, 2, 4, 6, and 8). Each row of Y corresponds to an individual, and each column corresponds to a time point. The first eight subjects are female, and the second eight subjects are male. This is simulated data.

Define a variable that stores gender information.

Gender = ['F' 'F' 'F' 'F' 'F' 'F' 'F' 'F' 'M' 'M' 'M' 'M' 'M' 'M' 'M' 'M']';

Store the data in a table array in the format required for repeated measures analysis.

t = table(Gender,Y(:,1),Y(:,2),Y(:,3),Y(:,4),Y(:,5),...
'VariableNames',{'Gender','t0','t2','t4','t6','t8'});

Define the within-subjects variable.

Time = [0 2 4 6 8]';

Fit a repeated measures model, where blood levels are the responses and gender is the predictor variable.

rm = fitrm(t,'t0-t8 ~ Gender','WithinDesign',Time);

Perform analysis of variance.

anovatbl = anova(rm)
anovatbl=3×7 table
     Within     Between     SumSq     DF    MeanSq      F         pValue  
    ________    ________    ______    __    ______    ______    __________

    Constant    constant     54702     1     54702    1079.2    1.1897e-14
    Constant    Gender      2251.7     1    2251.7    44.425    1.0693e-05
    Constant    Error        709.6    14    50.685                        

There are 2 genders and 16 observations, so the degrees of freedom for gender is 2 - 1 = 1, and for error it is 16 - 2 = 14. The small p-value of 1.0693e-05 indicates that gender has a significant effect on the blood level of the drug.
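With the default within-subject model '1', each subject's repeated measures are combined into a single response. The Gender row is then an ordinary one-way analysis of variance on those per-subject values. The Python sketch below shows that computation on a small hypothetical data set, not on longitudinalData. It assumes a plain average. MATLAB may scale the combined response differently, which would change SumSq and MeanSq but not the F-statistic, because a constant scale factor cancels in the ratio.

```python
def one_way_anova(Y, groups):
    """One-way ANOVA on the per-subject average of the repeated measures in Y."""
    avg = [sum(row) / len(row) for row in Y]
    grand = sum(avg) / len(avg)
    labels = sorted(set(groups))
    means = {g: sum(a for a, h in zip(avg, groups) if h == g) / groups.count(g) for g in labels}
    # Between-subject (group) and error sums of squares
    ss_between = sum(groups.count(g) * (means[g] - grand) ** 2 for g in labels)
    ss_error = sum((a - means[h]) ** 2 for a, h in zip(avg, groups))
    # Degrees of freedom: (groups - 1) and (subjects - groups)
    df_between, df_error = len(labels) - 1, len(avg) - len(labels)
    f_stat = (ss_between / df_between) / (ss_error / df_error)
    return ss_between, ss_error, df_between, df_error, f_stat

# Hypothetical data: six subjects, two repeated measures each
Y = [[1, 3], [2, 4], [3, 5], [6, 8], [7, 9], [8, 10]]
groups = ['F', 'F', 'F', 'M', 'M', 'M']
result = one_way_anova(Y, groups)
```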

Repeat analysis of variance using orthogonal contrasts.

anovatbl = anova(rm,'WithinModel','orthogonalcontrasts')
anovatbl=15×7 table
     Within     Between       SumSq       DF      MeanSq          F           pValue  
    ________    ________    __________    __    __________    __________    __________

    Constant    constant         54702     1         54702        1079.2    1.1897e-14
    Constant    Gender          2251.7     1        2251.7        44.425    1.0693e-05
    Constant    Error            709.6    14        50.685                            
    Time        constant        310.83     1        310.83        31.023    6.9065e-05
    Time        Gender          13.341     1        13.341        1.3315       0.26785
    Time        Error           140.27    14        10.019                            
    Time^2      constant        565.42     1        565.42        98.901    1.0003e-07
    Time^2      Gender          1.4076     1        1.4076       0.24621       0.62746
    Time^2      Error           80.039    14        5.7171                            
    Time^3      constant        2.6127     1        2.6127        1.4318       0.25134
    Time^3      Gender      7.8853e-06     1    7.8853e-06    4.3214e-06       0.99837
    Time^3      Error           25.546    14        1.8247                            
    Time^4      constant        2.8404     1        2.8404       0.47924       0.50009
    Time^4      Gender          2.9016     1        2.9016       0.48956       0.49559
    Time^4      Error           82.977    14        5.9269                            
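With 'orthogonalcontrasts', the Within rows Constant, Time, Time^2, Time^3, and Time^4 correspond to orthonormal polynomial contrasts of degree 0 through 4 in Time. The Python sketch below builds such contrasts for Time = [0 2 4 6 8]. It applies Gram-Schmidt orthonormalization to the Vandermonde columns 1, t, ..., t^4, which yields the Q factor of a QR factorization up to column signs. Ordering the columns by increasing power is an assumption. MATLAB's contrasts may differ from these in sign, which does not affect the F-statistics.

```python
import math

def orthogonal_poly_contrasts(t):
    """Orthonormal polynomial contrasts at points t; returns columns of degree 0, 1, ..., len(t)-1."""
    p = len(t)
    # Vandermonde columns in increasing powers: 1, t, t^2, ..., t^(p-1)
    columns = [[float(ti) ** k for ti in t] for k in range(p)]
    q = []
    for v in columns:
        w = list(v)
        # Modified Gram-Schmidt: remove the component along each earlier contrast
        for u in q:
            dot = sum(a * b for a, b in zip(u, w))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        q.append([a / norm for a in w])
    return q

contrasts = orthogonal_poly_contrasts([0, 2, 4, 6, 8])
```

The first column is constant, so it corresponds to the average response. The second column is proportional to Time minus its mean, which gives the slope of centered Time.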

Input Arguments


Repeated measures model rm, specified as a RepeatedMeasuresModel object.

For properties and methods of this object, see RepeatedMeasuresModel.

Within-subject model WM, specified as one of the following:

  • 'separatemeans' — The response is the average of the repeated measures (average across the within-subject model).

  • 'orthogonalcontrasts' — Valid only when the within-subject model has a single numeric factor T. The responses are the average, the slope of centered T, and, in general, all orthogonal contrasts for a polynomial up to T^(p – 1), where p is the number of rows in the within-subject model. anova multiplies Y, the response matrix you use in the repeated measures model rm, by the orthogonal contrasts, and uses the columns of the resulting product matrix as the responses.

    anova computes the orthogonal contrasts for T using the Q factor of a QR factorization of the Vandermonde matrix.

  • A character vector or string scalar that defines a model specification in the within-subject factors. Responses are defined by the terms in that model. anova multiplies Y, the response matrix you use in the repeated measures model rm, by the terms of the model, and uses the columns of the result as the responses.

    For example, if there is a Time factor and 'Time' is the model specification, then anova uses two terms: the constant and the uncentered Time term. The default is '1', which performs the analysis on the average response.

  • An r-by-nc matrix, C, specifying nc contrasts among the r repeated measures. If Y represents the matrix of repeated measures you use in the repeated measures model rm, then the output anovatbl contains a separate analysis of variance for each column of Y*C.

The anova table contains separate univariate analysis of variance results for each response.

Example: 'WithinModel','Time'

Example: 'WithinModel','orthogonalcontrasts'

Output Arguments


Results of analysis of variance for between-subject effects, returned as a table. For each response, the table contains a row for each term in the between-subjects model plus an Error row, with the following columns.

Column Name    Definition
Within         Within-subject factors
Between        Between-subject factors
SumSq          Sum of squares
DF             Degrees of freedom
MeanSq         Mean square (SumSq/DF)
F              F-statistic
pValue         p-value corresponding to the F-statistic

More About


Vandermonde Matrix

A Vandermonde matrix is a matrix whose columns are powers of the vector a, that is, V(i,j) = a(i)^(n-j), where n is the length of a.
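The definition uses 1-based indices and decreasing powers, so the last column is all ones. The Python sketch below uses 0-based indexing, so the exponent n - j becomes n - 1 - j. The function name vandermonde is illustrative.

```python
def vandermonde(a):
    """Return V with V[i][j] = a[i] ** (n - 1 - j), the 0-based form of V(i,j) = a(i)^(n-j)."""
    n = len(a)
    return [[x ** (n - 1 - j) for j in range(n)] for x in a]

V = vandermonde([1, 2, 3])
```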

QR Factorization

The QR factorization of an m-by-n matrix A is the factorization of that matrix into the product A = Q*R, where R is an m-by-n upper triangular matrix and Q is an m-by-m unitary matrix.
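For a real matrix with full column rank and m >= n, one way to compute a QR factorization is modified Gram-Schmidt. The Python sketch below illustrates that technique and is not necessarily the algorithm MATLAB uses. It returns the economy-size factors: Q is m-by-n with orthonormal columns, and R is n-by-n upper triangular. The full m-by-m Q in the definition follows by extending Q's columns to an orthonormal basis, with R padded by zero rows.

```python
import math

def qr_mgs(A):
    """Economy-size QR of an m-by-n matrix A (m >= n, full column rank) by modified Gram-Schmidt."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # columns of A
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j]
        # Subtract the projection onto each earlier orthonormal column, using the updated v
        for i, q in enumerate(Q):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        Q.append([a / R[j][j] for a in v])
    # Return Q as an m-by-n row-major matrix
    return [[Q[j][i] for j in range(n)] for i in range(m)], R

A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Q, R = qr_mgs(A)
```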

Version History

Introduced in R2014a