corrplot

Plot variable correlations

Description

[R,PValue] = corrplot(X) plots Pearson's correlation coefficients between all pairs of variables in the input matrix of time series data. The plot is a numVars-by-numVars grid of subplots, where numVars is the number of time series variables (columns) in the data. The grid contains the following subplots:

  • Each off-diagonal subplot contains a scatterplot of a pair of variables with a least-squares reference line, the slope of which is equal to the displayed correlation coefficient.

  • Each diagonal subplot contains the distribution of a variable as a histogram.

The function also returns the correlation matrix displayed in the plots and a matrix of p-values for testing the null hypothesis that each pair of variables is uncorrelated against the alternative hypothesis of a nonzero correlation.

[R,PValue] = corrplot(Tbl) plots the Pearson's correlation coefficients between all pairs of variables in the input table or timetable, and also returns tables for the correlation matrix and matrix of p-values.

To plot the correlation matrix for a subset of variables, select them by using the DataVariables name-value argument.

[___] = corrplot(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. corrplot returns the output argument combination for the corresponding input arguments. For example, corrplot(Tbl,Type="Spearman",TestR="on",DataVariables=1:5) computes Spearman’s rank correlation coefficient for the first 5 variables of the table Tbl and tests for significant correlation coefficients.

corrplot(___) plots the correlation matrix.

corrplot(ax,___) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes.

[___,H] = corrplot(___) plots the diagnostics of the input series and additionally returns handles to plotted graphics objects. Use elements of H to modify properties of the plot after you create it.
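The following is a minimal sketch of the ax and H syntaxes, using synthetic data that is not part of the toolbox examples. Whether a given element of H exposes a Color property is an assumption here, so the code checks with isprop before changing it.

```matlab
% Synthetic data (illustrative only): the second series depends on the first.
rng(1);
X = randn(100,3);
X(:,2) = X(:,1) + 0.5*X(:,2);

% Create a figure and axes explicitly, then pass the axes to corrplot.
fig = figure;
ax  = axes(fig);
[R,~,H] = corrplot(ax,X);

% H is a numVars-by-numVars array of graphics handles.
size(H)
class(H(2,1))      % object behind an off-diagonal scatterplot
class(H(1,1))      % object behind a diagonal histogram

% Change a property only if the object actually has it (assumption:
% the off-diagonal object may expose Color).
h = H(2,1);
if isprop(h,"Color")
    h.Color = [0 0.5 0];
end
```

Inspecting the class of each element first shows which properties are available for that object.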

Examples

Plot and return Pearson's correlation coefficients between pairs of time series using the default options of corrplot. Input the time series data as a numeric matrix.

Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data.

load Data_Canada

Plot and return the correlation matrix between all pairs of variables in the data.

R = corrplot(Data)

MATLAB figure

R = 5×5

    1.0000    0.9266    0.7401    0.7287    0.7136
    0.9266    1.0000    0.5908    0.5716    0.5556
    0.7401    0.5908    1.0000    0.9758    0.9384
    0.7287    0.5716    0.9758    1.0000    0.9861
    0.7136    0.5556    0.9384    0.9861    1.0000

The correlation plot shows that the short-term, medium-term, and long-term interest rates are highly correlated.
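As an optional cross-check, the returned matrix can be compared with the output of the base MATLAB function corrcoef. This sketch assumes only that corrcoef with Rows="pairwise" applies the same pairwise rule for missing values that corrplot uses by default. The name maxDiff is introduced here for illustration.

```matlab
load Data_Canada                      % provides the numeric matrix Data
R = corrplot(Data);

% corrcoef computes Pearson correlations between the columns of Data.
% Rows="pairwise" mirrors the default missing-value handling of corrplot.
C = corrcoef(Data,Rows="pairwise");

% Largest absolute difference between the two matrices; this should be
% at the level of rounding error.
maxDiff = max(abs(R - C),[],"all")
```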

Plot correlations between time series, which are variables in a table, using default options. Return a table of pairwise correlations and a table of corresponding significance-test p-values.

Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable.

load Data_Canada
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);
TT.Observations = [];

Plot and return the correlation matrix, with corresponding significance-test p-values, between all pairs of variables in the data.

[R,PValue] = corrplot(TT)

MATLAB figure

R=5×5 table
              INF_C      INF_G      INT_S      INT_M      INT_L 
             _______    _______    _______    _______    _______

    INF_C          1    0.92665    0.74007    0.72867     0.7136
    INF_G    0.92665          1    0.59077    0.57159    0.55557
    INT_S    0.74007    0.59077          1     0.9758    0.93843
    INT_M    0.72867    0.57159     0.9758          1    0.98609
    INT_L     0.7136    0.55557    0.93843    0.98609          1

PValue=5×5 table
               INF_C         INF_G         INT_S         INT_M         INT_L   
             __________    __________    __________    __________    __________

    INF_C             1    3.6657e-18    3.2113e-08    6.6174e-08    1.6318e-07
    INF_G    3.6657e-18             1    4.7739e-05    9.4769e-05    0.00016278
    INT_S    3.2113e-08    4.7739e-05             1    2.3206e-27    1.3408e-19
    INT_M    6.6174e-08    9.4769e-05    2.3206e-27             1    5.1602e-32
    INT_L    1.6318e-07    0.00016278    1.3408e-19    5.1602e-32             1

corrplot returns the correlation matrix and corresponding matrix of p-values in tables R and PValue, respectively.

By default, corrplot computes correlations between all pairs of variables in the input table. To select a subset of variables from an input table, set the DataVariables option.

Plot the correlation matrix for selected time series.

Load the credit default data set Data_CreditDefaults.mat. The table DataTable contains the default rate of investment-grade corporate bonds series (IGD, the response variable) and several predictor variables.

load Data_CreditDefaults

Consider a multiple regression model for the default rate that includes an intercept term.

Include a variable in the table of data that represents the intercept in the design matrix (that is, a column of ones). Place the intercept variable at the beginning of the table.

Const = ones(height(DataTable),1);
DataTable = addvars(DataTable,Const,Before=1);

Create a variable that contains all predictor variable names.

varnames = DataTable.Properties.VariableNames;
prednames = varnames(varnames ~= "IGD");

Graph a correlation plot of all predictor variables except for the intercept dummy variable.

corrplot(DataTable,DataVariables=prednames(2:end));

MATLAB figure

The predictor BBB is moderately linearly associated with the other predictors, while all other predictors appear unassociated with each other.

Plot Kendall's rank correlations between multiple time series. Conduct a hypothesis test to determine which correlations are significantly different from zero.

Load data on Canadian inflation and interest rates.

load Data_Canada

Plot the Kendall's rank correlation coefficients between all pairs of variables. Identify which correlations are significantly different from zero by conducting hypothesis tests.

corrplot(DataTable,Type="Kendall",TestR="on")

MATLAB figure

The correlation coefficients highlighted in red indicate which pairs of variables have correlations significantly different from zero. For these time series, all pairs of variables have correlations significantly different from zero.

Test for correlations greater than zero between multiple time series.

Load data on Canadian inflation and interest rates Data_Canada.mat.

load Data_Canada

Return the pairwise Pearson's correlations and corresponding p-values for testing the null hypothesis of no correlation against the right-tailed alternative that the correlations are greater than zero.

[R,PValue] = corrplot(DataTable,Tail="right");

MATLAB figure

PValue
PValue=5×5 table
               INF_C         INF_G         INT_S         INT_M         INT_L   
             __________    __________    __________    __________    __________

    INF_C             1    1.8329e-18    1.6056e-08    3.3087e-08    8.1592e-08
    INF_G    1.8329e-18             1    2.3869e-05    4.7384e-05    8.1392e-05
    INT_S    1.6056e-08    2.3869e-05             1    1.1603e-27    6.7041e-20
    INT_M    3.3087e-08    4.7384e-05    1.1603e-27             1    2.5801e-32
    INT_L    8.1592e-08    8.1392e-05    6.7041e-20    2.5801e-32             1

The output PValue has pairwise p-values all less than the default 0.05 significance level, indicating that all pairs of variables have correlation significantly greater than zero.

Input Arguments

X

Time series data, specified as a numObs-by-numVars numeric matrix. Each column of X corresponds to a variable, and each row corresponds to an observation.

Data Types: double

Tbl

Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation.

Specify numVars variables to include in the diagnostics computations by using the DataVariables argument. The selected variables must be numeric.

ax

Axes on which to plot, specified as an Axes object.

By default, corrplot plots to the current axes (gca).

corrplot does not support UIAxes targets.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: corrplot(Tbl,Type="Spearman",TestR="on",DataVariables=1:5) computes Spearman’s rank correlation coefficient for the first 5 variables of the table Tbl and tests for significant correlation coefficients.

Type

Correlation coefficient to compute, specified as a value in this table.

Value         Description
"Pearson"     Pearson’s linear correlation coefficient
"Kendall"     Kendall’s rank correlation coefficient (τ)
"Spearman"    Spearman’s rank correlation coefficient (ρ)

Example: Type="Kendall"

Data Types: char | string

Rows

Option for handling rows in the input time series data that contain NaN values, specified as a value in this table.

Value         Description
"all"         Use all rows, regardless of any NaN entries.
"complete"    Use only rows that do not contain NaN entries.
"pairwise"    Use rows that do not contain NaN entries in column (variable) i or j to compute R(i,j).

Rows applies only to the correlation and p-value calculations. For plots, corrplot defers to plotmatrix for handling missing values in the data.

Example: Rows="complete"

Data Types: char | string
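The difference between the "complete" and "pairwise" settings can be sketched with synthetic data in which only one variable has missing values. The data and the reference values computed with the base MATLAB function corrcoef are illustrative assumptions, not part of the toolbox examples.

```matlab
% Synthetic data: NaN values occur in the first variable only.
rng(2);
n = 50;
X = randn(n,3);
X(:,3) = X(:,2) + 0.5*randn(n,1);
X(1:10,1) = NaN;

Rc = corrplot(X,Rows="complete");   % drops rows 1 to 10 for every pair
Rp = corrplot(X,Rows="pairwise");   % drops rows 1 to 10 only for pairs involving column 1

% Reference values from corrcoef.
ok  = all(~isnan(X),2);
Cc  = corrcoef(X(ok,:));            % complete rows only
C23 = corrcoef(X(:,2),X(:,3));      % all rows for the pair (2,3)

diffComplete = max(abs(Rc - Cc),[],"all")
diffPairwise = abs(Rp(2,3) - C23(1,2))
```

For the pair (2,3), "pairwise" uses all 50 observations, while "complete" uses only the 40 rows without NaN values.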

Tail

Alternative hypothesis Ha used to compute the p-values PValue, specified as a value in this table.

Value      Description
"both"     Ha: Correlation is not zero.
"right"    Ha: Correlation is greater than zero.
"left"     Ha: Correlation is less than zero.

Example: Tail="left"

Data Types: char | string

VarNames

Unique variable names used in the plots, specified as a string vector or cell vector of character vectors of length numVars. VarNames(j) specifies the name to use for variable X(:,j) or DataVariables(j).

  • If the input time series data is the matrix X, the default is {'var1','var2',...}.

  • If the input time series data is the table or timetable Tbl, the default is Tbl.Properties.VariableNames.

Example: VarNames=["Const" "AGE" "BBD"]

Data Types: char | cell | string

TestR

Flag for testing whether correlations are significant, specified as a value in this table.

Value    Description
"on"     corrplot highlights significant correlations in the correlation matrix plot using red font.
"off"    All correlations in the correlation matrix plot have black font.

Example: TestR="on"

Data Types: char | string

Alpha

Significance level for correlation tests, specified as a scalar in the interval [0,1].

Example: Alpha=0.01

Data Types: double
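A short sketch of how TestR and Alpha work together, using synthetic data (an assumption for illustration). The logical mask sig is a hypothetical helper that lists the off-diagonal pairs whose p-values fall below the chosen level. These are the pairs that the plot is expected to highlight in red.

```matlab
% Synthetic data: x2 is strongly related to x1, x3 is generated independently.
rng(3);
n  = 100;
x1 = randn(n,1);
x2 = x1 + 0.1*randn(n,1);
x3 = randn(n,1);
X  = [x1 x2 x3];

% Test at the 1% level and highlight significant correlations.
alpha = 0.01;
[R,PValue] = corrplot(X,TestR="on",Alpha=alpha);

% Off-diagonal pairs with p-values below alpha.
sig = PValue < alpha & ~eye(size(PValue))
```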

DataVariables

Variables in Tbl that corrplot includes in the correlation matrix plot, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.

Example: DataVariables=["GDP" "CPI"]

Example: DataVariables=[true true true false] or DataVariables=1:3 selects the first through third table variables.

Data Types: double | logical | char | cell | string
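The three ways of selecting variables can be sketched as follows. The table Tbl and its variables GDP, CPI, and UNR are synthetic and hypothetical, chosen only to illustrate the selection forms.

```matlab
% Synthetic table with three numeric variables (illustrative only).
rng(4);
n   = 60;
GDP = cumsum(randn(n,1));
CPI = cumsum(randn(n,1));
UNR = randn(n,1);
Tbl = table(GDP,CPI,UNR);

R1 = corrplot(Tbl,DataVariables=["GDP" "CPI"]);      % select by name
R2 = corrplot(Tbl,DataVariables=1:2);                % select by index
R3 = corrplot(Tbl,DataVariables=[true true false]);  % select by logical mask

% The output table is labeled with the selected variable names.
R1.Properties.VariableNames
```

All three calls select the same two variables, so the returned tables should contain the same correlations.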

Output Arguments

R

Correlations between pairs of variables in the input time series data that are displayed in the plots, returned as one of the following quantities:

  • numVars-by-numVars numeric matrix when you supply the input X.

  • numVars-by-numVars table when you supply the input Tbl, where numVars is the number of variables selected by the DataVariables argument.

PValue

p-values corresponding to significance tests on the elements of R, returned as one of the following quantities:

  • numVars-by-numVars numeric matrix when you supply the input X.

  • numVars-by-numVars table when you supply the input Tbl, where the variables specified by the DataVariables argument determine numVars and the names of the rows and columns of the output table.

The p-values test the null hypothesis of no correlation against the alternative hypothesis specified by the Tail argument. By default, the alternative is a nonzero correlation.

H

Handles to plotted graphics objects, returned as one of the following quantities:

  • numVars-by-numVars matrix of graphics objects when you supply the input X

  • numVars-by-numVars table of graphics objects when you supply the input Tbl, where the variables specified by the DataVariables argument determine numVars and the names of the rows and columns of the output table

H contains unique plot identifiers, which you can use to query or modify properties of the plot.

Tips

  • The setting Rows="pairwise" (the default) can return a correlation matrix that is not positive definite. The setting Rows="complete" returns a positive-definite matrix, but, in general, the estimates are based on fewer observations.
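The following sketch constructs a small, deliberately artificial data set in which each pair of variables is observed on a different block of rows. The pairwise correlations are then mutually inconsistent, and the resulting matrix has a negative eigenvalue. The data values and the name minEig are assumptions made for illustration.

```matlab
% Each block of four rows observes only two of the three variables.
X = [ 1    1.1  NaN
      2    1.9  NaN
      3    3.2  NaN
      4    3.9  NaN
      1    NaN  0.9
      2    NaN  2.1
      3    NaN  2.8
      4    NaN  4.2
      NaN  1    4.1
      NaN  2    2.9
      NaN  3    2.2
      NaN  4    0.8 ];

% Pairs (1,2) and (1,3) are strongly positive, pair (2,3) strongly negative.
R = corrplot(X,Rows="pairwise");

% Symmetrize to guard against rounding asymmetry, then find the smallest
% eigenvalue. A negative value means R is not positive definite.
minEig = min(eig((R + R')/2))
```

With Rows="complete", this data set has no usable rows, which illustrates the trade-off in sample size that the tip describes.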

Algorithms

  • corrplot computes p-values for Pearson’s correlation by transforming the correlation to create a t-statistic with numObs – 2 degrees of freedom. The transformation is exact when the input time series data is normal.

  • corrplot computes p-values for Kendall’s and Spearman’s rank correlations by using either the exact permutation distributions (for small sample sizes) or large-sample approximations.

  • corrplot computes p-values for two-tailed tests by doubling the more significant of the two one-tailed p-values.
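The Pearson case can be reproduced by hand as a sketch. With sample correlation r and n = numObs, the usual transformation is t = r*sqrt((n - 2)/(1 - r^2)). The code below assumes this standard form and uses tcdf from Statistics and Machine Learning Toolbox. The synthetic data and the names pManual and pDoubled are illustrative.

```matlab
% Synthetic data: two moderately correlated series.
rng(5);
n = 40;
X = randn(n,2);
X(:,2) = 0.4*X(:,1) + X(:,2);

[R,Pboth]  = corrplot(X);                 % default two-tailed test
[~,Pright] = corrplot(X,Tail="right");
[~,Pleft]  = corrplot(X,Tail="left");

% t-statistic for Pearson's r with n - 2 degrees of freedom.
r = R(1,2);
t = r*sqrt((n - 2)/(1 - r^2));
pManual = 2*tcdf(-abs(t),n - 2);

% Two-tailed p-value as twice the smaller of the one-tailed p-values.
pDoubled = 2*min(Pright(1,2),Pleft(1,2));

% The three values should agree up to rounding error.
[Pboth(1,2) pManual pDoubled]
```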

Version History

Introduced in R2012a
