Missing Data

Representing Missing Data Values

Often, you represent missing or unavailable data values in MATLAB® code with the special value, NaN, which stands for Not-a-Number.

The IEEE® floating-point arithmetic convention defines NaN as the result of an undefined operation, such as 0/0.
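As a small illustration of this convention, the following sketch produces NaN from a few undefined operations and checks the results with isnan. The variable names are chosen for this example only.

```matlab
% Undefined floating-point operations return NaN.
r1 = 0/0;        % zero divided by zero
r2 = Inf - Inf;  % difference of two infinities
r3 = 0*Inf;      % zero times infinity

% isnan returns logical 1 (true) for each NaN element.
% 1/0 is Inf, not NaN, so the last element is 0 (false).
tf = isnan([r1 r2 r3 1/0]);   % tf is [1 1 1 0]
```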

Calculating with NaNs

When you perform calculations on a variable that contains NaNs, the NaN values propagate to the final result, which might render the result useless.

For example, consider a matrix containing the 3-by-3 magic square with its center element replaced with NaN:

a = magic(3); a(2,2) = NaN
a =
     8     1     6
     3   NaN     7
     4     9     2

Compute the sum for each column in the matrix:

sum(a)
ans =
    15   NaN    15

Notice that the sum of the elements in the middle column is a NaN value because that column contains a NaN.

If you do not want to have NaNs in your final results, remove these values from your data. For more information, see Removing NaNs from Data.
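If you only need a calculation to skip NaNs, you might not have to remove them from the data first. The following sketch repeats the column sum and then assumes a MATLAB release in which sum accepts the 'omitnan' flag; check the sum reference page for your release before relying on it.

```matlab
a = magic(3);
a(2,2) = NaN;

% Default behavior: the NaN propagates into the middle column sum.
s = sum(a);                  % s is [15 NaN 15]

% Assumption: this release supports the 'omitnan' flag for sum.
% NaN elements are skipped, so the middle column sums 1 + 9.
sOmit = sum(a,'omitnan');    % sOmit is [15 10 15]
```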

Removing NaNs from Data

Use the function isnan to identify NaNs in the data, and then remove them using the techniques in the following table.

    Note:   Use the function isnan to identify NaNs. By IEEE arithmetic convention, the logical comparison NaN == NaN always produces 0 (that is, it never evaluates to true). Therefore, you cannot use x(x==NaN) = [] to remove NaNs from your data.
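To see why the comparison fails, consider this short sketch on a hypothetical vector. It contrasts the mask produced by == with the mask produced by isnan, and shows the effect of each on deletion.

```matlab
x = [1 NaN 3 NaN];

% Comparing with == never matches a NaN, so nothing is selected.
eqMask = (x == NaN);     % eqMask is [0 0 0 0]

% isnan flags the NaN positions correctly.
nanMask = isnan(x);      % nanMask is [0 1 0 1]

% Deleting with the == mask leaves the vector unchanged ...
y = x;
y(y == NaN) = [];        % y still has 4 elements

% ... while the isnan mask removes both NaNs.
z = x;
z(isnan(z)) = [];        % z is [1 3]
```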



    Code                              Description

    i = find(~isnan(x));              Find the indices of elements in a vector x that are not NaNs.
    x = x(i)                          Keep only the non-NaN elements.

    x = x(~isnan(x));                 Remove NaNs from a vector x.

    x(isnan(x)) = [];                 Remove NaNs from a vector x (alternative method).

    X(any(isnan(X),2),:) = [];        Remove any rows containing NaNs from a matrix X.
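The following sketch applies each of these techniques to small, hypothetical data so that you can compare the results. The three vector techniques give the same answer; they differ only in whether they build an index list, use a logical mask, or delete in place.

```matlab
x = [4 NaN 7 NaN 1];

% Technique 1: index list of the non-NaN elements.
i = find(~isnan(x));     % i is [1 3 5]
x1 = x(i);               % x1 is [4 7 1]

% Technique 2: logical indexing, no intermediate index list.
x2 = x(~isnan(x));       % x2 is [4 7 1]

% Technique 3: delete the NaN elements in place.
x3 = x;
x3(isnan(x3)) = [];      % x3 is [4 7 1]

% Matrix case: drop every row that contains at least one NaN.
X = magic(3);
X(2,2) = NaN;
X(any(isnan(X),2),:) = [];   % X is [8 1 6; 4 9 2]
```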

If you remove NaNs frequently, consider creating a small function that you can call. For example:

function X = exciseRows(X)
% Remove every row of X that contains at least one NaN.
X(any(isnan(X),2),:) = [];
end

The following command removes all rows containing NaNs and then computes the correlation coefficients of the remaining rows of X:

C = corrcoef(exciseRows(X));

For more information about correlation coefficients, see Linear Correlation.
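As a self-contained sketch of this workflow, the following example uses made-up data and an anonymous function in place of a separate function file. The name keepCompleteRows and the data values are assumptions for illustration only.

```matlab
% Hypothetical data: two variables (columns), six observations (rows).
X = [1   2.1
     2   3.9
     NaN 6.2
     4   8.1
     5   NaN
     6   12.0];

% Inline equivalent of a row-removal helper, so the example runs as a script.
keepCompleteRows = @(M) M(~any(isnan(M),2),:);

Xc = keepCompleteRows(X);   % the 4 complete rows remain
C  = corrcoef(Xc);          % 2-by-2 matrix with no NaNs

% Without the cleanup, the NaNs propagate into the correlation matrix.
Craw = corrcoef(X);
```

Here, Xc keeps rows 1, 2, 4, and 6 of X, and C contains finite values only, whereas the off-diagonal entries of Craw are NaN.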

Interpolating Missing Data

Use interpolation to find intermediate points in your data. The simplest function for performing interpolation is interp1, which is a 1-D interpolation function.

By default, the interpolation method is 'linear', which fits a straight line between a pair of existing data points to calculate the intermediate value. The complete set of available methods, which you can specify as arguments in the interp1 function, includes the following:

  • 'nearest' — Nearest neighbor interpolation

  • 'next' — Next neighbor interpolation

  • 'previous' — Previous neighbor interpolation

  • 'linear' — Linear interpolation

  • 'spline' — Piecewise cubic spline interpolation

  • 'pchip' or 'cubic' — Shape-preserving piecewise cubic interpolation

  • 'v5cubic' — Cubic interpolation from MATLAB Version 5. This method does not extrapolate, and it issues a warning and uses 'spline' if X is not equally spaced.

For more information, see interp1 or type help interp1 at the MATLAB prompt.
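One way to apply this to missing data is to interpolate over the valid samples and evaluate the result at the positions of the NaNs. The following is a minimal sketch under that assumption, with hypothetical sample points t and measurements y.

```matlab
t = 1:6;                      % sample points
y = [10 12 NaN 16 NaN 20];    % measurements with two missing values

known = ~isnan(y);            % logical mask of the valid samples

% Default 'linear' method: estimate each missing value from its neighbors.
yLinear = y;
yLinear(~known) = interp1(t(known), y(known), t(~known));
% yLinear is [10 12 14 16 18 20]

% Same query with 'previous': each gap takes the preceding valid value.
yPrev = y;
yPrev(~known) = interp1(t(known), y(known), t(~known), 'previous');
% yPrev is [10 12 12 16 16 20]
```

This sketch fills only gaps that lie between valid samples. With the default 'linear' method, query points outside the range of the valid samples return NaN unless you request extrapolation.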
