Index and Search Dataset Arrays


The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Ways To Index and Search

There are many ways to index into dataset arrays. For example, for a dataset array, ds, you can:

  • Use () to create a new dataset array from a subset of ds. For example, ds1 = ds(1:5,:) creates a new dataset array, ds1, consisting of the first five rows of ds. Metadata, including variable and observation names, transfers to the new dataset array.

  • Use variable names with dot notation to index individual variables in a dataset array. For example, ds.Height indexes the variable named Height.

  • Use observation names to index individual observations in a dataset array. For example, ds('Obs1',:) gives data for the observation named Obs1.

  • Use observation or variable numbers. For example, ds(:,[1,3,5]) gives the data in the first, third, and fifth variables (columns) of ds.

  • Use logical indexing to search for observations in ds that satisfy a logical condition. For example, ds(ds.Gender=='Male',:) gives the observations in ds where the variable named Gender, a nominal array, has the value Male.

  • Use ismissing to find missing data in the dataset array.
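
As a minimal sketch of the indexing forms listed above, the following builds a small dataset array and applies each one. The data values, the variable names Height, Gender, and Weight, and the observation names Obs1 through Obs6 are hypothetical, chosen only for illustration.

```matlab
% Hypothetical sample data, for illustration only
Height = [71; 69; 64; 67; 64; 68];
Gender = nominal({'Male';'Male';'Female';'Female';'Female';'Male'});
Weight = [176; 163; 131; 133; NaN; 142];   % NaN marks a missing value
obsNames = {'Obs1';'Obs2';'Obs3';'Obs4';'Obs5';'Obs6'};
ds = dataset({Height,'Height'},{Gender,'Gender'},{Weight,'Weight'}, ...
    'ObsNames',obsNames);

ds1   = ds(1:5,:);                 % () indexing: first five observations
h     = ds.Height;                 % dot notation: one variable as a plain array
obs1  = ds('Obs1',:);              % index by observation name
cols  = ds(:,[1,3]);               % index by variable number: Height and Weight
males = ds(ds.Gender=='Male',:);   % logical condition on a nominal variable
tf    = ismissing(ds);             % logical array, true where a value is missing
incomplete = ds(any(tf,2),:);      % observations with at least one missing value
```

Here, any(tf,2) collapses the element-wise result of ismissing into one logical value per observation, so it can be used as a row index.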


Common Indexing and Searching Methods

This example shows several indexing and searching methods for dataset arrays.

Load the sample data.

load hospital;
size(hospital)
ans = 1×2

   100     7

The dataset array has 100 observations and 7 variables.

Index a variable by name. Return the minimum age in the dataset array.

min(hospital.Age)
ans = 25

Delete the variable Trials.

hospital.Trials = [];
size(hospital)
ans = 1×2

   100     6

Index an observation by name. Display measurements on the first five variables for the observation named PUE-347.

hospital('PUE-347',1:5)
ans = 
               LastName         Sex       Age    Weight    Smoker
    PUE-347    {'YOUNG'}        Female    25     114       false 

Index variables by number. Create a new dataset array containing the first four variables of hospital.

dsNew = hospital(:,1:4);
dsNew.Properties.VarNames(:)
ans = 4x1 cell
    {'LastName'}
    {'Sex'     }
    {'Age'     }
    {'Weight'  }
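
Variables can also be selected by name instead of by number, by passing a cell array of names as the column index. The following is a minimal sketch of that alternative. It loads its own copy of the hospital data into a hypothetical variable named h so that it does not depend on the current workspace state; the names byNumber and byName are likewise illustrative.

```matlab
S = load('hospital');   % load into a struct to avoid overwriting workspace variables
h = S.hospital;         % h is a hypothetical name for the copy

byNumber = h(:,1:4);                                 % first four variables by position
byName   = h(:,{'LastName','Sex','Age','Weight'});   % same variables by name

% Both selections should contain the same variables in the same order
isequal(byNumber.Properties.VarNames, byName.Properties.VarNames)
```

Selecting by name is less sensitive to changes in variable order than selecting by position.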

Index observations by number. Delete the last 10 observations.

hospital(end-9:end,:) = [];
size(hospital)
ans = 1×2

    90     6

Search for observations by logical condition. Create a new dataset array containing only females who smoke.

dsFS = hospital(hospital.Sex=='Female' & hospital.Smoker==true,:);
dsFS(:,{'LastName','Sex','Smoker'})
ans = 
               LastName             Sex       Smoker
    LPD-746    {'MILLER'   }        Female    true  
    XBR-291    {'GARCIA'   }        Female    true  
    AAX-056    {'LEE'      }        Female    true  
    DTT-578    {'WALKER'   }        Female    true  
    AFK-336    {'WRIGHT'   }        Female    true  
    RBA-579    {'SANCHEZ'  }        Female    true  
    HAK-381    {'MORRIS'   }        Female    true  
    NSK-403    {'RAMIREZ'  }        Female    true  
    ILS-109    {'WATSON'   }        Female    true  
    JDR-456    {'SANDERS'  }        Female    true  
    HWZ-321    {'PATTERSON'}        Female    true  
    GGU-691    {'HUGHES'   }        Female    true  
    WUS-105    {'FLORES'   }        Female    true  
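
The same pattern extends to other conditions. As a sketch, assuming the Age and Smoker variables shown earlier, the following selects nonsmokers older than 40. The threshold of 40 and the names h, idx, and dsOlder are arbitrary choices for illustration.

```matlab
S = load('hospital');   % load a separate copy of the sample data
h = S.hospital;         % h is a hypothetical name for the copy

% Combine a numeric comparison with a negated logical variable
idx = h.Age > 40 & ~h.Smoker;
dsOlder = h(idx,{'LastName','Age','Smoker'});

% Number of observations that satisfy the condition
size(dsOlder,1)
```

Storing the condition in idx first keeps the indexing expression short and lets the same logical vector be reused.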

