
Create Categorical Arrays

This example shows how to create a categorical array. categorical is a data type for storing data with values from a finite set of discrete categories. These categories can have a natural order, but it is not required. A categorical array provides efficient storage and convenient manipulation of data, while also maintaining meaningful names for the values. You can use categorical arrays in a table to define groups of rows.

By default, categorical arrays contain categories that have no mathematical ordering. For example, the discrete set of pet categories ["dog","cat","bird"] has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering ["bird","cat","dog"]. Ordinal categorical arrays contain categories that have a meaningful mathematical ordering. For example, the discrete set of size categories ["small","medium","large"] has the mathematical ordering small < medium < large.
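
A minimal sketch of the difference between the two kinds of arrays. It assumes MATLAB R2017a or later, so that double-quoted text creates string arrays. In a non-ordinal array the categories are sorted alphabetically. In an ordinal array they follow the order you specify, and relational operators such as < are only allowed on ordinal arrays.

```matlab
% Non-ordinal: categories are sorted alphabetically and have no order.
pets = categorical(["dog","cat","bird"]);
petCats = categories(pets)          % {'bird';'cat';'dog'}

% Ordinal: categories follow the order given in the second argument.
sizes = categorical(["small","large"],["small","medium","large"],'Ordinal',true);
isSmaller = sizes(1) < sizes(2)     % logical 1, because small < large

% Relational operators such as < are not defined for non-ordinal
% categorical arrays, so this comparison throws an error.
try
    pets(1) < pets(2);
catch err
    disp(err.message)
end
```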

When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed from the category names. For example, if you convert the text [" cat","dog"] to a categorical array, its categories are ["cat","dog"].
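
A short check of the whitespace rule, assuming MATLAB R2017a or later for double-quoted string arrays. After conversion, the element that came from " cat" compares equal to "cat".

```matlab
% The leading space in " cat" is removed when the category is created.
pets = categorical([" cat","dog"]);
petCats = categories(pets)   % {'cat';'dog'}
isCat = pets(1) == "cat"     % logical 1
```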

Create Categorical Array from String Array

You can use the categorical function to create a categorical array from a numeric array, logical array, string array, cell array of character vectors, or an existing categorical array.
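
As an illustration of the numeric case, here is a sketch in which the array codes holds made-up example values. Each distinct number becomes a category. You can also pass a value set together with a list of category names to give the numeric codes descriptive names.

```matlab
% Hypothetical numeric codes; each distinct value becomes a category.
codes = [1 3 2 3 1];
c1 = categorical(codes);
cats1 = categories(c1)        % {'1';'2';'3'}

% valueset lists the codes, and catnames gives the name for each code,
% in the same order.
c2 = categorical(codes,[1 2 3],["low","medium","high"]);
cats2 = categories(c2)        % {'low';'medium';'high'}
```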

Create a 1-by-11 string array containing the abbreviations of New England states.

state = ["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"]
state = 1x11 string
  Columns 1 through 9

    "MA"    "ME"    "CT"    "VT"    "ME"    "NH"    "VT"    "MA"    "NH"

  Columns 10 through 11

    "CT"    "RI"

Convert the string array, state, to a categorical array that has no mathematical order.

state = categorical(state)
state = 1x11 categorical
  Columns 1 through 9

     MA      ME      CT      VT      ME      NH      VT      MA      NH 

  Columns 10 through 11

     CT      RI 

List the discrete categories in the variable state. The array contains only six unique states, so there are six categories. The categories are listed in alphabetical order.

categories(state)
ans = 6x1 cell
    {'CT'}
    {'MA'}
    {'ME'}
    {'NH'}
    {'RI'}
    {'VT'}
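
To see how many elements fall in each of those categories, you can use the countcats function. The sketch below pairs its counts with the category names in a table. The table layout and the variable names State and Count are illustrative choices only.

```matlab
state = categorical(["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"]);
cats = categories(state);   % categories in alphabetical order
n = countcats(state);       % number of elements in each category

% Pair each category with its count: CT 2, MA 2, ME 2, NH 2, RI 1, VT 2
countsTable = table(cats, n(:), 'VariableNames', {'State','Count'})
```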

Add New and Missing Elements

Add elements to the original string array. One of the elements is the missing string, displayed as <missing>. Just as NaN can indicate missing values in a numeric array, <missing> indicates missing values in a string array.

state = ["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"];
state = [string(missing) state];
state(13) = "ME"
state = 1x13 string
  Columns 1 through 9

    <missing>    "MA"    "ME"    "CT"    "VT"    "ME"    "NH"    "VT"    "MA"

  Columns 10 through 13

    "NH"    "CT"    "RI"    "ME"

Convert the string array to a categorical array. The missing string becomes an undefined category, displayed as <undefined>. It indicates an element of the categorical array that does not belong to any category.

state = categorical(state)
state = 1x13 categorical
  Columns 1 through 8

     <undefined>      MA      ME      CT      VT      ME      NH      VT 

  Columns 9 through 13

     MA      NH      CT      RI      ME 
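
To locate undefined elements programmatically, you can use the isundefined function. A minimal sketch follows. It rebuilds the same 13-element array and shows that the missing value does not add a category.

```matlab
state = ["MA","ME","CT","VT","ME","NH","VT","MA","NH","CT","RI"];
state = categorical([string(missing) state "ME"]);

% isundefined returns true for elements that belong to no category.
tf = isundefined(state);
idx = find(tf)              % 1

% The undefined element does not create a category; there are still six.
nCats = numel(categories(state))   % 6
```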

Create Ordinal Categorical Array from String Array

Create a 1-by-8 string array containing the sizes of eight objects.

AllSizes = ["medium","large","small","small","medium",...
            "large","medium","small"];

The string array, AllSizes, has three distinct values: "large", "medium", and "small". When using a string array, there is no convenient way to indicate that small < medium < large.
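
A small sketch of the problem, assuming string relational operators (available for string arrays) compare text by character order. Under that ordering, "large" sorts before "medium" and "small", which is not the size order.

```matlab
AllSizes = ["medium","large","small","small","medium",...
            "large","medium","small"];

% String comparison uses character order, not size order.
smallBeforeLarge = "small" < "large"   % logical 0
distinctSizes = unique(AllSizes)       % "large"  "medium"  "small"
```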

Convert the string array, AllSizes, to an ordinal categorical array. Use valueset to specify the values small, medium, and large, which define the categories. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.

valueset = ["small","medium","large"];
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true)
sizeOrd = 1x8 categorical
  Columns 1 through 6

     medium      large      small      small      medium      large 

  Columns 7 through 8

     medium      small 

The elements of the categorical array, sizeOrd, appear in the same order as in AllSizes. The valueset argument determines only the order of the categories.

List the discrete categories in the categorical variable, sizeOrd.

categories(sizeOrd)
ans = 3x1 cell
    {'small' }
    {'medium'}
    {'large' }

The categories are listed in the specified order to match the mathematical ordering small < medium < large.
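
Because sizeOrd is ordinal, relational operators, max, and sort all follow the order small < medium < large. The sketch below compares against a category name given as a string.

```matlab
AllSizes = ["medium","large","small","small","medium",...
            "large","medium","small"];
valueset = ["small","medium","large"];
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true);

% Relational operators use the category order small < medium < large.
isBig = sizeOrd > "small"      % 1 1 0 0 1 1 1 0
largest = max(sizeOrd)         % large

% sort orders elements by category order, not alphabetically.
sorted = sort(sizeOrd)         % small small small medium medium medium large large
```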

Create Ordinal Categorical Array by Binning Numeric Data

Create a vector of 100 random numbers between zero and 50.

x = rand(100,1)*50;

Use the discretize function to create a categorical array by binning the values of x. Put all values from zero up to 15 in the first bin, all values from 15 up to 35 in the second bin, and all values from 35 to 50 in the third bin. Each bin includes its left edge but not its right edge, except the last bin, which includes both edges.

catnames = ["small","medium","large"];
binnedData = discretize(x,[0 15 35 50],'categorical',catnames);

binnedData is a 100-by-1 ordinal categorical array with three categories, such that small < medium < large.
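
To check the edge rules, you can bin values that lie exactly on the edges, plus values outside them. This is a minimal sketch using the same edges and category names. Values outside the range of the edges fall in no bin and become undefined.

```matlab
catnames = ["small","medium","large"];
edges = [0 15 35 50];

% Each bin includes its left edge; the last bin also includes 50.
onEdges = discretize([0 15 35 50],edges,'categorical',catnames)
% small  medium  large  large

% Values outside [0, 50] fall in no bin and are undefined.
outside = discretize([-1 60],edges,'categorical',catnames)
```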

Use the summary function to print the number of elements in each category. Because x contains random numbers, your counts will differ from the ones shown here.

summary(binnedData)
     small       30 
     medium      35 
     large       35 
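
summary prints the counts, while countcats returns them as a numeric vector that you can use in further calculations. The sketch below converts the counts to percentages. The variable names counts and percent are illustrative.

```matlab
x = rand(100,1)*50;
binnedData = discretize(x,[0 15 35 50],'categorical',["small","medium","large"]);

% countcats returns one count per category, in category order.
counts = countcats(binnedData);
percent = 100*counts/numel(binnedData)
```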

You can make various kinds of charts of the binned data. For example, make a pie chart of binnedData.

pie(binnedData)
