Create Categorical Arrays
This example shows how to create categorical arrays from various types of input data and modify their elements. The categorical data type stores values from a finite set of discrete categories. You can create a categorical array from a numeric array, logical array, string array, or cell array of character vectors. The unique values from the input array become the categories of the categorical array. A categorical array provides efficient storage and convenient manipulation of data while also maintaining meaningful names for the values.
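The following is a minimal sketch of how unique input values become categories for two of the input types listed above: a numeric array and a cell array of character vectors. The variable names nums and colors are illustrative only. Numeric categories are named by converting the numbers to text, and text categories are sorted alphabetically.

```matlab
% Numeric input: each unique number becomes a category (named as text)
nums = categorical([3 1 2 1]);
disp(categories(nums))     % categories '1', '2', '3'

% Cell array of character vectors: each unique value becomes a category
colors = categorical({'red' 'blue' 'red'});
disp(categories(colors))   % categories 'blue', 'red'
```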
By default, the categories of a categorical array do not have a mathematical ordering. For example, the discrete set of pet categories ["dog" "cat" "bird"] has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering ["bird" "cat" "dog"]. But you can also create ordinal categorical arrays, in which the categories do have meaningful mathematical orderings. For example, the discrete set of size categories ["small" "medium" "large"] can have the mathematical ordering of small < medium < large. Ordinal categorical arrays enable you to make comparisons between their elements.
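As a quick illustration of the default behavior described above, this sketch converts the pet names to a categorical array. The variable name pets is hypothetical. The elements keep their input order, while the categories are sorted alphabetically and the array is not ordinal.

```matlab
pets = categorical(["dog" "cat" "bird"]);
categories(pets)   % sorted alphabetically: bird, cat, dog
isordinal(pets)    % false: the categories have no mathematical ordering
string(pets)       % elements keep their original order: dog, cat, bird
```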
Create Categorical Array from Input Array
To create a categorical array from an input array, use the categorical function.
For example, create a string array whose elements are abbreviations of states in New England. Notice that some of the strings have leading or trailing spaces.
statesNE = ["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]
statesNE = 1×11 string
"MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "
Convert the string array to a categorical array. When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed.
statesNE = categorical(statesNE)
statesNE = 1×11 categorical
MA ME CT VT ME NH VT MA NH CT RI
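To check the whitespace trimming directly, this small sketch (with a hypothetical variable s) converts strings that differ only by surrounding spaces. Assuming the trimming behaves as described above, "ME" and " ME " map to the same category, and "CT " maps to CT.

```matlab
s = categorical(["ME" " ME " "CT "]);
categories(s)    % two categories: CT and ME
s(1) == s(2)     % true: both elements belong to category ME
```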
List the categories of statesNE by using the categories function. Every element of statesNE belongs to one of these categories. Because statesNE has six unique states, there are six categories. The categories are listed in alphabetical order because the state abbreviations have no mathematical ordering.
categories(statesNE)
ans = 6×1 cell
{'CT'}
{'MA'}
{'ME'}
{'NH'}
{'RI'}
{'VT'}
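Because every element belongs to exactly one category, you can also count how many elements fall into each category. The sketch below uses the countcats function on the original 11-element array and places the counts next to the category names in a table; the names cats, counts, and T are illustrative.

```matlab
statesNE = categorical(["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]);
cats = categories(statesNE);   % 6-by-1 cell array of category names
counts = countcats(statesNE);  % one count per category
% Show each category next to its count
T = table(cats, counts(:), 'VariableNames', {'State','Count'})
```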
Add and Modify Elements
To add one element to a categorical array, you can assign text that represents a category name. For example, add a state to statesNE.
statesNE(12) = "ME"
statesNE = 1×12 categorical
MA ME CT VT ME NH VT MA NH CT RI ME
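The assigned text does not have to be an existing category. This sketch assumes the array is not protected (ordinal categorical arrays are protected by default, which would make such an assignment an error). Assigning a new name then adds it as a category, which you can confirm with iscategory. The variable s is hypothetical.

```matlab
s = categorical(["MA" "ME"]);
s(3) = "NY";             % "NY" is not yet a category of s
iscategory(s, "NY")      % true: the assignment added the category
categories(s)            % MA, ME, NY
```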
To add or modify multiple elements, you must assign a categorical array.
statesNE(1:3) = categorical(["RI" "VT" "MA"])
statesNE = 1×12 categorical
RI VT MA VT ME NH VT MA NH CT RI ME
Add Missing Values as Undefined Elements
You can assign missing values as undefined elements of a categorical array. An undefined categorical value does not belong to any category, similar to NaN (Not-a-Number) in numeric arrays.
To assign missing values, use the missing function. For example, modify the first element of the categorical array to be a missing value.
statesNE(1) = missing
statesNE = 1×12 categorical
<undefined> VT MA VT ME NH VT MA NH CT RI ME
Assign missing values to elements 12 and 13. This assignment replaces the last element and extends the array by one element.
statesNE(12:13) = [missing missing]
statesNE = 1×13 categorical
<undefined> VT MA VT ME NH VT MA NH CT RI <undefined> <undefined>
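To locate undefined elements programmatically, you can use the isundefined function, which returns a logical array the same size as the categorical array. The sketch below repeats the pattern above on a smaller, hypothetical array s.

```matlab
s = categorical(["RI" "VT" "MA"]);
s(1) = missing;
s(4:5) = [missing missing];   % grows the array to five elements
tf = isundefined(s)           % true where an element is undefined
nnz(tf)                       % number of undefined elements
```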
If you convert a string array to a categorical array, then missing strings and empty strings become undefined elements in the categorical array. If you convert a numeric array, then NaNs become undefined elements. Similarly, assigning missing strings, "", '', or NaNs to elements of a categorical array makes those elements undefined.
statesNE(2) = ""
statesNE = 1×13 categorical
<undefined> <undefined> MA VT ME NH VT MA NH CT RI <undefined> <undefined>
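The conversion rules above can be sketched as follows, using hypothetical variables str, c1, and c2. Empty and missing strings, and NaN values, become undefined elements and do not produce categories of their own.

```matlab
str = ["small" "" missing];   % text, empty string, missing string
c1 = categorical(str);
isundefined(c1)               % false, true, true
categories(c1)                % only 'small'

c2 = categorical([1 NaN 3]);
isundefined(c2)               % false, true, false
categories(c2)                % '1' and '3': NaN does not become a category
```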
Create Ordinal Categorical Array from String Array
In an ordinal categorical array, the order of the categories defines a mathematical ordering, so you can compare elements using relational operators such as <, >, <=, and >=. Categorical arrays that are not ordinal support only equality comparisons (== and ~=).
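As a quick check of this difference, the sketch below uses a non-ordinal array (the names pets and relationalAllowed are hypothetical). Equality works, while a relational comparison is expected to throw an error, which the try/catch block records.

```matlab
pets = categorical(["dog" "cat" "bird"]);   % not ordinal
isCat = pets == "cat"                       % equality comparison is allowed

try
    pets < "dog";                           % relational comparison
    relationalAllowed = true;
catch
    relationalAllowed = false;              % error: array is not ordinal
end
relationalAllowed
```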
For example, create a string array that contains the sizes of eight objects.
AllSizes = ["medium" "large" "small" "small" "medium" ...
    "large" "medium" "small"];
The string array has three unique values: "large", "medium", and "small". A string array has no convenient way to indicate that small < medium < large.
Convert the string array to an ordinal categorical array. Define the categories as small, medium, and large, in that order. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.
valueset = ["small" "medium" "large"];
sizeOrd = categorical(AllSizes,valueset,"Ordinal",true)
sizeOrd = 1×8 categorical
medium large small small medium large medium small
The elements of sizeOrd remain in the same order as in AllSizes. Only the categories are ordered.
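When you specify a value set, it determines the categories completely. In this sketch (hypothetical variables sizes and c), an input value that is not in the value set, "huge", becomes an undefined element, and a value-set entry that never appears in the input, "medium", is still a category.

```matlab
valueset = ["small" "medium" "large"];
sizes = ["small" "large" "huge"];     % "huge" is not in valueset
c = categorical(sizes, valueset, "Ordinal", true);
isundefined(c)                        % false, false, true
categories(c)                         % small, medium, large
```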
List the discrete categories in sizeOrd. The order of the categories matches their mathematical ordering small < medium < large.
categories(sizeOrd)
ans = 3×1 cell
{'small' }
{'medium'}
{'large' }
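Because sizeOrd is ordinal, relational operators and order-based functions such as max and sort work on it. This sketch recreates sizeOrd so that it is self-contained; the names biggerThanSmall, largest, and sorted are illustrative. Comparing with a string works here because "small" is one of the categories.

```matlab
AllSizes = ["medium" "large" "small" "small" "medium" ...
    "large" "medium" "small"];
sizeOrd = categorical(AllSizes, ["small" "medium" "large"], "Ordinal", true);

biggerThanSmall = sizeOrd > "small"   % compare each element with a category
largest = max(sizeOrd)                % largest element: large
sorted = sort(sizeOrd)                % sorted by category order
```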
Create Ordinal Categorical Array by Binning Numeric Data
If you have an array with continuous numeric data, it can be useful to specify numeric ranges as categories. In such cases, use the discretize function to bin the data and assign category names to the bins.
For example, create a vector of 100 random numbers between 0 and 50.
x = rand(100,1)*50
x = 100×1
40.7362
45.2896
6.3493
45.6688
31.6180
4.8770
13.9249
27.3441
47.8753
48.2444
7.8807
48.5296
47.8583
24.2688
40.0140
⋮
Use discretize to create a categorical array by binning the values of x. Put all the values between 0 and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes its left endpoint but not its right endpoint, except the last bin, which includes both endpoints.
catnames = ["small" "medium" "large"];
binnedData = discretize(x,[0 15 35 50],"categorical",catnames)
binnedData = 100×1 categorical
large
large
small
large
medium
small
small
medium
large
large
small
large
large
medium
large
small
medium
large
large
large
medium
small
large
large
medium
large
large
medium
medium
small
⋮
binnedData is an ordinal categorical array with three categories, such that small < medium < large.
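To see the bin-edge rules without random data, this sketch bins a few hand-picked values with the same edges and category names (the names edges, vals, and b are illustrative). Values on an interior edge go to the bin on the right, 50 falls in the last bin, and 60 lies outside all bins, so it becomes an undefined element.

```matlab
catnames = ["small" "medium" "large"];
edges = [0 15 35 50];
vals = [0 14.9 15 35 50 60];
b = discretize(vals, edges, "categorical", catnames)
% expected: small small medium large large <undefined>
```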
To display the number of elements in each category, use the summary function.
summary(binnedData)
binnedData: 100×1 ordinal categorical

     small          30
     medium         35
     large          35
     <undefined>     0

Additional statistics:

     Min       small
     Median    medium
     Max       large
You can make various kinds of charts of the binned data. For example, make a pie chart of binnedData.
pie(binnedData)
Preallocate Categorical Array
You can preallocate a categorical array of any size by creating an array of NaNs and converting it to a categorical array. After you preallocate the array, you can initialize its categories by adding the category names to the array.
For example, create a 2-by-4 array of NaNs.
A = NaN(2,4)
A = 2×4
NaN NaN NaN NaN
NaN NaN NaN NaN
Then convert the array of NaNs to a categorical array of undefined categorical values.
A = categorical(A)
A = 2×4 categorical
<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>
At this point, A has no categories.
categories(A)
ans = 0×0 empty cell array
Add small, medium, and large categories to A by using the addcats function.
A = addcats(A,["small" "medium" "large"])
A = 2×4 categorical
<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>
While the elements of A are still undefined values, the categories of A are defined.
categories(A)
ans = 3×1 cell
{'small' }
{'medium'}
{'large' }
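A possible one-step alternative, sketched under the assumption that categorical treats NaN inputs as undefined and makes any value not found in the value set undefined: pass a numeric value set and category names when you convert the NaN array. The value set [1 2 3] is an arbitrary placeholder that no element matches, so every element is undefined while the categories are already defined.

```matlab
% NaN matches no entry of the placeholder value set [1 2 3]
A = categorical(NaN(2,4), [1 2 3], ["small" "medium" "large"]);
categories(A)       % small, medium, large
A(1) = "medium";    % assign a defined value to one element
```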
Now that A has categories, you can assign defined categorical values as elements of A.
A(1) = "medium";
A(8) = "small";
A(3:5) = "large"
A = 2×4 categorical
medium large large <undefined>
<undefined> large <undefined> small
See Also
categorical | categories | discretize | summary | addcats | missing