crossentropy

Cross-entropy loss for classification tasks

Description

The cross-entropy operation computes the cross-entropy loss between network predictions and binary or one-hot encoded targets for single-label and multi-label classification tasks.

The crossentropy function computes the cross-entropy loss between predictions and targets represented as dlarray data. Using dlarray objects makes working with high-dimensional data easier by allowing you to label the dimensions. For example, you can label which dimensions correspond to spatial, time, channel, and batch dimensions using the "S", "T", "C", and "B" labels, respectively. For unspecified and other dimensions, use the "U" label. For dlarray object functions that operate over particular dimensions, you can specify the dimension labels by formatting the dlarray object directly, or by using the DataFormat option.

Note

To train with cross-entropy loss using the trainnet function, set the loss function to "crossentropy".

loss = crossentropy(Y,targets) returns the categorical cross-entropy loss between the formatted dlarray object Y containing the predictions and the target values targets for single-label classification tasks. The output loss is an unformatted dlarray scalar.

For unformatted input data, use the DataFormat argument.

loss = crossentropy(Y,targets,weights) applies weights to the calculated loss values. Use this syntax to weight the contributions of classes, observations, regions, or individual elements of the input to the calculated loss values.

loss = crossentropy(___,Name=Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, ClassificationMode="multilabel" computes the cross-entropy loss for a multi-label classification task.

Examples

Create an array of prediction scores for 12 observations over 10 classes.

numClasses = 10;
numObservations = 12;

Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y);

View the size and format of the prediction scores.

size(Y)
ans = 1×2

    10    12

dims(Y)
ans = 
'CB'

Create an array of targets encoded as one-hot vectors.

labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

View the size of the targets.

size(targets)
ans = 1×2

    10    12

Compute the cross-entropy loss between the predictions and the targets.

loss = crossentropy(Y,targets)
loss = 
  1×1 dlarray

    2.3343
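As a rough cross-check of what the single-label loss represents, the sketch below recomputes it by hand. It assumes the default behavior described on this page (the loss for the true class of each observation, averaged over the number of observations) and uses the natural logarithm. The names Ydata, manualLoss, and difference are illustrative only.

```matlab
% Recreate predictions and one-hot targets in the same way as the example.
numClasses = 10;
numObservations = 12;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

loss = crossentropy(Y,targets);

% Hand computation (assumed formula): sum of -log(predicted probability of
% the true class), divided by the number of observations.
Ydata = extractdata(Y);
manualLoss = -sum(targets .* log(Ydata),"all") / numObservations;

% Expected to be close to zero, up to floating-point round-off.
difference = abs(extractdata(loss) - manualLoss)
```

Because the predictions are random, the loss value differs from run to run; only the agreement between the two computations is of interest here.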

Create an array of prediction scores for 12 observations over 10 classes.

numClasses = 10;
numObservations = 12;
Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");

View the size and format of the prediction scores.

size(Y)
ans = 1×2

    10    12

dims(Y)
ans = 
'CB'

Create a random array of targets encoded as a numeric array of zeros and ones. Each observation can have multiple classes.

targets = rand(numClasses,numObservations) > 0.75;
targets = single(targets);

View the size of the targets.

size(targets)
ans = 1×2

    10    12

Compute the cross-entropy loss between the predictions and the targets. To specify cross-entropy loss for multi-label classification, set the ClassificationMode argument to "multilabel".

loss = crossentropy(Y,targets,ClassificationMode="multilabel")
loss = 
  1×1 single dlarray

    9.8853
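The multi-label loss can be sanity-checked in a similar way. This sketch assumes the binary cross-entropy form, in which every class of every observation contributes either log(Y) or log(1 - Y), summed over classes and averaged over observations. It uses sigmoid on random scores so that each prediction lies strictly between 0 and 1; the variable names are illustrative.

```matlab
numClasses = 10;
numObservations = 12;

% Independent per-class probabilities in (0,1).
Y = sigmoid(dlarray(randn(numClasses,numObservations),"CB"));

% Random multi-label targets of zeros and ones.
targets = double(rand(numClasses,numObservations) > 0.75);

loss = crossentropy(Y,targets,ClassificationMode="multilabel");

% Hand computation (assumed formula): binary cross-entropy summed over
% classes, divided by the number of observations.
Ydata = extractdata(Y);
perElement = targets .* log(Ydata) + (1 - targets) .* log(1 - Ydata);
manualLoss = -sum(perElement,"all") / numObservations;

% Expected to be close to zero, up to floating-point round-off.
difference = abs(extractdata(loss) - manualLoss)
```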

Create an array of prediction scores for 12 observations over 10 classes.

numClasses = 10;
numObservations = 12;

Y = rand(numClasses,numObservations);
Y = dlarray(Y,"CB");
Y = softmax(Y);

View the size and format of the prediction scores.

size(Y)
ans = 1×2

    10    12

dims(Y)
ans = 
'CB'

Create an array of targets encoded as one-hot vectors.

labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

View the size of the targets.

size(targets)
ans = 1×2

    10    12

Compute the weighted cross-entropy loss between the predictions and the targets using a vector of class weights. Specify a weights format of "UC" (unspecified, channel) using the WeightsFormat argument.

weights = rand(1,numClasses);
loss = crossentropy(Y,targets,weights,WeightsFormat="UC")
loss = 
  1×1 dlarray

    1.1261

Input Arguments

Predictions (Y), specified as a formatted or unformatted dlarray object, or a numeric array. When Y is not a formatted dlarray, you must specify the dimension format using the DataFormat argument.

If Y is a numeric array, targets must be a dlarray object.

Target classification labels (targets), specified as a formatted or unformatted dlarray object, or a numeric array.

Specify the targets as an array containing one-hot encoded labels with the same size and format as Y. For example, if Y is a numObservations-by-numClasses array, then targets(n,i) = 1 if observation n belongs to class i, and targets(n,i) = 0 otherwise.

If targets is a formatted dlarray, then its format must be the same as the format of Y, or the same as DataFormat if Y is unformatted.

If targets is an unformatted dlarray or a numeric array, then the function applies the format of Y or the value of DataFormat to targets.

Tip

Formatted dlarray objects automatically permute the dimensions of the underlying data to have the order "S" (spatial), "C" (channel), "B" (batch), "T" (time), then "U" (unspecified). To ensure that the dimensions of Y and targets are consistent, when Y is a formatted dlarray, also specify targets as a formatted dlarray.
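The effect of this automatic permutation can be seen in a short sketch. It assumes data laid out as observations-by-classes and labeled "BC"; the variable names are illustrative. Formatting the targets with the same labels lets both arrays be permuted in the same way.

```matlab
numClasses = 10;
numObservations = 12;

% Scores laid out as observations-by-classes, labeled "BC".
scores = rand(numObservations,numClasses);
Y = dlarray(scores,"BC");

size(Y)   % 10 12: the data is now stored as classes-by-observations
dims(Y)   % 'CB'

Y = softmax(Y);

% One-hot targets built in the same observations-by-classes layout.
labels = randi(numClasses,[numObservations 1]);
targets = onehotencode(labels,2,ClassNames=1:numClasses);   % 12-by-10

% Formatting the targets as "BC" permutes them consistently with Y.
targets = dlarray(targets,"BC");

loss = crossentropy(Y,targets);
```

If the targets were instead passed as the unformatted 12-by-10 numeric array, their layout would no longer match the permuted 10-by-12 predictions.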

Weights (weights), specified as a dlarray object or a numeric array.

To specify class weights, specify a vector with a "C" (channel) dimension with size matching the "C" (channel) dimension of Y and a singleton "U" (unspecified) dimension. Specify the dimensions of the class weights by using a formatted dlarray object or by using the WeightsFormat argument.

To specify observation weights, specify a vector with a "B" (batch) dimension with size matching the "B" (batch) dimension of Y. Specify the "B" (batch) dimension of the observation weights by using a formatted dlarray object or by using the WeightsFormat argument.

To specify weights for each element of the input independently, specify the weights as an array of the same size as Y. In this case, if weights is not a formatted dlarray object, then the function uses the same format as Y. Alternatively, specify the weights format using the WeightsFormat argument.
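A minimal sketch of the element-wise option, under the assumptions stated in the comments: a vector of class weights is expanded into an array of the same size as Y, so that each element of the input receives its own weight. The names classWeights and elementWeights are illustrative.

```matlab
numClasses = 10;
numObservations = 12;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% Class weights: a 1-by-numClasses vector, labeled with WeightsFormat.
classWeights = rand(1,numClasses);
lossClass = crossentropy(Y,targets,classWeights,WeightsFormat="UC");

% Element-wise weights: one weight per element of Y. The array is not a
% formatted dlarray, so the function applies the format of Y ("CB").
elementWeights = repmat(classWeights',1,numObservations);   % 10-by-12
lossElement = crossentropy(Y,targets,elementWeights);

% Every column of elementWeights repeats the class weights, so one would
% expect the two losses to agree (an assumption, not a documented guarantee).
difference = abs(extractdata(lossClass) - extractdata(lossElement))
```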

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: ClassificationMode="multilabel",DataFormat="CB" evaluates the cross-entropy loss for a multi-label classification task and specifies the dimension order of the input data as "CB".

Type of classification task (ClassificationMode), specified as one of these values:

  • "single-label" — Each observation is exclusively assigned one class label (single-label classification). The function computes the loss between the target value for the single category specified by targets and the corresponding prediction in Y, averaged over the number of observations.

  • "multilabel" — Each observation can be assigned more than one independent class label (multi-label classification). The function computes the sum of the loss between each category specified by targets and the predictions in Y for those categories, averaged over the number of observations. Cross-entropy loss for this type of classification task is also known as binary cross-entropy loss.

Note

To select the classification mode for binary classification, you must consider the final layer of the network:

  • If the final layer has an output size of one, such as with a sigmoid layer, use "multilabel".

  • If the final layer has an output size of two, such as with a softmax layer, use "single-label".
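The two binary-classification setups in this note can be sketched as follows. Random scores stand in for the output of a real network, and the variable names are illustrative. In the two-output case, the rows of the targets are assumed to be ordered as negative class, then positive class.

```matlab
numObservations = 8;
isPositive = rand(1,numObservations) > 0.5;   % true marks the positive class

% Setup 1: one output passed through sigmoid, so use "multilabel".
Ysigmoid = sigmoid(dlarray(randn(1,numObservations),"CB"));
targetsOne = double(isPositive);                          % 1-by-8 zeros and ones
lossSigmoid = crossentropy(Ysigmoid,targetsOne, ...
    ClassificationMode="multilabel");

% Setup 2: two outputs passed through softmax, so use "single-label".
Ysoftmax = softmax(dlarray(randn(2,numObservations),"CB"));
targetsTwo = double([~isPositive; isPositive]);           % 2-by-8 one-hot
lossSoftmax = crossentropy(Ysoftmax,targetsTwo, ...
    ClassificationMode="single-label");
```

The two losses are not expected to match here, because the two sets of scores are generated independently.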

Mask indicating which elements to include for loss computation (Mask), specified as a dlarray object, a logical array, or a numeric array with the same size as Y.

The function includes and excludes elements of the input data for loss computation when the corresponding value in the mask is 1 and 0, respectively.

If Mask is a formatted dlarray object, then its format must match that of Y. If Mask is not a formatted dlarray object, then the function uses the same format as Y.

If you specify the DataFormat argument, then the function also uses the specified format for the mask.

The size of each dimension of Mask must match the size of the corresponding dimension in Y. The default value is a logical array of ones.

Tip

Formatted dlarray objects automatically permute the dimensions of the underlying data to have this order: "S" (spatial), "C" (channel), "B" (batch), "T" (time), and "U" (unspecified). For example, dlarray objects automatically permute the dimensions of data with format "TSCSBS" to have format "SSSCBT".

To ensure that the dimensions of Y and the mask are consistent, when Y is a formatted dlarray, also specify the mask as a formatted dlarray.

Loss value array reduction mode (Reduction), specified as "sum" or "none".

If the Reduction argument is "sum", then the function sums all elements in the array of loss values. In this case, the output loss is a scalar.

If the Reduction argument is "none", then the function does not reduce the array of loss values. In this case, the output loss is an unformatted dlarray object of the same size as Y.
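As an illustration of the unreduced output, the sketch below requests the array of loss values and then reduces it manually. It assumes that the unreduced values are not normalized, so that summing them and dividing by the number of observations should reproduce the default loss; the variable names are illustrative.

```matlab
numClasses = 10;
numObservations = 12;
Y = softmax(dlarray(rand(numClasses,numObservations),"CB"));
labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% Unreduced loss values: an unformatted dlarray of the same size as Y.
lossValues = crossentropy(Y,targets,Reduction="none");
size(lossValues)

% Loss per observation, obtained by summing over the class dimension.
lossPerObservation = sum(lossValues,1);

% Manual reduction, compared with the default reduced loss.
manualLoss = sum(lossValues,"all") / numObservations;
defaultLoss = crossentropy(Y,targets);
difference = abs(extractdata(manualLoss) - extractdata(defaultLoss))
```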

Divisor for normalizing the reduced loss (NormalizationFactor) when Reduction is "sum", specified as one of the following:

  • "batch-size" — Normalize the loss by dividing it by the number of observations in Y.

  • "all-elements" — Normalize the loss by dividing it by the number of elements of Y.

  • "mask-included" — Normalize the loss by dividing the loss values by the product of the number of observations and the number of included elements specified by the mask for each observation independently. To use this option, you must specify a mask using the Mask option.

  • "none" — Do not normalize the loss.
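A sketch of combining a mask with mask-aware normalization, for example to ignore padded time steps in sequence data. It assumes that the divisor option is named NormalizationFactor and that the first observation is padded in its last two time steps; both the padding pattern and the variable names are made up for illustration.

```matlab
numClasses = 10;
numObservations = 4;
numTimeSteps = 6;

% Predictions for a batch of sequences, formatted as channel, batch, time.
Y = softmax(dlarray(rand(numClasses,numObservations,numTimeSteps),"CBT"));

% One-hot targets with the same size as Y.
labels = randi(numClasses,[1 numObservations numTimeSteps]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% Mask with the same size as Y: true includes an element, false excludes it.
mask = true(size(Y));
mask(:,1,5:6) = false;   % hypothetical padding in observation 1

% The mask is not a formatted dlarray, so it takes the format of Y.
loss = crossentropy(Y,targets,Mask=mask, ...
    NormalizationFactor="mask-included");
```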

Description of the data dimensions (DataFormat), specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

  • "S" — Spatial

  • "C" — Channel

  • "B" — Batch

  • "T" — Time

  • "U" — Unspecified

For example, consider an array that represents a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can describe the data as having the format "CBT" (channel, batch, time).

You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.

If the input data is not a formatted dlarray object, then you must specify the DataFormat option.

For more information, see Deep Learning Data Formats.

Data Types: char | string
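For unformatted inputs, the format is passed by using DataFormat instead of being stored on the array. A minimal sketch with illustrative variable names, assuming the data is laid out as classes-by-observations:

```matlab
numClasses = 10;
numObservations = 12;

% Unformatted dlarray: no dimension labels are attached to the data.
Y = dlarray(rand(numClasses,numObservations));
Y = softmax(Y,DataFormat="CB");

labels = randi(numClasses,[1 numObservations]);
targets = onehotencode(labels,1,ClassNames=1:numClasses);

% The same format string tells crossentropy how to interpret Y and targets.
loss = crossentropy(Y,targets,DataFormat="CB");
```

The targets are a plain numeric array here, so the function applies the value of DataFormat to them as well.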

Description of the dimensions of the weights (WeightsFormat), specified as a character vector or string scalar.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

  • "S" — Spatial

  • "C" — Channel

  • "B" — Batch

  • "T" — Time

  • "U" — Unspecified

For example, consider an array that represents a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can describe the data as having the format "CBT" (channel, batch, time).

You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.

If weights is a numeric vector and Y has two or more nonsingleton dimensions, then you must specify the WeightsFormat option.

If weights is not a vector, or weights and Y are both vectors, then the default value of WeightsFormat is the same as the format of Y.

For more information, see Deep Learning Data Formats.

Data Types: char | string

Output Arguments

Cross-entropy loss, returned as an unformatted dlarray object with the same underlying data type as the input Y.

The size of loss depends on the Reduction argument.

Version History

Introduced in R2019b
