Main Content

crossentropy

Neural network performance

Description

example

perf = crossentropy(net,targets,outputs,perfWeights) calculates a network performance given targets and outputs, with optional performance weights and other parameters. The function returns a result that heavily penalizes outputs that are extremely inaccurate (y near 1-t), with very little penalty for fairly correct classifications (y near t). Minimizing cross-entropy leads to good classifiers.

The cross-entropy for each pair of output-target elements is calculated as: ce = -t .* log(y).

The aggregate cross-entropy performance is the mean of the individual values: perf = sum(ce(:))/numel(ce).

Special case (N = 1): If an output consists of only one element, then the outputs and targets are interpreted as binary encoding. That is, there are two classes with targets of 0 and 1, whereas in 1-of-N encoding, there are two or more classes. The binary cross-entropy expression is: ce = -t .* log(y) - (1-t) .* log(1-y) .

perf = crossentropy(___,Name,Value) supports customization according to the specified name-value pair arguments.

Examples

collapse all

This example shows how to design a classification network with cross-entropy and 0.1 regularization, then calculate performance on the whole dataset.

[x,t] = iris_dataset;
net = patternnet(10);
net.performParam.regularization = 0.1;
net = train(net,x,t);

y = net(x);
perf = crossentropy(net,t,y,{1},'regularization',0.1)
perf = 0.0267

This example shows how to set up the network to use the crossentropy during training.

net = feedforwardnet(10);
net.performFcn = 'crossentropy';
net.performParam.regularization = 0.1;
net.performParam.normalization = 'none';

Input Arguments

collapse all

Neural network, specified as a network object.

Example: net = feedforwardnet(10);

Neural network target values, specified as a matrix or cell array of numeric values. Network target values define the desired outputs, and can be specified as an N-by-Q matrix of Q N-element vectors, or an M-by-TS cell array where each element is an Ni-by-Q matrix.  In each of these cases, N or Ni indicates a vector length, Q the number of samples, M the number of signals for neural networks with multiple outputs, and TS is the number of time steps for time series data.  targets must have the same dimensions as outputs.

The target matrix columns consist of all zeros and a single 1 in the position of the class being represented by that column vector. When N = 1, the software uses cross entropy for binary encoding, otherwise it uses cross entropy for 1-of-N encoding. NaN values are allowed to indicate unknown or don't-care output values.  The performance of NaN target values is ignored.

Data Types: double | cell

Neural network output values, specified as a matrix or cell array of numeric values. Network output values can be specified as an N-by-Q matrix of Q N-element vectors, or an M-by-TS cell array where each element is an Ni-by-Q matrix. In each of these cases, N or Ni indicates a vector length, Q the number of samples, M the number of signals for neural networks with multiple outputs and TS is the number of time steps for time series data. outputs must have the same dimensions as targets.

Outputs can include NaN to indicate unknown output values, presumably produced as a result of NaN input values (also representing unknown or don't-care values). The performance of NaN output values is ignored.

General case (N>=2): The columns of the output matrix represent estimates of class membership, and should sum to 1. You can use the softmax transfer function to produce such output values. Use patternnet to create networks that are already set up to use cross-entropy performance with a softmax output layer.

Data Types: double | cell

Performance weights, specified as a vector or cell array of numeric values. Performance weights are an optional argument defining the importance of each performance value, associated with each target value, using values between 0 and 1. Performance values of 0 indicate targets to ignore, values of 1 indicate targets to be treated with normal importance. Values between 0 and 1 allow targets to be treated with relative importance.

Performance weights have many uses. They are helpful for classification problems, to indicate which classifications (or misclassifications) have relatively greater benefits (or costs). They can be useful in time series problems where obtaining a correct output on some time steps, such as the last time step, is more important than others. Performance weights can also be used to encourage a neural network to best fit samples whose targets are known most accurately, while giving less importance to targets which are known to be less accurate.

perfWeights can have the same dimensions as targets and outputs. Alternately, each dimension of the performance weights can either match the dimension of targets and outputs, or be 1. For instance, if targets is an N-by-Q matrix defining Q samples of N-element vectors, the performance weights can be N-by-Q indicating a different importance for each target value, or N-by-1 defining a different importance for each row of the targets, or 1-by-Q indicating a different importance for each sample, or be the scalar 1 (i.e. 1-by-1) indicating the same importance for all target values.

Similarly, if outputs and targets are cell arrays of matrices, the perfWeights can be a cell array of the same size, a row cell array (indicating the relative importance of each time step), a column cell array (indicating the relative importance of each neural network output), or a cell array of a single matrix or just the matrix (both cases indicating that all matrices have the same importance values).

For any problem, a perfWeights value of {1} (the default) or the scalar 1 indicates all performances have equal importance.

Data Types: double | cell

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'normalization','standard' specifies the inputs and targets to be normalized to the range (-1,+1).

Proportion of performance attributed to weight/bias values, specified as a double between 0 (the default) and 1. A larger value penalizes the network for large weights, and the more likely the network function will avoid overfitting.

Example: 'regularization',0

Data Types: single | double

Normalization mode for outputs, targets, and errors, specified as 'none', 'standard', or 'percent'. 'none' performs no normalization. 'standard' results in outputs and targets being normalized to (-1, +1), and therefore errors in the range (-2, +2).'percent' normalizes outputs and targets to (-0.5, 0.5) and errors to (-1, 1).

Example: 'normalization','standard'

Data Types: char

Output Arguments

collapse all

Network performance, returned as a double in the range (0,1).

Version History

Introduced in R2013b

See Also

| | | | |