Set Up Parameters and Train Convolutional Neural Network
After you define the layers of your neural network as described in Specify Layers of Convolutional Neural Network, the next step is to set up the training options for the network. Use the trainingOptions function to define the global training parameters. To train a network, use the object returned by trainingOptions as an input argument to the trainNetwork function. For example:

options = trainingOptions('adam');
trainedNet = trainNetwork(data,layers,options);
Layers with learnable parameters also have options for adjusting the learning parameters. For more information, see Set Up Parameters in Convolutional and Fully Connected Layers.
Specify Solver and Maximum Number of Epochs
trainNetwork can use different variants of stochastic gradient descent to train the network. Specify the optimization algorithm by using the solverName argument of trainingOptions. To minimize the loss, these algorithms update the network parameters by taking small steps in the direction of the negative gradient of the loss function.

The 'adam' (derived from adaptive moment estimation) solver is often a good optimizer to try first. You can also try the 'rmsprop' (root mean square propagation) and 'sgdm' (stochastic gradient descent with momentum) optimizers and see if this improves training. Different solvers work better for different problems. For more information about the different solvers, see Stochastic Gradient Descent.
The solvers update the parameters using a subset of the data at each step. This subset is called a mini-batch. You can specify the size of the mini-batch by using the 'MiniBatchSize' name-value pair argument of trainingOptions. Each parameter update is called an iteration. A full pass through the entire data set is called an epoch.
You can specify the maximum number of epochs to train for by using the 'MaxEpochs' name-value pair argument of trainingOptions. The default value is 30, but you can choose a smaller number of epochs for small networks or for fine-tuning and transfer learning, where most of the learning is already done.
By default, the software shuffles the data once before training. You can change this setting by using the 'Shuffle' name-value pair argument.
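For example, the following call (all values are illustrative, not recommendations) selects the 'sgdm' solver, sets the mini-batch size and the maximum number of epochs, and shuffles the data before every epoch:

options = trainingOptions('sgdm', ...
    'MiniBatchSize',64, ...    % observations used per iteration (illustrative)
    'MaxEpochs',20, ...        % full passes through the training set
    'Shuffle','every-epoch');  % reshuffle the data before each epoch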
Specify and Modify Learning Rate
You can specify the global learning rate by using the 'InitialLearnRate' name-value pair argument of trainingOptions. By default, trainNetwork uses this value throughout the entire training process. You can also reduce the learning rate by a fixed factor every certain number of epochs. Instead of using a small, fixed learning rate throughout training, you can choose a larger learning rate at the beginning of training and gradually reduce this value during optimization. Doing so can shorten the training time while enabling smaller steps towards the minimum of the loss as training progresses.
Tip

If the mini-batch loss during training ever becomes NaN, then the learning rate is likely too high. Try reducing the learning rate, for example by a factor of 3, and restarting network training.
To gradually reduce the learning rate, use the 'LearnRateSchedule','piecewise' name-value pair argument. Once you choose this option, trainNetwork multiplies the learning rate by a factor of 0.1 every 10 epochs. You can specify the drop factor and the number of epochs between drops by using the 'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments, respectively.
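A minimal sketch with illustrative values: start training at a learning rate of 0.01 and halve it every 5 epochs.

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...          % illustrative starting rate
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.5, ...        % multiply the learning rate by 0.5...
    'LearnRateDropPeriod',5);             % ...every 5 epochs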
Specify Validation Data
To perform network validation during training, specify validation data using the 'ValidationData' name-value pair argument of trainingOptions. By default, trainNetwork validates the network every 50 iterations by predicting the response of the validation data and calculating the validation loss and accuracy (root mean squared error for regression networks). You can change the validation frequency using the 'ValidationFrequency' name-value pair argument. If your network has layers that behave differently during prediction than during training (for example, dropout layers), then the validation accuracy can be higher than the training (mini-batch) accuracy. You can also use the validation data to stop training automatically when the validation loss stops decreasing. To turn on automatic validation stopping, use the 'ValidationPatience' name-value pair argument.
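For example, assuming hypothetical held-out arrays XVal and TVal, this sketch validates every 30 iterations and stops training automatically if the validation loss does not improve for 5 consecutive validations:

options = trainingOptions('adam', ...
    'ValidationData',{XVal,TVal}, ...  % XVal, TVal are hypothetical held-out data
    'ValidationFrequency',30, ...      % validate every 30 iterations
    'ValidationPatience',5);           % stop after 5 validations with no improvement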
Performing validation at regular intervals during training helps you to determine if your network is overfitting to the training data. A common problem is that the network simply "memorizes" the training data, rather than learning general features that enable the network to make accurate predictions for new data. To check if your network is overfitting, compare the training loss and accuracy to the corresponding validation metrics. If the training loss is significantly lower than the validation loss, or the training accuracy is significantly higher than the validation accuracy, then your network is overfitting.
To reduce overfitting, you can try adding data augmentation. Use an augmentedImageDatastore to perform random transformations on your input images. This helps to prevent the network from memorizing the exact position and orientation of objects. You can also try increasing the L2 regularization using the 'L2Regularization' name-value pair argument, using batch normalization layers after convolutional layers, and adding dropout layers.
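A minimal augmentation sketch, assuming imdsTrain is an existing imageDatastore of 28-by-28 grayscale images (the datastore name, image size, and augmentation ranges are all illustrative):

augmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...     % random horizontal flips
    'RandXTranslation',[-3 3], ...  % random shifts of up to 3 pixels
    'RandYTranslation',[-3 3]);
augimds = augmentedImageDatastore([28 28 1],imdsTrain, ...
    'DataAugmentation',augmenter);
trainedNet = trainNetwork(augimds,layers,options);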
Select Hardware Resource
If a GPU is available, then trainNetwork uses it for training by default. Otherwise, trainNetwork uses a CPU. Alternatively, you can specify the execution environment you want using the 'ExecutionEnvironment' name-value pair argument. You can specify a single CPU ('cpu'), a single GPU ('gpu'), multiple GPUs ('multi-gpu'), or a local parallel pool or compute cluster ('parallel'). All options other than 'cpu' require Parallel Computing Toolbox™. Training on a GPU requires a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).
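For example, to request training on all available local GPUs (requires Parallel Computing Toolbox and supported devices):

options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu');  % or 'auto', 'cpu', 'gpu', 'parallel'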
Save Checkpoint Networks and Resume Training
Deep Learning Toolbox™ enables you to save neural networks as .mat files during training. This periodic saving is especially useful when you have a large neural network or a large data set, and training takes a long time. If the training is interrupted for some reason, you can resume training from the last saved checkpoint neural network. If you want trainNetwork to save checkpoint neural networks, then you must specify the name of the path by using the CheckpointPath option of trainingOptions. If the path that you specify does not exist, then trainingOptions returns an error.
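A sketch, assuming the folder C:\TempCheckpoints already exists (the folder name is illustrative):

options = trainingOptions('sgdm', ...
    'CheckpointPath','C:\TempCheckpoints');  % folder must exist before training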
trainNetwork automatically assigns unique names to checkpoint neural network files. In the example name net_checkpoint__351__2018_04_12__18_09_52.mat, 351 is the iteration number, 2018_04_12 is the date, and 18_09_52 is the time at which trainNetwork saves the neural network. You can load a checkpoint neural network file by double-clicking it or using the load command at the command line. For example:

load net_checkpoint__351__2018_04_12__18_09_52.mat

You can then resume training by using the layers of the loaded network as an input argument to trainNetwork. For example:

trainNetwork(XTrain,TTrain,net.Layers,options)
Set Up Parameters in Convolutional and Fully Connected Layers
In layers with learnable parameters, such as convolutional and fully connected layers, you can set the learning parameters to be different from the global values specified by trainingOptions. For example, to adjust the learning rate for the biases or weights, specify a value for the BiasLearnRateFactor or WeightLearnRateFactor property of the layer, respectively. The trainNetwork function multiplies the global learning rate that you specify by using trainingOptions by these factors. Similarly, you can specify the L2 regularization factors for the weights and biases in these layers by setting the WeightL2Factor and BiasL2Factor properties, respectively. trainNetwork then multiplies the global L2 regularization factor that you specify by using trainingOptions by these factors.
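For example, this sketch (with illustrative layer sizes) makes the weights of a convolutional layer learn at twice the global learning rate while keeping the global L2 regularization factor for the weights:

layer = convolution2dLayer(5,20, ...
    'WeightLearnRateFactor',2, ...  % weights learn at 2x the global rate
    'WeightL2Factor',1);            % keep the global L2 factor for the weights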
Initialize Weights in Convolutional and Fully Connected Layers
The layer weights are learnable parameters. You can specify the initial value for the weights directly using the Weights property of the layer. When you train a network, if the Weights property of the layer is nonempty, then trainNetwork uses the Weights property as the initial value. If the Weights property is empty, then trainNetwork uses the initializer specified by the WeightsInitializer property of the layer.
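For example, this sketch creates a fully connected layer that uses He initialization ('he' is one of the built-in WeightsInitializer options) instead of the default 'glorot':

layer = fullyConnectedLayer(10,'WeightsInitializer','he');  % He initialization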
Train Your Network
After you specify the layers of your network and the training parameters, you can train the network using the training data. The data, layers, and training options are all input arguments of the trainNetwork function, as in this example.

layers = [imageInputLayer([28 28 1])
          convolution2dLayer(5,20)
          reluLayer
          maxPooling2dLayer(2,'Stride',2)
          fullyConnectedLayer(10)
          softmaxLayer
          classificationLayer];
options = trainingOptions('adam');
convnet = trainNetwork(data,layers,options);
Training data can be an array, a table, or an ImageDatastore object. For more information, see the trainNetwork function reference page.
See Also
trainingOptions | trainNetwork | Convolution2dLayer | FullyConnectedLayer