Deep Learning Data Formats
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data and a batch normalization operation normalizes over the batch dimension of the input data.
Data can have many different types of layouts:
- Data can have different numbers of dimensions. For example, you can represent image and video data as 4-D and 5-D arrays, respectively.
- Dimensions of data can represent different things. For example, image data has two spatial dimensions, one channel dimension, and one batch dimension.
- Data can have dimensions in multiple permutations. For example, a batch of sequences can be represented as a 3-D array with dimensions corresponding to channels, time steps, and observations. These dimensions can be in any order.
To ensure that the software operates on the correct dimensions, you can provide data layout information in different ways:
Option | Scenario | Usage |
---|---|---|
Provide data with dimensions in a specific permutation | Network with an input layer, and the data has the required layout | Pass the data directly to the network or function. |
Provide data with labeled dimensions | Network with an input layer, and the data does not have the required layout | Create a formatted dlarray object. |
Provide data with labeled dimensions | Deep learning model defined as a function that uses multiple deep learning operations | Create a formatted dlarray object. |
Provide data with labeled dimensions | Custom layer that uses multiple deep learning operations | Create a layer that inherits from nnet.layer.Formattable. |
Provide data with additional layout information | Deep learning functions that require layout information, and you want to preserve the layout of the data | Specify the layout information using the appropriate input argument of the function, for example, the DataFormat argument of dlconv. |
Provide data with additional layout information | Model functions where dimensions change between functions, for example, when one function must treat the third dimension as time and a second function must treat the third dimension as spatial | Specify the layout information using the appropriate input argument of the function, for example, the DataFormat argument of dlconv. |
To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
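The same labeling applies to image data. The following is a minimal sketch, assuming Deep Learning Toolbox is available and using a hypothetical batch of 16 random 28-by-28 RGB images. It labels the array "SSCB" and uses finddim to look up dimension indices by their label instead of hard-coding them.

```matlab
% Hypothetical placeholder data: 28-by-28 images, 3 channels, 16 observations
X = dlarray(rand(28,28,3,16),"SSCB");

% Look up dimensions by label rather than by position
spatialDims = finddim(X,"S");          % [1 2]
channelDim = finddim(X,"C");           % 3
batchDim = finddim(X,"B");             % 4
numObservations = size(X,batchDim);    % 16
```

Looking dimensions up by label keeps code correct even if the data arrives with a different permutation.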
For dlnetwork objects with input layers, or when you use the trainnet function, if your data already has the layout that the network requires, then it is usually easiest to provide the input data with its dimensions in that permutation. In this case, you can input your data directly without specifying layout information. The required format depends on the type of input layer.
Layer | Format |
---|---|
Feature input layer | "BC" |
2-D image input layer | "SSCB" |
3-D image input layer | "SSSCB" |
Sequence input layer (vector sequences) | "TCB" |
Sequence input layer (1-D image sequences) | "SCBT" |
Sequence input layer (2-D image sequences) | "SSCBT" |
Sequence input layer (3-D image sequences) | "SSSCBT" |
When your data has a different layout, providing formatted data or data format information is usually easier than reshaping and preprocessing your data. For example, if you have sequence data where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, then it is usually easier to specify the string "CBT" than to permute and preprocess the data to have the layout that the software requires.
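To make the comparison concrete, here is a minimal sketch, assuming Deep Learning Toolbox and hypothetical sequence data with 12 channels, 32 observations, and 100 time steps stored as channels-by-observations-by-time. It contrasts permuting the data manually into the "TCB" layout listed for vector sequences with labeling the existing layout as "CBT".

```matlab
% Hypothetical sequence data: 12 channels, 32 observations, 100 time steps
X = rand(12,32,100);

% Route 1: permute manually into the "TCB" layout (time, channel, batch)
XTCB = permute(X,[3 1 2]);        % 100-by-12-by-32

% Route 2: keep the data as is and describe its layout with a format
XFormatted = dlarray(X,"CBT");    % 12-by-32-by-100, format 'CBT'
```

Route 1 requires working out the correct permutation vector. Route 2 only requires knowing what each dimension means.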
To create formatted input data, create a dlarray object and specify the format using the second argument. For example, for an array X that represents a batch of sequences, where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, use:
X = dlarray(X,"CBT");
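A short sketch of inspecting the result, assuming hypothetical random data of size 12-by-32-by-100. Because "CBT" already follows the order "S", "C", "B", "T", "U", the software does not need to permute anything, so the underlying values match the original array.

```matlab
% Hypothetical data: 12 channels, 32 observations, 100 time steps
XData = rand(12,32,100);
X = dlarray(XData,"CBT");

dims(X)    % 'CBT'
size(X)    % 12 32 100

% "CBT" is already in the order S, C, B, T, U, so the values are not permuted
isequal(extractdata(X),XData)   % true
```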
Note
When you create a formatted dlarray object, the software automatically permutes the dimensions so that the format has dimensions in this order: "S", "C", "B", "T", "U".
For example, if you specify a format of "TCB" (time, channel, batch), then the software automatically permutes the dimensions so that the array has the format "CBT" (channel, batch, time).
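The automatic permutation can be checked directly. Here is a minimal sketch, assuming hypothetical time-first data with 100 time steps, 12 channels, and 32 observations. After formatting, the array reports the format "CBT", and its values equal a manual permute of the original data with the order [2 3 1].

```matlab
% Hypothetical data stored time-first: 100 time steps, 12 channels, 32 observations
X = rand(100,12,32);
XFormatted = dlarray(X,"TCB");

dims(XFormatted)   % 'CBT'
size(XFormatted)   % 12 32 100

% The stored values match a manual permutation into channel-batch-time order
isequal(extractdata(XFormatted),permute(X,[2 3 1]))   % true
```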
To provide additional layout information with unformatted data, specify the format using the appropriate input argument of the function. For example, to apply the dlconv operation to an unformatted dlarray object X that represents a batch of images, where the first two dimensions correspond to the spatial dimensions and the third and fourth dimensions correspond to the channel and batch dimensions, respectively, use:
Y = dlconv(X,weights,bias,DataFormat="SSCB");
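A fuller, self-contained version of that call, as a sketch with hypothetical sizes: 16 RGB images of size 28-by-28 and 8 filters of size 5-by-5. The weights are laid out as filter height by filter width by number of channels by number of filters. With the default stride of 1 and no padding, each spatial dimension shrinks from 28 to 28 - 5 + 1 = 24.

```matlab
% Hypothetical batch: 16 RGB images of size 28-by-28, stored as "SSCB"
X = dlarray(rand(28,28,3,16));   % unformatted dlarray

% Hypothetical filter bank: 8 filters of size 5-by-5 over 3 input channels
filterSize = [5 5];
numChannels = 3;
numFilters = 8;
weights = rand(filterSize(1),filterSize(2),numChannels,numFilters);
bias = zeros(numFilters,1);

% DataFormat tells dlconv how to interpret the unformatted dimensions
Y = dlconv(X,weights,bias,DataFormat="SSCB");
size(Y)   % 24 24 8 16 (stride 1, no padding)
```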
To view the layout information of dlarray objects, use the dims function.
To view the layout information of layer outputs, use the analyzeNetwork function.
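A minimal sketch of dims together with stripdims, assuming a hypothetical "SSCB" image batch. stripdims removes the labels but keeps the data and its dimension order, so dims then returns an empty character vector.

```matlab
% Hypothetical image batch: 28-by-28, 3 channels, 16 observations
X = dlarray(rand(28,28,3,16),"SSCB");
dims(X)                   % 'SSCB'

% Remove the labels; the data and dimension order stay the same
XPlain = stripdims(X);
isempty(dims(XPlain))     % true: XPlain is unformatted
size(XPlain)              % 28 28 3 16
```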
See Also
dlarray | dims | stripdims | analyzeNetwork