Get Started with Image Preprocessing and Augmentation for Deep Learning
Data preprocessing consists of a series of deterministic operations that normalize or enhance desired data features. For example, you can normalize data to a fixed range or resize data to the size required by the network input layer. Preprocessing is used for training, validation, and test data.
Preprocessing can occur at two stages in the deep learning workflow.
Commonly, preprocessing occurs as a separate step that you complete before preparing the data to be fed to the network. You load your original data, apply the preprocessing operations, then save the result to disk. The advantage of this approach is that the preprocessing overhead is incurred only once, and the preprocessed images are then readily available as a starting point for all future training trials.
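The offline approach can be sketched as follows. This is a minimal sketch, assuming Image Processing Toolbox is available. The folder names, the synthetic input images, and the 224-by-224 target size are placeholders chosen for illustration and do not come from this page.

```matlab
% Create a few synthetic images so that the sketch is self-contained.
% In practice, inFolder points to your original data.
inFolder  = fullfile(tempdir, 'rawImages');           % hypothetical location
outFolder = fullfile(tempdir, 'preprocessedImages');  % hypothetical location
if ~isfolder(inFolder),  mkdir(inFolder);  end
if ~isfolder(outFolder), mkdir(outFolder); end
for k = 1:4
    I = randi([0 255], [100 120 3], 'uint8');
    imwrite(I, fullfile(inFolder, sprintf('img%02d.png', k)));
end

% Load the original data.
imds = imageDatastore(inFolder);
targetSize = [224 224];   % assumed network input size

% Apply the deterministic preprocessing once and save the result to disk.
for k = 1:numel(imds.Files)
    I = readimage(imds, k);
    I = imresize(I, targetSize);
    [~, name] = fileparts(imds.Files{k});
    imwrite(I, fullfile(outFolder, [name '.png']));
end

% Later training trials can start from the preprocessed copy.
imdsPre = imageDatastore(outFolder);
```

Because the resized files persist on disk, every later experiment can create a datastore over the output folder directly and skip the resizing cost.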
If you load your data into a datastore, then you can also apply preprocessing during training by using the transform and combine functions. For more information, see Datastores for Deep Learning (Deep Learning Toolbox). The transformed images are not stored in memory. This approach avoids writing a second copy of the training data to disk, which is convenient when your preprocessing operations are not computationally expensive and do not noticeably slow down training.
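As an illustration of preprocessing at read time, the sketch below wraps an image datastore with transform. It assumes Image Processing Toolbox is available. The folder, the synthetic images, and the target size are hypothetical. The sketch covers only the image side; a datastore used for training typically also needs to return the responses, which is omitted here.

```matlab
% Synthetic images stand in for real training data (hypothetical folder).
folder = fullfile(tempdir, 'transformDemoImages');
if ~isfolder(folder), mkdir(folder); end
for k = 1:4
    imwrite(randi([0 255], [100 120 3], 'uint8'), ...
        fullfile(folder, sprintf('img%02d.png', k)));
end

imds = imageDatastore(folder);
targetSize = [224 224];   % assumed network input size

% transform returns a new datastore. The function handle runs each time
% data is read from the underlying datastore.
preprocessFcn = @(I) im2single(imresize(I, targetSize));
dsPre = transform(imds, preprocessFcn);

% Reading applies the preprocessing on the fly; nothing is written to disk.
I = read(dsPre);
```

The files on disk stay untouched, so changing the target size later only requires editing the function handle.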
Common image preprocessing operations include noise removal, edge-preserving smoothing, color space conversion, contrast enhancement, and morphology. For an example that shows how to create and apply these transformations, see Augment Images for Deep Learning Workflows.
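The operations named above map onto Image Processing Toolbox functions roughly as shown below. This is one possible selection for illustration, not the list used by the referenced example. It assumes the sample image peppers.png is on the MATLAB path, and the filter sizes and structuring element are arbitrary choices.

```matlab
I = imread('peppers.png');   % sample RGB image, assumed to be on the path
G = rgb2gray(I);             % color space conversion: RGB to grayscale

labI     = rgb2lab(I);                   % color space conversion: RGB to L*a*b*
denoised = medfilt2(G, [3 3]);           % noise removal with a 3-by-3 median filter
smoothed = imbilatfilt(I);               % edge-preserving (bilateral) smoothing
enhanced = adapthisteq(G);               % contrast enhancement (adaptive histogram equalization)
opened   = imopen(G, strel('disk', 3));  % morphological opening with a disk
```

Each call is deterministic, so applying it to the training, validation, and test data gives consistent results across the three sets.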
Data augmentation consists of randomized operations that are applied to the training data while the network is training.
Augmented image data can simulate variations in the image acquisition. Common image augmentation operations include randomized geometric transformations, such as rotation and translation, which simulate variations in the camera orientation with respect to the scene. Random cropping simulates variations in the scene composition. Artificial noise simulates distortions introduced during image acquisition or upstream data processing operations. Augmentation increases the effective amount of training data and helps to make the network invariant to common variations and distortions in the data.
To augment training data, start by loading your data into a datastore. For more information, see Datastores for Deep Learning (Deep Learning Toolbox). Some built-in datastores apply a specific and limited set of augmentations to data for specific applications. You can also apply your own set of augmentation operations to data in the datastore by using the transform and combine functions. During training, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set.
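A custom augmentation pipeline built from combine and transform might look like the following sketch. It assumes a release that includes arrayDatastore and randomAffine2d. The folder, the synthetic images, the labels, the helper name augmentImageAndLabel, and the parameter ranges are all hypothetical.

```matlab
% Synthetic images and labels stand in for real training data.
folder = fullfile(tempdir, 'augmentDemoImages');   % hypothetical location
if ~isfolder(folder), mkdir(folder); end
for k = 1:4
    imwrite(randi([0 255], [100 120 3], 'uint8'), ...
        fullfile(folder, sprintf('img%02d.png', k)));
end
labels = categorical(["cat"; "dog"; "cat"; "dog"]);   % hypothetical labels

imds  = imageDatastore(folder);
lblds = arrayDatastore(labels);

% combine pairs each image with its label: read returns {image, label}.
dsTrain = combine(imds, lblds);

% transform applies the random augmentation every time data is read,
% so repeated passes over the datastore produce different images.
dsAug = transform(dsTrain, @augmentImageAndLabel);

sample = read(dsAug);

% Local function (hypothetical helper). In a script file, local
% functions go at the end of the file.
function dataOut = augmentImageAndLabel(dataIn)
% dataIn is an N-by-2 cell array with one {image, label} pair per row.
dataOut = dataIn;
for idx = 1:size(dataIn, 1)
    I = dataIn{idx, 1};
    % Random rotation, translation, and horizontal reflection.
    tform = randomAffine2d('Rotation', [-20 20], ...
        'XTranslation', [-10 10], 'YTranslation', [-10 10], ...
        'XReflection', true);
    outputView = affineOutputView(size(I), tform);
    dataOut{idx, 1} = imwarp(I, tform, 'OutputView', outputView);
end
end
```

The label passes through unchanged because these geometric perturbations do not alter the class of the image. Calling reset(dsAug) and reading again yields a differently perturbed version of the same file.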
The table lists common types of preprocessing and augmentation operations applied to image data for deep learning applications.
Processing Type | Description | Sample Functions | Sample Output |
---|---|---|---|
Resize images | Resize images by a fixed scaling factor or to a target size | | |
Warp images | Apply random reflection, rotation, scale, shear, and translation to images | | |
Crop images | Crop an image to a target size from the center or a random position | | |
Simulate noise | Add random Gaussian, Poisson, salt and pepper, or multiplicative noise | | |
Simulate blur | Add Gaussian or directional motion blur | | |
Jitter color | Randomly adjust image hue, saturation, brightness, or contrast | | |
Jitter intensity | Randomly adjust the brightness, contrast, or gamma correction | | |
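The operations in the table can be sketched with Image Processing Toolbox functions as follows. The choice of functions here is an assumption made for illustration, not a list taken from the table. The sketch also assumes that the sample image peppers.png is on the path, and all parameter ranges are arbitrary.

```matlab
I = imread('peppers.png');   % sample RGB image, assumed to be on the path

% Resize: by a fixed scaling factor, or to a target size.
Ihalf   = imresize(I, 0.5);
Itarget = imresize(I, [224 224]);

% Warp: random reflection, rotation, scale, shear, and translation.
tform = randomAffine2d('XReflection', true, 'Rotation', [-30 30], ...
    'Scale', [0.8 1.2], 'XShear', [-10 10], 'XTranslation', [-20 20]);
Iwarp = imwarp(I, tform, 'OutputView', affineOutputView(size(I), tform));

% Crop: to a target size from the center or from a random position.
cropSize = [200 200];
Icenter = imcrop(I, centerCropWindow2d(size(I), cropSize));
Irandom = imcrop(I, randomCropWindow2d(size(I), cropSize));

% Simulate noise: Gaussian, Poisson, salt and pepper, multiplicative.
Igauss   = imnoise(I, 'gaussian', 0, 0.01);
Ipoisson = imnoise(I, 'poisson');
Isp      = imnoise(I, 'salt & pepper', 0.05);
Ispeckle = imnoise(I, 'speckle', 0.04);

% Simulate blur: Gaussian blur with a random sigma, or motion blur
% along a random direction.
sigma   = 1 + 2*rand;
Iblur   = imgaussfilt(I, sigma);
Imotion = imfilter(I, fspecial('motion', 15, 360*rand), 'replicate');

% Jitter color: random hue, saturation, brightness, and contrast.
Icolor = jitterColorHSV(I, 'Hue', [0.05 0.15], 'Saturation', [-0.4 -0.1], ...
    'Brightness', [-0.3 -0.1], 'Contrast', [1.2 1.4]);

% Jitter intensity: random gamma correction of a grayscale image.
G = rgb2gray(I);
gammaValue = 0.5 + rand;   % random gamma in the range [0.5, 1.5]
Igamma = imadjust(G, [], [], gammaValue);
```

To use any of these as augmentation during training, wrap the randomized calls in a function and pass that function to transform, so that new random parameters are drawn on every read.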
More About
- Datastores for Deep Learning (Deep Learning Toolbox)
- Select Datastore for File Format or Application