Feature extraction for machine learning and deep learning

Feature extraction refers to the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set. It often yields better results than applying machine learning directly to the raw data.

Feature extraction can be accomplished manually or automatically:

  • Manual feature extraction requires identifying and describing the features that are relevant for a given problem and implementing a way to extract those features. In many situations, having a good understanding of the background or domain can help make informed decisions as to which features could be useful. Over decades of research, engineers and scientists have developed feature extraction methods for images, signals, and text. An example of a simple feature is the mean of a window in a signal.
  • Automated feature extraction uses specialized algorithms or deep networks to extract features automatically from signals or images without the need for human intervention. This technique can be very useful when you want to move quickly from raw data to developing machine learning algorithms. Wavelet scattering is an example of automated feature extraction.
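As a minimal sketch of the manual case, the simple feature mentioned above (the mean of a window in a signal) can be computed as follows. The function name `windowed_means` and the parameters `window_size` and `hop` are illustrative assumptions, not part of any toolbox.

```python
def windowed_means(signal, window_size, hop):
    """Return the mean of each window of `window_size` samples, advancing by `hop`."""
    if window_size <= 0 or hop <= 0:
        raise ValueError("window_size and hop must be positive")
    means = []
    # Only full windows are used; a trailing partial window is dropped.
    for start in range(0, len(signal) - window_size + 1, hop):
        window = signal[start:start + window_size]
        means.append(sum(window) / window_size)
    return means


# Hypothetical eight-sample signal, windows of 4 samples with 50% overlap.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
features = windowed_means(signal, window_size=4, hop=2)
print(features)  # [2.5, 4.5, 6.5]
```

Each window collapses to a single number, so an eight-sample signal becomes a three-element feature vector here. The window length and overlap are exactly the kind of choice that benefits from domain knowledge.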

With the rise of deep learning, feature extraction has been largely replaced by the first layers of deep networks, but mostly for image data. For signal and time-series applications, feature extraction remains the first challenge and requires significant expertise before one can build effective predictive models.

Feature Extraction for Signals and Time Series Data

Feature extraction identifies the most discriminating characteristics in signals, which a machine learning or deep learning algorithm can more easily consume. Training machine learning or deep learning models directly on raw signals often yields poor results because of the high data rate and information redundancy.

Schematic process for applying feature extraction to signals and time series data for a machine learning classifier.

Signal features and time-frequency transformations

When analyzing signals and sensor data, Signal Processing Toolbox™ and Wavelet Toolbox™ provide functions that let you measure common distinctive features of a signal in the time, frequency, and time-frequency domains. You can apply pulse and transition metrics, measure signal-to-noise ratio (SNR), estimate spectral entropy and kurtosis, and compute power spectra.
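To make two of these measurements concrete, here is a rough standard-library sketch of kurtosis and SNR. It is not the toolbox implementation: the function names are hypothetical, kurtosis is taken as the fourth standardized moment (a normal distribution gives 3 under this definition), and the SNR estimate assumes a clean reference signal is available, which is often not the case in practice.

```python
import math


def kurtosis(x):
    """Fourth standardized moment; larger values indicate more impulsive signals."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    fourth_moment = sum((v - mean) ** 4 for v in x) / n
    return fourth_moment / variance ** 2


def snr_db(clean, noisy):
    """SNR in decibels, assuming `clean` is a known reference for `noisy`."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


square_wave = [-1.0, 1.0, -1.0, 1.0]
spiky = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0]  # one isolated impulse
noisy_square = [v + 0.1 for v in square_wave]     # constant offset as "noise"

print(kurtosis(square_wave))                # 1.0
print(kurtosis(spiky) > 3.0)                # True
print(round(snr_db(square_wave, noisy_square), 3))  # 20.0
```

The impulsive signal scores a much higher kurtosis than the square wave, which is why kurtosis is a popular indicator for transient events.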

Time-frequency transformations, such as the short-time Fourier transform (STFT), can be used as signal representations for training data in machine learning and deep learning models. For example, convolutional neural networks (CNNs) are commonly used on image data and can successfully learn from the 2D signal representations returned by time-frequency transformations.

Spectrogram of a signal using short-time Fourier transform. Spectrogram shows variation of frequency content over time.
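The idea behind the STFT can be sketched in a few lines: slice the signal into overlapping frames, apply a window to each frame, and take the magnitude of its discrete Fourier transform. The code below is an illustrative sketch that uses a direct DFT from the standard library rather than an FFT. The function name, the Hann window, and the test-tone parameters are assumptions chosen for this example.

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of a single bin; slow but easy to follow.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


fs = 64  # hypothetical sample rate in Hz
tone = [math.sin(2 * math.pi * 8 * n / fs) for n in range(128)]  # 8 Hz sine
spec = stft_magnitude(tone, frame_len=32, hop=16)

# Bin spacing is fs / frame_len = 2 Hz, so an 8 Hz tone peaks in bin 4.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(len(spec), len(spec[0]), peak_bins)
```

The result is a 2D array (frames by frequency bins) that can be treated like an image, which is what makes it a natural input for a CNN.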

Other time-frequency transformations can be used, depending on the specific application or the characteristics of the signal. For example, the constant-Q transform (CQT) provides a logarithmically spaced frequency distribution; the continuous wavelet transform (CWT) is usually effective at identifying short transients in non-stationary signals.
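To illustrate what "logarithmically spaced" means for the CQT, the sketch below computes geometrically spaced center frequencies using the usual relation f_k = f_min · 2^(k/B), where B is the number of bins per octave. The function name and the parameter values are assumptions chosen for illustration.

```python
def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# Hypothetical setup: start at 55 Hz with 12 bins per octave (one per semitone).
freqs = cqt_center_frequencies(f_min=55.0, bins_per_octave=12, n_bins=25)

# Neighbouring bins differ by a constant ratio, not a constant offset in Hz.
ratios = [freqs[k + 1] / freqs[k] for k in range(len(freqs) - 1)]
print(freqs[0], freqs[12], freqs[24])
print(min(ratios), max(ratios))
```

Because the ratio between neighbouring bins is constant, each octave receives the same number of bins. An STFT, by contrast, spaces its bins evenly in hertz.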

Features for audio applications and predictive maintenance

Audio Toolbox™ provides a collection of time-frequency transformations, including Mel spectrograms, octave and gammatone filter banks, and the discrete cosine transform (DCT), that are often used for audio, speech, and acoustics. Other popular feature extraction methods for these types of signals include Mel frequency cepstral coefficients (MFCC), gammatone cepstral coefficients (GTCC), pitch, harmonicity, and different types of audio spectral descriptors. The Audio Feature Extractor tool can help select and extract different audio features from the same source signal while reusing any intermediate computations for efficiency.
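As a rough illustration of the Mel scale behind Mel spectrograms and MFCCs, the sketch below uses one widely used conversion formula, mel = 2595 · log10(1 + f/700), to place filter-bank band edges evenly on the Mel axis. Other variants of the formula exist, and this is not necessarily the one used by Audio Toolbox. All function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """One common Hz-to-Mel mapping (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, n_bands):
    """Edges in Hz of n_bands overlapping bands, evenly spaced on the Mel axis."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


edges = mel_band_edges(0.0, 8000.0, n_bands=10)
widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]
print([round(e) for e in edges])
```

The bands are narrow at low frequencies and widen toward high frequencies, which roughly mirrors how human hearing resolves pitch.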

For engineers developing applications for condition monitoring and predictive maintenance, the Diagnostic Feature Designer app in Predictive Maintenance Toolbox™ lets you extract, visualize, and rank features to design condition indicators for monitoring machine health.

The Diagnostic Feature Designer app lets you design and compare features to discriminate between nominal and faulty systems.

Automated feature extraction methods

New high-level methods have emerged to automatically extract features from signals. Autoencoders, wavelet scattering, and deep neural networks are commonly used to extract features and reduce dimensionality of the data.

Wavelet scattering networks automate the extraction of low-variance features from real-valued time series and image data. This approach produces data representations that minimize differences within a class while preserving discriminability across classes. Wavelet scattering works well when you do not have a lot of data to begin with.

Feature Extraction for Image Data

Feature extraction for image data represents the interesting parts of an image as a compact feature vector. In the past, this was accomplished with specialized feature detection, feature extraction, and feature matching algorithms. Today, deep learning is prevalent in image and video analysis and is known for its ability to take raw image data as input, skipping the explicit feature extraction step. Regardless of which approach you take, computer vision applications such as image registration, object detection and classification, and content-based image retrieval all require an effective representation of image features, either implicitly through the first layers of a deep network or explicitly by applying longstanding image feature extraction techniques.

Detecting an object (left) in a cluttered scene (right) using a combination of feature detection, feature extraction, and matching. See example for details.

Feature extraction techniques provided by Computer Vision Toolbox™ and Image Processing Toolbox™ include:

  • Histogram of oriented gradients (HOG)
  • Speeded-up robust features (SURF)
  • Local binary pattern (LBP) features
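To give a feel for how one of these descriptors turns pixels into a feature vector, here is a minimal sketch of the basic 3×3 local binary pattern. It is not the toolbox implementation: the function name, the bit ordering, and the use of `>=` for the comparison are assumptions, and practical variants add refinements such as uniform patterns and per-cell histograms.

```python
def lbp_histogram(image):
    """Return a 256-bin histogram of basic 3x3 LBP codes over interior pixels."""
    # Neighbour offsets (row, col), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist


flat = [[5] * 4 for _ in range(4)]           # uniform 4x4 patch
spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]     # single bright pixel

flat_hist = lbp_histogram(flat)
spot_hist = lbp_histogram(spot)
print(flat_hist[255], spot_hist[0])  # 4 1
```

The histogram, rather than the per-pixel codes, serves as the feature vector, so the descriptor has a fixed length regardless of image size.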

Histogram of oriented gradients (HOG) feature extraction of image (top). Feature vectors of different sizes are created to represent the image by varying cell size (bottom). See example for details.



See also: feature matching, object detection, image stabilization, image processing and computer vision, face recognition, image recognition, object recognition, digital image processing, optical flow, RANSAC, pattern recognition, point cloud, deep learning, feature selection
