Ebook

Deep Learning and Traditional Machine Learning: Choosing the Right Approach

Chapter 2

Your Data


In this chapter, we take a look at three questions:

  • Is your data tabular?
  • If your data is nontabular, what type is it?
  • Is your data labeled?

Is Your Data Tabular?

Traditional machine learning techniques were designed for tabular data, which is organized into independent rows and columns. In tabular data, each row represents a discrete piece of information (e.g., an employee’s address).

Tabular data can be transformed to work with deep learning models, but this may not be the best place to start.

EmployeeID  AddressLine1      AddressLine2  StartDate
1111        "5 Maple St"      ""            01-Jan-2005 00:00:00
7654        "8 Main Ave"      "Apt 13"      31-Dec-2014 00:00:00
80          "835 High St"     ""            31-May-2000 00:00:00
6424        "42 Oakridge Rd"  "Unit 4"      02-Aug-2013 00:00:00

Tabular data can be numeric or categorical (though the categorical data is eventually converted to numeric form).
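One common way to convert a categorical column to numeric form is one-hot encoding. The sketch below is a minimal illustration in plain Python, not a recommendation of a specific tool; the `department` column and its values are hypothetical and do not come from the employee table above.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with one position per category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical column (not taken from the table above)
department = ["Sales", "Engineering", "Sales", "Support"]
categories, encoded = one_hot_encode(department)
```

Each row of `encoded` has a single 1 in the column for its category, so the result can be passed to algorithms that expect numeric input.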

If Your Data Is Nontabular, What Type Is It?

Images and Video: Deep learning is the more common choice for image and video classification problems. Convolutional neural networks are designed to extract features from images, and those features often yield state-of-the-art classification accuracy. Intuitively, the convolutional filters extract progressively higher-level features from an image, making it possible to discern high-level differences such as cat versus dog.
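To make the idea of a convolutional filter concrete, the sketch below slides a small filter over a toy image in plain Python. It is an illustration only: the 4x4 image and the hand-picked edge filter are assumptions, whereas in a convolutional neural network the filter weights are learned during training.

```python
def convolve2d(image, kernel):
    """Valid-mode 2-D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 grayscale image: dark left half, bright right half
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked filter that responds to left-to-right increases in brightness
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
```

The resulting feature map is nonzero only where the filter straddles the boundary between the dark and bright halves. Stacking many such filters in successive layers is what allows a network to build up higher-level features.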

Sensor and Signal: Traditional machine learning has been more common for sensor and signal data, but deep learning is gaining popularity. Traditional approaches extract features from the signals and then use those features with a machine learning algorithm. More recently, signals have been passed directly to LSTM networks, or converted to images (for example, by calculating the signal’s spectrogram) that are then used with a convolutional neural network. Wavelets provide yet another way to extract features from signals, with techniques like wavelet scattering showing promising results when combined with machine learning algorithms.
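The signal-to-image step can be sketched as a magnitude spectrogram: split the signal into frames and take the Fourier transform of each one. The plain-Python version below is an illustration under simplifying assumptions: it uses a synthetic sine wave, a direct discrete Fourier transform rather than an FFT, and no window function, all of which a practical implementation would likely handle differently.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per time frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        magnitudes = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Synthetic test signal (an assumption for illustration): a sine wave that
# completes 4 cycles in every 32-sample frame
frame_len, hop = 32, 16
signal = [math.sin(2 * math.pi * 4 * n / frame_len) for n in range(128)]

spec = spectrogram(signal, frame_len, hop)
peak_bins = [frame.index(max(frame)) for frame in spec]
```

Here `spec` is a time-by-frequency grid of magnitudes. Treated as pixel intensities, a grid like this is the kind of image that can be given to a convolutional neural network.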

Text: As with sensor data, traditional machine learning has been more common for text, though deep learning is growing in use. Text can be converted to a numerical representation via bag-of-words models and normalization techniques, then used with traditional machine learning techniques such as support vector machines or naive Bayes. Newer techniques use text with recurrent or convolutional neural network architectures. In these cases, text is often transformed into a numeric representation using a word-embedding model such as word2vec.
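A bag-of-words representation can be sketched in a few lines of plain Python. The example below is a minimal illustration: the two-document corpus is hypothetical, tokenization is a simple whitespace split, and dividing by document length is only one of several possible normalization choices.

```python
from collections import Counter


def bag_of_words(documents):
    """Return the vocabulary and one length-normalized count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Normalize by document length so long and short texts are comparable
        vectors.append([counts[word] / len(tokens) for word in vocabulary])
    return vocabulary, vectors


# Hypothetical two-document corpus
documents = ["the pump is noisy", "the pump is quiet and the motor is quiet"]
vocabulary, vectors = bag_of_words(documents)
```

Each document becomes a fixed-length numeric vector, one entry per vocabulary word, which is the form that classifiers such as support vector machines or naive Bayes expect. Word order is discarded, which is one reason sequence-aware networks with word embeddings are used for harder problems.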

Is Your Data Labeled?

To train a supervised model, whether for machine learning or deep learning, you need labeled data.

If You Have No Labeled Data

Focus on traditional machine learning techniques, in particular unsupervised learning. Labeling for deep learning can mean annotating objects in an image or, for semantic segmentation, annotating each pixel of an image or video. Creating these labels, a process often referred to as “ground-truth labeling,” can be prohibitively time-consuming.
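As one example of an unsupervised technique, the sketch below groups unlabeled one-dimensional readings with k-means clustering in plain Python. It is an illustration only: k-means is just one of many unsupervised methods, and the readings, starting centroids, and iteration count are assumptions chosen to keep the example small.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: alternate assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled sensor readings forming two loose groups
readings = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centroids, clusters = kmeans_1d(readings, centroids=[0.0, 5.0])
```

No labels are supplied at any point; the grouping comes entirely from the structure of the data.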

If You Have Some Labeled Data

Try transfer learning and/or labeling apps if you want to use deep learning. Because transfer learning trains only a smaller number of parameters in the deep neural network, it requires less labeled data.

Another approach for dealing with small amounts of labeled data is to augment that data. For example, it is common with image data sets to augment the training data with various transformations on the labeled images (such as reflection, rotation, scaling, and translation).
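The augmentation idea can be sketched with a toy image stored as a list of rows. The plain-Python functions below are illustrative assumptions rather than any library's API, and they cover reflection, rotation, and translation only; scaling is omitted because it requires interpolation.

```python
def reflect(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def translate_right(image, shift, fill=0):
    """Shift pixels right by `shift` columns, padding the left edge with `fill`."""
    width = len(image[0])
    return [([fill] * shift + row)[:width] for row in image]


def augment(image, label):
    """Return the original plus transformed copies, all sharing the same label."""
    variants = [image, reflect(image), rotate90(image), translate_right(image, 1)]
    return [(variant, label) for variant in variants]


# Toy 2x3 "image" with a hypothetical class label
image = [
    [1, 2, 3],
    [4, 5, 6],
]
augmented = augment(image, "vehicle")
```

One labeled image yields four labeled training examples here. The label is reused unchanged, which is valid only when the transformation does not alter what the image depicts.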

If You Have Lots of Labeled Data

With plenty of labeled data, both machine learning and deep learning are available. The more labeled data you have, the more likely it is that deep learning will be the more accurate option.

The Ground Truth Labeler app in MATLAB. The screenshot shows green and yellow boxes around detected surrounding vehicles, each labeled “vehicle”.

Handy Labeling Apps

Image Labeler enables you to label ground truth data in a collection of images. Define rectangular region of interest (ROI) labels, pixel ROI labels, and scene labels, and use these labels to interactively label your ground truth data. Write, import, and use your own custom automation algorithm to automatically label ground truth.

Ground Truth Labeler works in the same way as the Image Labeler app but is specifically for automated driving applications.

Audio Labeler enables you to label ground truth audio data at both the region level and file level. Create label definitions for consistent and fast labeling. Visualize the time-domain waveform during playback and specify regions by drawing directly on the time-domain waveform.