
Feature Engineering | Applied Machine Learning, Part 1

From the series: Applied Machine Learning

Explore how to perform feature engineering, a technique for transforming raw data into features that are suitable for a machine learning algorithm.

Feature engineering starts with your best guess about what features might influence the outcome you’re trying to predict. After that, it’s an iterative process where you create new features, add them to your model, and see whether your results have improved.

This video provides a high-level overview of the topic and uses several examples to illustrate the basic principles behind feature engineering and established ways of extracting features from signals, text, and images.

Published: 18 Jan 2019

Full Transcript

Machine learning algorithms don’t always work well on raw data. Part of our job as engineers and scientists is to transform the raw data so that the behavior of the system is more obvious to the machine learning algorithm. This is called feature engineering.

Feature engineering starts with your best guess about what features might influence the thing you’re trying to predict. After that, it’s an iterative process where you create new features, add them to your model, and see whether the results improve.

Let’s take a simple example where we want to predict whether a flight is going to be delayed or not.

In the raw data, we have information such as the month of the flight, the destination, and the day of the week.

If I fit a decision tree just to this data, I’ll get an accuracy of 70%. What else could we calculate from this data that might help improve our predictions?

Well, how about the number of flights per day? Some days have more flights than others, and flights on busier days may be more likely to be delayed.

This feature is already in my dataset in the app, so let’s add it and retrain the model. You can see the model accuracy improved to 74%. Not bad for adding just one feature.
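The flights-per-day feature is a simple group-and-count over the raw records. The sketch below shows one way to derive it in Python. It assumes hypothetical `(flight_date, destination)` records and a hypothetical helper `add_flights_per_day`. It does not reproduce the app workflow or the accuracy figures from the video.

```python
from collections import Counter
from datetime import date

# Hypothetical toy records: (flight_date, destination). Not the dataset from the video.
flights = [
    (date(2019, 1, 7), "BOS"),
    (date(2019, 1, 7), "ORD"),
    (date(2019, 1, 7), "SFO"),
    (date(2019, 1, 8), "BOS"),
    (date(2019, 1, 9), "ORD"),
    (date(2019, 1, 9), "BOS"),
]


def add_flights_per_day(records):
    """Append to each record the number of flights scheduled on the same date."""
    counts = Counter(flight_date for flight_date, _ in records)
    return [(flight_date, dest, counts[flight_date]) for flight_date, dest in records]


rows = add_flights_per_day(flights)
for row in rows:
    print(row)
```

The new column can then be passed to the classifier alongside month, destination, and day of week.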

Feature engineering is often described as a creative process, more of an art than a science. There’s no single correct way to do it, but if you have domain expertise and a solid understanding of the data, you’ll be in a good position to perform feature engineering. As you’ll see later, the techniques used for feature engineering are things you may already be familiar with, but you might not have thought about them in this context before.

Let’s see another example that’s a bit more interesting. Here, we’re trying to predict whether a heart is behaving normally or abnormally by classifying the sounds it makes.

The sounds come in the form of audio signals. Rather than training on the raw signals, we can engineer features and then use those values to train a model.

Deep learning approaches have recently become popular because they require less manual feature engineering. Instead, the features are learned as part of the training process. Although this has often produced very promising results, deep learning models require more data, take longer to train, and are typically less interpretable than models built on manually engineered features.

The features we used to classify heart sounds come from the signal processing field. We calculated things such as skewness, kurtosis, and dominant frequencies. These calculations extract characteristics that make it easier for the model to distinguish between an abnormal heart sound and a normal one.
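The video doesn't show how these signal features are computed. Below is a minimal standard-library sketch of the three it names. Skewness and kurtosis are population moment ratios, and the kurtosis is the non-excess form, so Gaussian data gives about 3. The dominant frequency is the largest-magnitude DFT bin excluding DC. The input is a synthetic two-tone signal, not a heart sound, and the function names are hypothetical. A real pipeline would use an FFT rather than this naive O(n²) DFT.

```python
import cmath
import math


def _central_moment(x, order):
    """Population central moment of the given order."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** order for v in x) / len(x)


def skewness(x):
    """Asymmetry of the value distribution: 0 for symmetric data."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def kurtosis(x):
    """Tailedness (non-excess form): about 3 for Gaussian data."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2


def dominant_frequency(x, sample_rate):
    """Frequency in Hz of the largest-magnitude DFT bin, excluding DC (naive DFT)."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n


# Synthetic signal (not a heart sound): a 50 Hz tone plus a weaker 120 Hz tone.
fs = 800
signal = [
    math.sin(2 * math.pi * 50 * t / fs) + 0.5 * math.sin(2 * math.pi * 120 * t / fs)
    for t in range(400)
]
features = {
    "skewness": skewness(signal),
    "kurtosis": kurtosis(signal),
    "dominant_hz": dominant_frequency(signal, fs),
}
print(features)
```

Each recording is reduced to a short feature vector like `features`. Those vectors, not the raw samples, are what the classifier is trained on.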

So what other features do people use? Many use traditional summary statistics such as the mean, median, and mode, as well as basic counts of how many times something happens.
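Summary statistics and event counts can be computed with Python's standard `statistics` module. In this sketch the readings are hypothetical sensor values. The threshold of 5 used for the count feature is also an arbitrary assumption chosen for illustration.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical sensor readings collected over one window of time.
readings = [4, 7, 7, 2, 9, 7, 4]

features = {
    "mean": mean(readings),
    "median": median(readings),
    "mode": mode(readings),
    # Count-style features: how often something happens in the window.
    "count_above_5": sum(1 for r in readings if r > 5),
    "times_7_seen": Counter(readings)[7],
}
print(features)
```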

Lots of data has a timestamp associated with it, and you can extract a number of features from a timestamp that might improve model performance. What was the month, the day of the week, or the hour of the day? Was it a weekend or a holiday? Features like these often track human behavior, which matters if, for example, you’re trying to predict how much electricity people use.
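Each question in the paragraph above maps to one field of a Python `datetime`. In this sketch `HOLIDAYS` is a hypothetical two-date calendar. A real project would substitute the calendar for its own region.

```python
from datetime import date, datetime

# Hypothetical holiday calendar for illustration only.
HOLIDAYS = {date(2019, 1, 1), date(2019, 12, 25)}


def timestamp_features(ts: datetime) -> dict:
    """Derive calendar features from a single timestamp."""
    return {
        "month": ts.month,
        "day_of_week": ts.weekday(),  # Monday=0 ... Sunday=6
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,
        "is_holiday": ts.date() in HOLIDAYS,
    }


print(timestamp_features(datetime(2019, 1, 1, 18, 30)))
```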

Another class of feature engineering techniques deals with text data. One technique is to count how many times certain words occur in a text, often combined with a weighting or normalization scheme such as term frequency-inverse document frequency (TF-IDF). Word2vec, which converts words into dense, high-dimensional vector representations, is another popular feature engineering technique for text.
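Word counts combined with TF-IDF weighting can be written in a few lines of standard Python. This sketch uses the plain variant: tf is the term count divided by the document length, and idf is log(N / document frequency). Libraries commonly use smoothed variants, so their numbers will differ. The tokenizer is deliberately crude, and the documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of term in document / number of tokens in document
    idf = log(number of documents / number of documents containing term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words counts for this document
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the flight was delayed",
    "the flight was on time",
    "flight delayed again and again",
]
weights = tf_idf(docs)
print(weights[2])
```

A word that appears in every document, like "flight" here, gets weight 0 because it does not help tell documents apart.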

The last class of techniques I’ll talk about deals with images. Images contain a lot of information, so you often need to extract the important parts. Traditional techniques compute color histograms or apply transforms such as the Haar wavelet. More recently, researchers have started using convolutional neural networks to extract features from images.
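Both traditional image techniques fit in a short standard-library sketch. `channel_histograms` concatenates per-channel histograms of `(r, g, b)` pixels into one feature vector. `haar_level1` applies one level of a 2-D Haar transform to a grayscale image. It uses the averaging (unnormalized) form, which divides by 4; the orthonormal version divides by 2. Both function names and the tiny test images are hypothetical.

```python
def channel_histograms(pixels, bins=4, max_value=256):
    """Concatenated per-channel histograms for a list of (r, g, b) pixels."""
    width = max_value / bins
    features = []
    for channel in range(3):
        hist = [0] * bins
        for pixel in pixels:
            hist[min(int(pixel[channel] // width), bins - 1)] += 1
        features.extend(hist)
    return features


def haar_level1(image):
    """One level of a 2-D Haar transform (averaging form).

    `image` is a grayscale list of rows with even height and width.
    Returns four half-size subbands: approximation, row differences,
    column differences, and diagonal differences.
    """
    approx, row_diff, col_diff, diag_diff = [], [], [], []
    for i in range(0, len(image), 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for j in range(0, len(image[0]), 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            a_row.append((a + b + c + d) / 4)  # average of the 2x2 block
            r_row.append((a + b - c - d) / 4)  # top pair minus bottom pair
            c_row.append((a - b + c - d) / 4)  # left pair minus right pair
            d_row.append((a - b - c + d) / 4)  # diagonal difference
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag_diff.append(d_row)
    return approx, row_diff, col_diff, diag_diff


pixels = [(0, 0, 0), (255, 255, 255), (128, 64, 200)]
print(channel_histograms(pixels))

image = [[10, 10, 0, 0], [10, 10, 0, 0], [5, 5, 5, 5], [5, 5, 5, 5]]
approx, row_diff, col_diff, diag_diff = haar_level1(image)
print(approx)
```

Large values in the difference subbands mark edges. Flat regions like the blocks in `image` produce zeros there, so summary statistics of the subbands make compact texture features.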

Depending on the type of data you’re working with, it may make sense to use a variety of the techniques we’ve discussed. Feature engineering is a trial-and-error process. The only way to know whether a feature is any good is to add it to a model and check whether it improves the results.
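The add-a-feature-and-check loop can be written as greedy forward selection, which is one simple way to automate it among several. In this sketch, `evaluate` stands in for training a model and measuring a validation score, ideally on held-out data or with cross-validation. `toy_evaluate`, its `TOY_GAIN` table, and the feature names are hypothetical placeholders, not real model results.

```python
def greedy_feature_search(candidates, evaluate, start=()):
    """Greedy forward selection: try each candidate once, keep it only if the score improves.

    `evaluate` takes a list of feature names and returns a validation score
    (higher is better).
    """
    selected = list(start)
    best_score = evaluate(selected)
    for feature in candidates:
        trial = selected + [feature]
        score = evaluate(trial)
        if score > best_score:
            selected, best_score = trial, score
    return selected, best_score


# Toy scorer standing in for "retrain the model and measure accuracy".
TOY_GAIN = {"flights_per_day": 2, "hour": 0, "is_holiday": 3, "noise": -1}


def toy_evaluate(features):
    return sum(TOY_GAIN[f] for f in features)


print(greedy_feature_search(["flights_per_day", "hour", "is_holiday", "noise"], toy_evaluate))
```

A feature that does not change the score, like `hour` here, is rejected. This keeps the model from accumulating features that add cost without benefit.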

To wrap up, that was a brief explanation of feature engineering. We have many more examples on our site, so check them out.
