# Examples of Data Analytics for Predictive Maintenance (2)

## Method2: Gaussian mixture model

Though Hotelling’s T-square method is applicable for many multi-dimensional data sets, this method has a fundamental assumption that the data follow a unimodal distribution. So, when the data follows multimodal distribution, other PdM techniques should be applied. Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a multiple Gaussian distribution. Thus, by estimating mean and variance values for each Gaussian distribution from observed data, x percent outliers can be detected. To calculate the maximum likelihood estimation of Gaussian mixture model, Expectation-Maximization (EM) algorithm is commonly used.

## Load the Pre-processed Data

This data is generated by running Demo0_PreProcessing.m.

```load('Preprocessed_FD001.mat');
```

## Constructiong Gaussian Mixture Model

To visualize results, I would like to construct GMM using only the first two PCA components.

First, the group of data points labeled 'long' is assumed as a normal condition.

Next, the thresholds to detect 5%, 1% and 0.1% outliers are calculated by using GMM.

Finally, I would like to create a plot of the first two components with these thresholds.

```% Calculate the thresholds to detect 5%, 1% and 0.1% outliers

% Optimum number of clusters is calculated based on Akaike's Information
% Criterion (AIC).
idx = (dataTrainZ.Label == 'long');
mdlGM = cell(1,4);
AIC = zeros(1,4);

% Calculate GMM by changing the total number of clusters to 1~4
rng('default');
options = statset('MaxIter',500);
for kk = 1:4
mdlGM{kk} = fitgmdist(score(idx,:), kk, 'Options', options);
AIC(kk) = mdlGM{kk}.AIC;
end

% Find an optimum number of clusters based on AIC
[minAIC, numCluster] = min(AIC);

% Construct the GMM
rng('default');
options = statset('MaxIter',500);
bestMdlGM = fitgmdist(score(idx,1:2), numCluster, 'Options', options);
d = 0.1;
[x1Grid, x2Grid] = meshgrid(-8:d:8, -8:d:8);
aGrid = pdf(bestMdlGM, [x1Grid(:) x2Grid(:)]);
aGrid = reshape(aGrid, size(x1Grid,1), size(x2Grid,2));

sorted = sort(aGrid(:),'descend');
accum = cumsum(sorted*d*d);

% Calculate the thresholds to detect 5%, 1% and 0.1% outliers
th = [0.001, 0.01, 0.05];
C = cell(3,1);

for kk = 1:3
idx = find(accum > (1-th(kk)), 1);
C{kk} = contourc((-8:d:8), (-8:d:8), aGrid, [sorted(idx) sorted(idx)]);
end

% Set the color for 5%, 1% and 0.1% outlier regions
col5per = [0.75 0.95 1];
col1per = [0.75 1 0.75];
col01per = [1 1 0.75];
colAnomaly = [1 0.85 0.85];

% Plot the result
figure
plotContourmatrix(C{1}, col01per);
plotContourmatrix(C{2}, col1per);
plotContourmatrix(C{3}, col5per);
hold on;
s1 = gscatter(score(:,1),...
score(:,2),...
dataTrainZ.Label(:));
idx = dataTrainZ.Time == 0;
s2 = plot(score(idx,1),...
score(idx,2),'rp','MarkerSize',10,'MarkerFaceColor','w');
legend([s1; s2],{'urgent','short','medium','long','failure occurrence'},...
'Color',    [1 1 1],...
'Location', 'northwest',...
'FontSize', 12);
ax = gca;
ax.Color = colAnomaly;
ax.Box = 'on';
xlabel('1st Principal Component','FontSize',12);
ylabel('2nd Principal Component','FontSize',12);
title({'Training Data and Calculated Outlier Detection Threshold';...
'Blue: Normal, Green: 5% outlier, Yellow: 1% outlier, Red: 0.1% outlier'},'FontSize',12)
```

## Applying to the Test Set

In this process, these thresholds (which can detect 5%, 1% and 0.1% outliers) are applied to the test data set. The result is shown by creating a plot of the first two components with test set and these thresholds. In this plot, conditions of each engine are shown by changing colors for each data point. In real situaiton, only the positions of each data point and the thresholds for outlier detection are available.

As shown in this plot, the thresholds for outlier detectio calcuated by the GMM would be another helpful tool to detect anomalies and foresee machine failure during normal operation. In this example, all the 'urgent' condition in the test set can be detected by using the threshold for 1% outlier detection.

```% Standardize the test set
dataTestZ = dataTest;
dataTestZ{:,3:end-1} = (dataTest{:,3:end-1} - mu)./sigma;

score = dataTestZ{:,3:end-1}*wcoeff;

% Plot the result
figure
plotContourmatrix(C{1}, col01per);
plotContourmatrix(C{2}, col1per);
plotContourmatrix(C{3}, col5per);
hold on;
s1 = gscatter(score(:,1),...
score(:,2),...
dataTestZ.Label);
legend(s1,{s1.DisplayName},...
'Color',    [1 1 1],...
'Location', 'northwest',...
'FontSize', 12);
ax = gca;
ax.Color = colAnomaly;
ax.Box = 'on';
xlabel('1st Principal Component','FontSize',12);
ylabel('2nd Principal Component','FontSize',12);
title({'Applying Outlier Detection Threshold for the Test Data';...
'Blue: Normal, Green: 5% outlier, Yellow: 1% outlier, Red: 0.1% outlier'},'FontSize',12)
```