Finding outliers in a dataset

Salma fathi on 2 August 2022
Hello, the image shows plots of my dataset. I am trying to clean the outliers out of the dataset so that I can later use it to train a machine learning model.
However, rmoutliers seems to treat many important data points as outliers. Is there another approach I could use to get rid of the outliers?
The top plot shows the whole dataset, and the bottom plot shows it after removing the outliers with the following lines:
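% Matrix3: column 1 = x, column 2 = y
% 'mean' removes every row that has a value more than 3 standard deviations from its column mean
nonOutliers = rmoutliers(Matrix3, 'mean');
figure(3); tiledlayout(2,1); nexttile;
scatter(Matrix3(:,1), Matrix3(:,2), 1);          % top: whole dataset
nexttile;
scatter(nonOutliers(:,1), nonOutliers(:,2), 1)   % bottom: after rmoutliers
ylim([0 10*10^12])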
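A quick way to see exactly which points the 'mean' method throws away is to request the second output of rmoutliers, a logical vector marking the removed rows, and plot those rows in a different color. This is a minimal sketch, assuming column 1 of Matrix3 is x and column 2 is y; the exponential data below is a hypothetical stand-in for the real dataset.

```matlab
rng(0);                                    % reproducible synthetic data
x = (1:1000)';
y = exp(x/100) + randn(1000, 1);
Matrix3 = [x y];                           % hypothetical stand-in: column 1 = x, column 2 = y

% Second output: logical vector, true for each row that rmoutliers removed
[nonOutliers, isRemoved] = rmoutliers(Matrix3, 'mean');

figure;
scatter(Matrix3(~isRemoved, 1), Matrix3(~isRemoved, 2), 1); hold on
scatter(Matrix3(isRemoved, 1), Matrix3(isRemoved, 2), 4, 'r');  % removed rows in red
hold off
legend('kept', 'removed by ''mean''')
```

If the red points cluster on the steep part of the curve, the global mean and standard deviation are likely a poor reference for this data.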
Monica Roberts on 2 August 2022
One thing to consider: what do you count as an outlier when you look at the graph? Right now, rmoutliers tests each column on its own against that column's overall mean, so it doesn't account for how y changes with x. You may want to split your data into chunks and pass each chunk to rmoutliers. I'd start where the data shoots up, group every ~200 values of x, pass those chunks to rmoutliers, and see what happens.
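The chunking idea can be sketched as follows. This assumes a chunk means 200 consecutive points after sorting by x (one possible reading of "every ~200 values of x"), tests only the y values of each chunk against that chunk's own mean, and, for brevity, starts chunking at the first point rather than where the data shoots up. The variable chunkSize and the synthetic data are hypothetical.

```matlab
rng(0);                                   % reproducible synthetic data
x = linspace(0, 1000, 2000)';
y = exp(x/100) + randn(2000, 1);
Matrix3 = [x y];                          % hypothetical stand-in for the real data

% Sort by x so that each chunk covers a contiguous range of x
[~, order] = sort(Matrix3(:, 1));
sorted = Matrix3(order, :);

n = size(sorted, 1);
chunkSize = 200;                          % hypothetical: points per chunk
keep = true(n, 1);
for s = 1:chunkSize:n
    rows = (s:min(s + chunkSize - 1, n))';
    % Test only this chunk's y values against the chunk's own mean and std
    [~, TF] = rmoutliers(sorted(rows, 2), 'mean');
    keep(rows(TF)) = false;
end
nonOutliers = sorted(keep, :);
```

Because each chunk gets its own mean and standard deviation, large y values at high x are compared with their neighbors instead of with the whole dataset.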
There are also other parameters you can pass to rmoutliers. For instance, 'mean' may not be the best method for detecting outliers in this dataset. Have you tried the others? The 'movmean' and 'movmedian' methods, for instance, use a moving window and might effectively do the chunking I've described.
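To compare methods and try a moving window, you can run rmoutliers on the y column alone and count what each method removes. In this sketch, 'SamplePoints' makes the 'movmedian' window length measured in x units rather than in rows; it assumes x is sorted with no repeated values. The window of 50, the injected spike, and the synthetic data are hypothetical choices for illustration.

```matlab
rng(0);                                   % reproducible synthetic data
x = linspace(0, 1000, 2000)';
y = exp(x/100) + randn(2000, 1);
y(500) = y(500) + 50;                     % hypothetical injected spike
Matrix3 = [x y];                          % hypothetical stand-in for the real data

% Global methods, applied to y only: count the points each one removes
methodList = {'mean', 'median', 'quartiles', 'grubbs', 'gesd'};
for m = 1:numel(methodList)
    [~, TF] = rmoutliers(Matrix3(:, 2), methodList{m});
    fprintf('%-10s removes %d points\n', methodList{m}, nnz(TF));
end

% Moving median: each point is compared with the median of its neighbors
% within a window 50 x-units wide
[~, TFmov] = rmoutliers(Matrix3(:, 2), 'movmedian', 50, 'SamplePoints', Matrix3(:, 1));
nonOutliers = Matrix3(~TFmov, :);
```

A local method like this flags the spike at row 500 relative to its neighbors, while the steep but smooth rise in y is no longer measured against a single global center.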


Answers (1)

Cris LaPierre on 2 August 2022
If you process your data in a live script, consider interactively exploring different ways to detect and remove outliers using the Clean Outlier Data live task. See here:

Version

R2022a
