Outlier removal from a matrix
I removed the outliers from my dataset with the rmoutliers(A,'mean') command. It should remove data more than 3 standard deviations from the mean of each column. But when I plot the histogram of each column, some data are still as far as 6 standard deviations away. What do you suggest? Here is my code:
A = rmoutliers(table_data,'mean');
Zscores = zscore(A); %(A is a 50000*12 matrix)
figure
histogram(Zscores(:,2))
In the histogram, there are still some data as far as 6 standard deviations away.
1 comment
John D'Errico
on 11 Oct 2022
help rmoutliers
I had to go to the doc to check your claim that rmoutliers with the 'mean' option uses 3 standard deviations from the mean as the cutoff, and that it then removes the entire row containing that outlier. This is true. But rmoutliers is not a perfect tool, and any such tool can have problems if you dare to push its limits.
x = [ones(1,5),1 + eps,10]
xhat = rmoutliers(x)
xhat == 1
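Called without a method argument, rmoutliers uses its default 'median' rule (more than 3 scaled median absolute deviations from the median), not the 'mean' rule. So rmoutliers removed the 10, but since the median is 1 and the median absolute deviation of this vector is exactly zero, 1+eps is ALSO more than 3 scaled MAD out, and gets treated as a clear outlier. The point is, if you try hard enough, you can always cause any such adaptive tool to exhibit strange behavior.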
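To see which thresholds were actually applied to that vector, the companion function isoutlier can return them. The snippet is only a sketch: the variable names are made up for illustration, and it assumes the documented defaults (the 'median' method when no method is given, and the [TF,L,U,C] output form). It compares the default rule with the 'mean' rule on the same x.

```matlab
% Same vector as above: five exact ones, one value just above 1, and 10.
x = [ones(1,5), 1 + eps, 10];

% Default method ('median'): thresholds are median +/- 3 scaled MAD.
[tfMedian, lowMedian, highMedian, centerMedian] = isoutlier(x);

% 'mean' method: thresholds are mean +/- 3 standard deviations.
[tfMean, lowMean, highMean, centerMean] = isoutlier(x, 'mean');

% When the MAD is zero, both median-based thresholds sit on the median.
disp([lowMedian, centerMedian, highMedian])
disp(tfMedian)

% The single large value inflates the standard deviation, widening the band.
disp([lowMean, centerMean, highMean])
disp(tfMean)
```

With the 'mean' rule, the 10 itself lies inside the 3 sigma band for this vector, because it inflates the standard deviation it is measured against. That masking effect is one reason the two methods can disagree so strongly on the same data.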
But if you want to know what happened, then you need to provide your data. Otherwise, anything is just a wild guess.
Attach it to a comment (not as an answer), in a .mat file.
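One effect can be checked without the original data. The 3 sigma cutoff uses the mean and standard deviation of the data before removal, while zscore applied afterwards uses the mean and standard deviation of the trimmed data. Removing the tails shrinks the standard deviation, so points that passed the original cutoff can land beyond 3 on the recomputed scale. The sketch below uses synthetic heavy-tailed data as a hypothetical stand-in (not the poster's table_data) and a hand-written version of the rule rather than rmoutliers itself, so treat it as an illustration, not a diagnosis.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
A = randn(50000, 12) .^ 3;       % hypothetical heavy-tailed stand-in data

% Hand-written version of the rule: flag any element more than
% 3 standard deviations from its column mean, then drop whole rows.
zBefore = (A - mean(A)) ./ std(A);
keep    = all(abs(zBefore) <= 3, 2);
B       = A(keep, :);

% Z-scores recomputed on the trimmed data use the new mean and std.
zAfter = (B - mean(B)) ./ std(B);

zKept = zBefore(keep, :);
fprintf('rows kept: %d of %d\n', size(B, 1), size(A, 1));
fprintf('max |z| of kept rows on the original scale: %.2f\n', max(abs(zKept(:))));
fprintf('max |z| recomputed after trimming:          %.2f\n', max(abs(zAfter(:))));
```

If the recomputed maximum is well above 3 for data like this, the histogram in the question may be showing the same rescaling effect. How large the effect is depends on how heavy the tails of the real data are.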