How to calculate the number of outliers from a data set according to the required maximum deviation of the remaining values

Dear friends,
I'm looking for a simple program to do the following.
Given a dataset of N elements {x1, x2, ..., xN} whose range (max - min) is larger than D, I'm allowed to remove K elements so that the remaining N-K elements satisfy max - min < D. How can I calculate the minimum possible value of K?
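One way to see why a faster method should exist (this reasoning is not from the original thread): sort the data as x_(1) ≤ x_(2) ≤ ... ≤ x_(N). Any kept set with max - min < D lies inside a value interval [x_(i), x_(i) + D) for its smallest element x_(i), and every element in that interval can be kept. So the best kept set is a contiguous block of the sorted data, and the minimum K is N minus the size of the largest such block:

$$K_{\min} = N - \max_{1 \le i \le N} \left|\{\, j : x_{(i)} \le x_{(j)} < x_{(i)} + D \,\}\right|$$

This assumes the strict inequality max - min < D as stated in the question.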
Currently my program sorts the data set and then uses a "while" loop to remove elements one by one until it finds K. But this method is far too slow when the dataset is large, for example when N is several million.
Does anyone have a better solution? This is more of a mathematical problem.
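A possible faster approach, sketched here in Python rather than MATLAB for illustration (the function name `min_removals` is hypothetical, not from the thread): sort once, then slide a window over the sorted values with two pointers, keeping the largest window whose spread is below D. Both pointers only move forward, so after the O(N log N) sort the scan is O(N), which should scale to several million elements. It assumes the strict condition max - min < D from the question.

```python
def min_removals(values, d):
    """Smallest K such that removing K values leaves max - min < d.

    Sketch: sort, then a two-pointer sliding window over the sorted data.
    Assumes the strict inequality max - min < d from the question.
    """
    xs = sorted(values)
    n = len(xs)
    best = 0  # size of the largest valid window found so far
    left = 0
    for right in range(n):
        # Advance the left edge until the window's spread is below d.
        # The left <= right guard lets the window become empty when d <= 0.
        while left <= right and xs[right] - xs[left] >= d:
            left += 1
        best = max(best, right - left + 1)
    # Everything outside the best window must be removed.
    return n - best


print(min_removals([1, 2, 3, 10, 11], 3))  # keeps [1, 2, 3], prints 2
```

Because the data is sorted, the min and max of any window are simply its two endpoints, so nothing needs to be recomputed after each step. The same loop can be written directly in MATLAB after calling `sort`.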
Thanks.
6 comments
Cris LaPierre on 12 May 2021
You could explore options interactively using the Remove Outliers task in a live script. See here for more info. Once you find the appropriate settings, you can convert the task to code and reuse that in your script (or just keep the task).
Jeff Miller on 13 May 2021
You don't need to sort; just keep track of the current min and max after each exclusion. This might speed things up a bit, though I think it will depend on whether K is small relative to N.
Can there be ties between different x elements?
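On the question of ties, one hedged observation (not from the thread): if the answer is computed by counting values in a value interval [v, v + D) for each candidate minimum v, ties need no special handling. The O(N^2) reference below (the name `min_removals_bruteforce` is hypothetical) needs no sorting and is only meant for checking a faster implementation on small inputs, including inputs with repeated values.

```python
def min_removals_bruteforce(values, d):
    """O(N^2) reference: smallest K so the remaining values satisfy max - min < d.

    For each value v taken as the smallest kept element, count how many
    values fall in [v, v + d); repeated values are simply counted each time.
    """
    n = len(values)
    best = 0
    for v in values:
        kept = sum(1 for w in values if v <= w < v + d)
        best = max(best, kept)
    return n - best


print(min_removals_bruteforce([5, 5, 5, 9], 1))  # keeps the three 5s, prints 1
```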


Answers (0)

Category: Matrices and Arrays
