How to put zero or NaN instead of rejecting my data in a Chauvenet script
panik772 illza
on 22 Dec 2014
Commented: Star Strider
on 22 Dec 2014
Hello, my task is to detect outliers in a large dataset using Chauvenet's criterion. The Chauvenet test says: a reading may be rejected if the probability of obtaining its particular deviation is less than 1/(2n), where n is the number of readings. In other words, it computes the probability of each reading's deviation from the mean and rejects the reading from the list if that deviation is too large. My question is how to replace the bad data with 0 or NaN instead of rejecting it.
I have the following script:
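As a rough illustration of the criterion quoted above, the cutoff can be computed directly from the sample size. This is only a sketch under stated assumptions: the sample values and the names `n`, `p_reject`, `z_cut` and `is_outlier` are made up for illustration, and `erfcinv` from base MATLAB stands in for `norminv` so that no toolbox is assumed. It gives the same cutoff as `norminv(1 - 1/(4*n), 0, 1)`.

```matlab
% Hypothetical sample; the last value is deliberately far from the rest.
x = [9.8 10.1 10.0 9.9 10.2 10.0 9.7 10.3 10.1 14.0];
n = numel(x);

% Chauvenet: reject a reading if the two-sided probability of its
% deviation is below 1/(2n).
p_reject = 1 / (2*n);

% For a standard normal Z, P(|Z| > z) = erfc(z/sqrt(2)), so the cutoff is:
z_cut = sqrt(2) * erfcinv(p_reject);   % about 1.96 for n = 10

% Flag readings whose absolute z-score exceeds the cutoff.
z = (x - mean(x)) / std(x);
is_outlier = abs(z) > z_cut;
```

With these made-up numbers only the last reading is flagged.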
function [ data_bio2, data_percent_rejected, data_cv ] = chauvenet( x )
% remove zero entries
data_zeros=find(x==0.0);
data_nonzeros=find(x>0.0);
data_bio2 = x(data_nonzeros);
% compute length, mean, std, min max of non-zero data
data_length2=length(data_bio2); %
data_mean2 =mean(data_bio2); %
data_standard2 = std(data_bio2); %
data_max2 = max(data_bio2); %
data_min2 = min(data_bio2); %
% Part three - Identify outliers using Chauvenets criterion
% Z-score data and compute two-sided Z-score for Chauvenets criteria
data_probability = 1/(2*length(data_nonzeros)); %
data_zscore = (data_bio2 - data_mean2)/(data_standard2);
data_ptest = 1 - data_probability/2;
zc=norminv(data_ptest, 0, 1);
% Hence, reject data with biomass > std*zc
data_limit = zc * data_standard2;
data_cv = data_bio2( data_zscore >= -zc & data_zscore <= zc );
data_cvlength = length(data_cv);
index_rejected = find(data_zscore > zc | data_zscore < -zc);
%!!! index_rejected: these are the indices of the rejected values in your data vector
data_rejected = data_bio2(data_zscore > zc | data_zscore < -zc)
index_rejected_original = data_nonzeros(index_rejected); %!!!FLAG THOSE LINES!!!
biomass_rejected_original = x(index_rejected_original); % index into the input x (data_bio is undefined here)
%!!!index/biomass_rejected_original: these are the lines/biomasses
%of your original data file that need to be flagged
% percent of data rejected by Chauvenet's criterion
data_percent_rejected = (1- data_cvlength/length(data_bio2))* 100
% compute histogram using linear bin-size
[M,Y]=hist(data_bio2,1000);
[M_cv]=hist(data_cv,Y);
end
So, how can I change the script to put zero or NaN in place of my bad data instead of removing it from the list? Thank you in advance!
Accepted Answer
Star Strider
on 22 Dec 2014
If I understand your code correctly, this will replace your ‘data_rejected’ selections with NaN:
data_bio2(index_rejected) = NaN;
I would replace them with NaN instead of zero because zero could enter into your calculations and be considered a valid number. NaN will not be considered a valid number.
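To keep the flagged values in their original positions in `x` (rather than in the filtered `data_bio2`), the same idea can be wrapped in a small function. The following is a sketch, not the original poster's script: the name `chauvenet_nan` and its internal variables are hypothetical, `erfcinv` is used in place of `norminv` to avoid assuming a toolbox, and non-positive entries are left untouched to mirror the `x > 0` filter in the question.

```matlab
function x_clean = chauvenet_nan(x)
% CHAUVENET_NAN  Hypothetical variant: returns a vector the same size as x
% with Chauvenet outliers set to NaN instead of being removed.
% Non-positive entries are left as they are and excluded from the
% statistics, mirroring the x > 0 filter in the question's script.

valid = x > 0;                      % entries that take part in the test
vals  = x(valid);
n     = numel(vals);

% Two-sided cutoff: P(|Z| > z_cut) = 1/(2n). erfcinv is base MATLAB and is
% used here in place of norminv (an assumption, to avoid a toolbox).
z_cut = sqrt(2) * erfcinv(1 / (2*n));

z = (x - mean(vals)) / std(vals);   % z-scores in the original positions

x_clean = x;
x_clean(valid & abs(z) > z_cut) = NaN;
end
```

Called as `x_clean = chauvenet_nan(x)`, the output keeps the length and ordering of `x`. Assuming `x` contained no NaN to begin with, `find(isnan(x_clean))` gives the positions that were flagged, and `mean(x_clean(~isnan(x_clean)))` averages the remaining entries without the flagged ones.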