Remove outliers until there are none left

Dear community,
I apologize that I can't offer a better first attempt. I have a double array, and I want to write a loop that removes outliers from every column. The idea is: the code tests for outliers, removes them, and does it again as long as there are outliers. When no more outliers are found, it should stop and give me back a double array without these outliers.
Here is what I tried:
directory_name=uigetdir('','Select folder with measurements');
[nur_file_name,pfad]=uigetfile({'*.csv','csv-files (*.csv)';'*.*','all Files'},...
    'Open the sample csv files (probe_001.csv)',[directory_name '/'], 'MultiSelect', 'on');
nur_file_name=cellstr(nur_file_name);
nur_file_name=sort(nur_file_name);
filename=strcat(pfad,nur_file_name);
anzahl_files=size(filename,2);
for xy=1:anzahl_files
    fid_in=fopen(char(filename(xy)),'r');
    filename_s = matlab.lang.makeValidName(nur_file_name);
    filename_s=string(filename_s);
    filename_s = erase(filename_s,"_csv");
    filename_s = erase(filename_s,"LiqQuant_");
    filename_c=cellstr(filename_s);
    for c=1:anzahl_files
        filename_f{c}=extractBefore(filename_c{c},11);
    end
    filename_s=string(filename_f);
    %----------------Import elements and intensity--------------------
    clear element_RL
    clear intens_RL
    tmpImport = importdata(filename{xy},',');
    element_RL = tmpImport.colheaders;
    element_RL(:,[1 6 8 10 12 14 16 17 19 21 23 26 27 29 30 32 33 36 38 43 45 48 57 59 61 64 67 69 94 97 99 102 106 223 298 303 304 305])=[];
    element_RL=string(element_RL);
    [anzahl_zeile,anzahl_elemente]=size(element_RL);
    intens_RL=tmpImport.data;
    intens_RL(:,[1 6 8 10 12 14 16 17 19 21 23 26 27 29 30 32 33 36 38 43 45 48 57 59 61 64 67 69 94 97 99 102 106 223 298 303 304 305])=[];
    [anzahl_runs,anzahl_elemente]=size(intens_RL);
    %---------------remove outliers----------------
    while intens_RL=ismember(NaN) % Wrong: this does not even run ('=' is not allowed in a condition, and ismember needs two inputs)
        threshold = mean(intens_RL)+3*std(intens_RL);
        intens_RL(bsxfun(@(x, y) x > y, intens_RL, threshold)) = NaN; % remove outliers by setting them to NaN
    end
I'm sorry that my loop is so bad, but I have never written a while loop before.
I am grateful for any help, however small.
Thank you!

Accepted Answer

Mathieu NOE on 3 May 2023
Hello,
I updated the end of your code.
The plot is just for me, to see the differences before and after thresholding (i.e. whether the hot spots are really removed).
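%---------------remove outliers----------------
figure(1)
clims = [-5 7]; % color limits for log10(intensity); named clims so the clim function is not shadowed
subplot(211),imagesc(log10(abs(intens_RL)),clims);colormap('jet');colorbar
title('before thresholding');
% per-column threshold: mean + 3*std along dimension 1, ignoring NaN
% (std(A,w,dim,...): w = 0 is the usual N-1 normalization, dim = 1 means per column)
threshold = mean(intens_RL,1,'omitnan')+3*std(intens_RL,0,1,'omitnan');
ind = intens_RL>(ones(anzahl_runs,1)*threshold); % replicate the threshold row to every run
% ind = intens_RL>threshold; % works too (implicit expansion, R2016b or newer)
intens_RL(ind) = NaN; % mark outliers as missing
subplot(212),imagesc(log10(abs(intens_RL)),clims);colormap('jet');colorbar
title('after thresholding');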
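One detail in the `std` call is easy to misread: the signature is `std(A,w,dim)`, so the second argument is the normalization weight, not the dimension. `std(A,1,'omitnan')` divides by N along the default dimension (per column for a matrix with more than one row), while `std(A,0,1,'omitnan')` uses the usual N-1 normalization and is explicitly per column. The two only diverge in direction when a file contains a single run, because the default dimension then switches to 2. A small check on made-up matrices (the variable names are illustrative only):

```matlab
% Made-up data: a 4 x 2 matrix and a single-row matrix (e.g. a file with one run).
A   = [1 10; 2 20; 3 30; 4 40];
row = [1 2 3 4];

s_pop  = std(A, 1)       % weight 1, divide by N, per column:   1.1180  11.1803
s_samp = std(A, 0, 1)    % weight 0, divide by N-1, dim 1:      1.2910  12.9099

% With a single row the default dimension becomes 2, so the "1" no longer
% gives one value per column:
s_row_w1   = std(row, 1)     % one scalar: spread across the row (1.1180)
s_row_dim1 = std(row, 0, 1)  % one value per column: 0 0 0 0
```

With `std(A,0,1,'omitnan')` the threshold row always has one entry per column, matching `mean(A,1,'omitnan')`.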

4 Comments

Tatjana Mü on 3 May 2023
Thank you so much. But this removes outliers from each column only once. What I want in the end is a loop that recomputes the thresholds on every pass until the curve is flattened and no more outliers are found.
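Why a single pass is not enough can be seen on one made-up column: a very large outlier inflates the standard deviation so much that a moderate outlier stays below mean + 3*std. Only after the large value is removed does a second pass catch the moderate one. The numbers are invented for illustration; the rule is the one used in this thread (one-sided, mean + 3*std per column, NaN ignored).

```matlab
% Hypothetical single-column example: a stable baseline (9, 10, 11 repeated)
% plus one huge outlier (1000) and one moderate outlier (20). Values are made up.
x = [repmat([9; 10; 11], 6, 1); 20; 1000];

% One pass: mean + 3*std per column, NaN ignored.
x1 = x;
threshold = mean(x1, 1, 'omitnan') + 3*std(x1, 0, 1, 'omitnan');
x1(x1 > threshold) = NaN;
single_pass_left = max(x1)      % 20: the moderate outlier survives (max skips NaN)

% Repeat until a pass flags nothing.
x2 = x;
nPasses  = 0;
nFlagged = 1;                   % start above 0 so the loop runs at least once
while nFlagged > 0
    threshold = mean(x2, 1, 'omitnan') + 3*std(x2, 0, 1, 'omitnan');
    ind = x2 > threshold;       % NaN > threshold is false, so removed values stay removed
    nFlagged = nnz(ind);
    x2(ind) = NaN;
    nPasses = nPasses + 1;
end
iterative_left = max(x2)        % 11: both outliers removed
nPasses                         % 3: two passes that remove a value, plus one that finds nothing
```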
Mathieu NOE
Oops, I forgot that point. Here you are: the full code, updated with the while loop.
directory_name=uigetdir('','Select folder with measurements');
[nur_file_name,pfad]=uigetfile({'*.csv','csv-files (*.csv)';'*.*','all Files'},...
    'Open the sample csv files (probe_001.csv)',[directory_name '/'], 'MultiSelect', 'on');
nur_file_name=cellstr(nur_file_name);
nur_file_name=sort(nur_file_name);
filename=strcat(pfad,nur_file_name);
anzahl_files=size(filename,2);

% short sample names (computed once, they do not depend on xy)
filename_s = matlab.lang.makeValidName(nur_file_name);
filename_s=string(filename_s);
filename_s = erase(filename_s,"_csv");
filename_s = erase(filename_s,"LiqQuant_");
filename_c=cellstr(filename_s);
filename_f=cell(1,anzahl_files);
for c=1:anzahl_files
    filename_f{c}=extractBefore(filename_c{c},11);
end
filename_s=string(filename_f);

% columns that are not used (same list for headers and data)
cols_removed = [1 6 8 10 12 14 16 17 19 21 23 26 27 29 30 32 33 36 38 43 45 48 57 59 61 64 67 69 94 97 99 102 106 223 298 303 304 305];

for xy=1:anzahl_files
    %----------------Import elements and intensity--------------------
    % importdata opens and closes the file itself, so no fopen is needed
    tmpImport = importdata(filename{xy},',');
    element_RL = tmpImport.colheaders;
    element_RL(:,cols_removed)=[];
    element_RL=string(element_RL);
    [anzahl_zeile,anzahl_elemente]=size(element_RL);
    intens_RL=tmpImport.data;
    intens_RL(:,cols_removed)=[];
    [anzahl_runs,anzahl_elemente]=size(intens_RL);
    %---------------remove outliers----------------
    figure(1)
    clims = [-5 7]; % color limits for log10(intensity)
    subplot(211),imagesc(log10(abs(intens_RL)),clims);colormap('jet');colorbar
    title('before thresholding');
    n_outliers = 1; % init above 0 so the loop runs at least once
    while n_outliers>0
        % recompute the per-column threshold from the values still present
        threshold = mean(intens_RL,1,'omitnan')+3*std(intens_RL,0,1,'omitnan');
        ind = intens_RL>(ones(anzahl_runs,1)*threshold);
        % ind = intens_RL>threshold; % works too (implicit expansion, R2016b or newer)
        n_outliers = nnz(ind) % no semicolon: shows how many outliers are removed in each pass
        intens_RL(ind) = NaN;
    end
    subplot(212),imagesc(log10(abs(intens_RL)),clims);colormap('jet');colorbar
    title('after thresholding');
end
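The while loop stops on its own: a pass only continues when it has flagged at least one value that was not yet NaN (a comparison with NaN is always false), and there are finitely many such values. A compact, self-contained variant that can be tried on invented data is sketched below. Two things in it are assumptions added for illustration and are not part of the thread: a `maxIter` cap as a safety net, and a `twoSided` switch that also removes values far below the mean (the thread's rule only removes values above mean + 3*std). The names `k`, `maxIter` and `twoSided` are hypothetical.

```matlab
% Made-up data: 18 baseline runs x 2 elements, plus three extra runs.
X = [repmat([9; 10; 11], 6, 1), repmat([95; 100; 105], 6, 1)];
X(end+1, :) = [20, 100];      % moderate outlier in column 1
X(end+1, :) = [1000, 100];    % huge outlier in column 1
X(end+1, :) = [10, -2000];    % low outlier in column 2 (only caught if twoSided)

k        = 3;      % threshold in standard deviations, as in the thread
maxIter  = 100;    % hypothetical safety cap so the loop can never run forever
twoSided = true;   % false reproduces the thread's "x > mean + k*std" rule

nRemoved = 0;
for iter = 1:maxIter
    mu    = mean(X, 1, 'omitnan');        % 1 x nCols, NaN ignored
    sigma = std(X, 0, 1, 'omitnan');      % sample std per column, NaN ignored
    if twoSided
        ind = abs(X - mu) > k*sigma;      % implicit expansion (R2016b or newer)
    else
        ind = X > mu + k*sigma;
    end
    if ~any(ind(:))
        break                             % nothing flagged: stop
    end
    nRemoved = nRemoved + nnz(ind);
    X(ind) = NaN;                         % mark outliers as missing
end
if iter == maxIter && any(ind(:))
    warning('Outlier removal did not converge within %d passes.', maxIter);
end
nRemoved                                  % number of values set to NaN
```

On this data the first pass removes 1000 and -2000, the second pass removes 20, and the third pass finds nothing. With `twoSided = false` the -2000 entry is kept, which matches the behavior of the code in this thread.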
Tatjana Mü on 3 May 2023
Thank you so much, it's working perfectly.
Mathieu NOE on 3 May 2023
My pleasure!
