MATLAB Answers

Deleting part of a column between two dates

daten1=floor(gas_calcorr(1,1));
% daten1=datenum(2018,08,20);
% daten2=floor(gas_calcorr(end,1));
daten2=datenum(2018,08,31);
RemoveData=(gas_calcorr(daten1:daten2,7));

  3 Comments

Benjamin Großmann on 9 Mar 2020
Please be more specific about your data, problem, and question. I also think you do not want to use daten1 and daten2 as indices into gas_calcorr: you create these dates with datenum, so index 1 would correspond to the first of January in the year 0.
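To illustrate the point about indices, here is a minimal sketch. The small matrix `demo` is made up for illustration; it only mimics the assumed layout of gas_calcorr (datenum values in column 1, CO in column 7).

```matlab
% Hypothetical stand-in for gas_calcorr: one row per minute for one hour
t = datenum(2018,8,20,0,0,0) + (0:59)'/1440;
demo = [t rand(60,6)];

daten1 = floor(demo(1,1));   % a serial date number (roughly 737000), not a row number
daten2 = datenum(2018,8,31);
fprintf('daten1 = %d, rows in demo = %d\n', daten1, size(demo,1));

% demo(daten1:daten2,7) would ask for rows far beyond the end of the matrix
% and fail. Comparing against the time column selects the rows instead:
rows = demo(:,1) >= daten1 & demo(:,1) <= daten2;   % logical row mask
RemoveData = demo(rows,7);
```

The logical mask has one entry per row, so it can be used directly as a row index regardless of how large the date numbers are.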
Micky Josipovic on 9 Mar 2020
Yes, you are right, that was the wrong way. To be more specific: I just want to delete very noisy data over a period when an instrument was malfunctioning, and unfortunately it is a long vector; it goes from 01/02/2018 until 31/08/2018. My matrix is the following:
% Columns:
% 1: time
% 2: pressure drop in inlet (provides information on possible jams in inlet)
% 3: O3
% 4: SO2
% 5: NO
% 6: NOx
% 7: CO
All others are fine except CO, and that must go out (become NaN) within this date range. My data is in 1-minute resolution and there is too much of it to manipulate in the Variable Editor. Could you write me an example script? Thx. MJ
Benjamin Großmann on 9 Mar 2020
Okay, I think I got the problem and will prepare an example script. Is column 2 a criterion for the malfunction, so that if column 2 is true then CO should be NaN?


Accepted Answer

Benjamin Großmann on 9 Mar 2020
clearvars
close all
clc
% Let's create the date column (I only use 1 hour with an increment of 1 minute), but this should work for any length
dt_datetime = [datetime(2018,02,01,14,00,00):minutes(1):datetime(2018, 02, 01, 14, 59, 59)];
dt = datenum(dt_datetime); % this should look like your first column transposed
% Generate the rest of your data as random values and attach it to the time
% vector
data_orig = [dt' rand(size(dt,2), 6)];
% now, the variable "data_orig" should have the dimensions of your gas_calcorr variable
% We now can try to manipulate the data
%% Example 1) Give specific start and end date and set the CO values (seventh column) within these dates to NaN
data1 = data_orig; % do not override the original data since we need it for another example
start_date = datenum(2018, 02, 01, 14, 20, 00);
end_date = datenum(2018, 02, 01, 14, 25, 00);
% generate a mask where the date fulfills the criterion
mask1 = (data1(:, 1) >= start_date) & (data1(:, 1) <= end_date); % creates a logical vector with 1s and 0s
% use logical indexing as row index to apply the mask:
% Set the values in the 7th column and each row where the mask is 1 to NaN
data1(mask1, 7) = NaN;
%% Example 2) Search for a criterion in column 2 and apply the mask
% do not override the original data since we may need it for another example
data2 = data_orig;
mask2 = data2(:,2) >= 0.5;
% use logical indexing as the row index to apply the corresponding mask
data2(mask2, 7) = NaN;
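Applied to the period mentioned in the question (01/02/2018 until 31/08/2018), Example 1 might look like the following sketch. It assumes gas_calcorr holds datenum values in column 1 and CO in column 7; a small made-up matrix with one row per day stands in for the real 1-minute data so that the snippet runs on its own.

```matlab
% Made-up stand-in for gas_calcorr (daily rows instead of 1-minute data)
t = (datenum(2018,1,1):datenum(2018,12,31))';
gas_calcorr = [t rand(numel(t),6)];

start_date = datenum(2018,2,1);   % 01/02/2018 00:00
end_date   = datenum(2018,9,1);   % 01/09/2018 00:00, so all of 31/08 is covered

% rows inside the malfunction period (the end date itself is excluded)
mask = gas_calcorr(:,1) >= start_date & gas_calcorr(:,1) < end_date;
gas_calcorr(mask,7) = NaN;        % only CO is blanked, the other gases stay

fprintf('%d rows set to NaN in column 7\n', nnz(mask));
```

Using the first instant of 1 September as an exclusive upper bound is one way to make sure every minute of 31 August falls inside the mask.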

  9 Comments

Micky Josipovic on 10 Mar 2020
Thanks Benni,
It worked! Thanks a lot. However, when I try to load and plot column 7 (CO gas), my script (below) plots everything (from the desired date) but not the CO column at all. Could you please look at where I went wrong?
% Program to visually remove bad data from gas_calcorr measurements
% VV 15.4.2010 :
% - all paths to files are collected in paths.m. Modify it!
% - Using Trailer_ini.xls to read calibrations and filter out maintenance periods
% Columns:
% 1: time
% 2: pressure drop in inlet (provides information on possible jams in inlet)
% 3: O3
% 4: SO2
% 5: NO
% 6: NOx
% 7: CO
load('gas_calcorr_O3_SO2_NO_NOx_out.mat');
% load('gas_calcorrected.mat'); % gas_calcorr
% gas_calcorr=gases_uncor;
% load O3;
load('../raw_data/soil_raw.mat'); % soil data for power indicator
col=7; % check and remove from this column
collabel='CO';
%% remember to save and load corrected matrix "gas_calcorr" before continuing with
%% the next run
% daten1=floor(gas_calcorr(1,1));
daten1=datenum(2018,9,1);
daten2=floor(gas_calcorr(end,1));
% daten2=datenum(2017,8,22);
% for daten=daten1:daten2
for daten= daten1:daten2
    % first and last row of the current day
    grr1=find(gas_calcorr(:,1)>=daten,1,'first');
    grr2=find(gas_calcorr(:,1)<=daten+1,1,'last');
    figure(1)
    subplot(2,1,1)
    plot(gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,3),'.b')
    ylabel('O3')
    xlim([daten daten+1])
    datetick('x','HH:MM','keeplimits')
    grid on;
    subplot(2,1,2)
    plot(gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,4),'.b')
    ylabel('SO2')
    xlim([daten daten+1])
    datetick('x','HH:MM','keeplimits')
    grid on;
    figure(2)
    subplot(2,1,1)
    plot(gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,5),'.b',gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,6),'.g')
    ylabel('NO, NOx')
    xlim([daten daten+1])
    datetick('x','HH:MM','keeplimits')
    grid on;
    subplot(2,1,2)
    plot(gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,7),'.b')
    ylabel('CO')
    xlim([daten daten+1])
    datetick('x','HH:MM','keeplimits')
    grid on;
    addpath ../raw_data/; % use one function from this folder
    plot_power_rh(soil(soil(:,1)>=daten & soil(:,1)<=daten+1 ,[1 17]),[],4);% plot power indicator, not RH, in fig 4
    rmpath ../raw_data/;
    figure(3) % this is the figure where you remove bad points
    plot(gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,col),'.r')
    ylabel(collabel)
    xlim([daten daten+1])
    datetick('x','HH:MM','keeplimits')
    grid on;
    title(['Remove bad ' datestr(daten,'dd.mm.yyyy')])
    rem=input('Remove? ','s');
    while ~isempty(rem)
        disp('Removing all data between two x-values.')
        [x, y]=ginput(2);
        % clip the selection to the current day
        if x(1)<daten
            x(1)=daten;
        end
        if x(2)>daten+1
            x(2)=daten+1;
        end
        disp([datestr(x(1),'yyyymmdd HH:MM') ' ' datestr(x(2),'yyyymmdd HH:MM')])
        if x(2)>x(1)
            % mark the selected points with black circles
            hold on
            plot(gas_calcorr(gas_calcorr(:,1)>=x(1) & gas_calcorr(:,1)<=x(2),1 ),gas_calcorr(gas_calcorr(:,1)>=x(1) & gas_calcorr(:,1)<=x(2),col ),'ok');
            hold off
        else
            disp('x(2) must be greater than x(1)! Period rejected, no data will be removed.');
        end
        go_on=input('Ok? ','s');
        % any non-empty answer means "not ok": redraw and select again
        while ~isempty(go_on)
            figure(3)
            plot(gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,col),'.r')
            ylabel(collabel)
            xlim([daten daten+1])
            datetick('x','HH:MM','keeplimits')
            grid on;
            title(['Remove bad ' datestr(daten,'dd.mm.yyyy')])
            [x, y]=ginput(2);
            disp([datestr(x(1),'yyyymmdd HH:MM') ' ' datestr(x(2),'yyyymmdd HH:MM')])
            if x(2)>x(1)
                hold on
                plot(gas_calcorr(gas_calcorr(:,1)>=x(1) & gas_calcorr(:,1)<=x(2),1 ),gas_calcorr(gas_calcorr(:,1)>=x(1) & gas_calcorr(:,1)<=x(2),col ),'ok');
                hold off
            else
                disp('x(2) must be greater than x(1)! Period rejected, no data will be removed.');
            end
            go_on=input('Ok? ','s');
        end
        if x(2)>x(1)
            % set the selected period in the chosen column to NaN
            gas_calcorr(gas_calcorr(:,1)>=x(1) & gas_calcorr(:,1)<=x(2),col )=nan;
        else
            disp('x(2) must be greater than x(1)! Period rejected, no data will be removed.');
        end
        figure(3)
        plot(gas_calcorr(grr1:grr2,1),gas_calcorr(grr1:grr2,col),'.r')
        ylabel(collabel)
        xlim([daten daten+1])
        datetick('x','HH:MM','keeplimits')
        grid on;
        title(['Remove bad ' datestr(daten,'dd.mm.yyyy')])
        rem=input('Remove? ','s');
    end
end
Benjamin Großmann on 10 Mar 2020
Hey Micky,
Your code seems fine. It could be improved at one point or another, but it gets the job done. I think you only looked at days where the CO value is NaN. Remember, as I said in the earlier comment, the data that you uploaded to Google Drive already contains close to 200,000 NaNs. If you don't see any CO data in the plot, then the whole day contains only NaNs.
Please set your daten1 variable to something like
daten1=datenum(2018, 11, 15);
to get some data for the CO plot.
If you set it to
daten1=datenum(2018, 12, 04);
you can see a gap in the data in all subplots.
Please let me know if you need further help. Do you know where these NaNs in your original data come from? We can also try to investigate the NaNs in your original data, maybe graphically.
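One possible way to find days that still contain CO data, without stepping through the plots day by day, is to compute the share of NaNs per day. This is only a sketch: it assumes datenum values in column 1 and CO in column 7, and the three-day matrix below is made up so that the snippet runs on its own.

```matlab
% Made-up stand-in for gas_calcorr: 3 days of 1-minute data
t = datenum(2018,11,14) + (0:3*1440-1)'/1440;
gas_calcorr = [t rand(numel(t),6)];

day_of_row = floor(gas_calcorr(:,1));                    % calendar day of each row
gas_calcorr(day_of_row == datenum(2018,11,15),7) = NaN;  % pretend CO is missing on day 2

% share of NaN values in the CO column, one entry per day
[day_list, ~, idx] = unique(day_of_row);
nan_share = accumarray(idx, double(isnan(gas_calcorr(:,7))), [], @mean);

for k = 1:numel(day_list)
    fprintf('%s: %5.1f %% NaN in CO\n', datestr(day_list(k),'dd.mm.yyyy'), 100*nan_share(k));
end

days_with_data = day_list(nan_share < 1);   % days with at least one valid CO value
```

Any entry of days_with_data could then be used as daten1 in the plotting script.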
Micky Josipovic on 11 Mar 2020
Hi Benni,
Yes indeed - there must have been more days with NaNs after 01/09/18 (I checked a few but not all). Thanks for your assistance, highly appreciated.
The script works now as you indicated! And to answer your question about the NaNs in the raw (and semi-cleaned) data matrix:
Those are generally power-cut periods and interferences with our measurements due to maintenance, checks, calibrations, etc. Indeed there are many, but the main culprit is our electricity grid, which gives the whole of South Africa many hours of cuts and dips. So in order for us to clean the data, all those interferences must be flagged and cut out at the beginning of further work, in which we look at other unrealistic and improbable outliers and cut them out at our discretion. Despite this we have high retention of data, and the case with our CO analyser was an odd one: it malfunctioned from February till August (we could not get another one to replace it).
Thank you for offering your further assistance. I am fine for now but will count on the "MATLAB Answers" community in the future, of course.
Kind regards,
MJ
