Replace bad data in a 24x45x65 matrix: zeros and values greater than 10, etc.

awda on 25 Apr 2014
Commented: Star Strider on 26 Apr 2014
Hello,
I have a 24x45x65 matrix of type double. It holds customer power consumption as 24 hours x 45 days x 63 customers (the third dimension is 65 because there is more data for the last customer than for the rest). I will include the .mat file :)
Like any other data set, mine has some bad data:
- zero values (which are bad, because a customer never stops using power, even when on vacation)
- values greater than 10 kWh
So the question is:
Is it possible to replace the bad data with healthier data from the same time on the same day of the following week?
The data starts on a Friday -> Monday -> Tuesday -> Wednesday -> Thursday.
I would also like to show how many zeros there were and which day had them; the same goes for the large values.
Thanks a lot to anyone who helps.

Answers (3)

ES on 25 Apr 2014
Say some data is
iData=rand(24,45,65); % This data is normalized, so the double values are between 0 and 1.
You can find all values greater than a limit 'k' by computing the logical indices:
iData>k
Similarly, you can find the logical indices where the elements are equal to 0 with
iData==0
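To connect the logical indices to the "how many zeros and on which day" part of the question, here is a minimal sketch. It assumes the array is laid out as hours x days x customers, uses made-up stand-in data rather than the real .mat file, and takes the 10 kWh limit from the question; the variable names other than iData are hypothetical.

```matlab
% Hypothetical stand-in for the real data: 24 hours x 45 days x 65 customers.
rng(0);                                   % reproducible example
iData = 0.1 + 5*rand(24,45,65);           % plausible readings, all in (0.1, 5.1)
iData(3,10,2) = 0;                        % plant a zero reading
iData(7,10,2) = 0;                        % and a second zero on the same day
iData(5,20,4) = 12.5;                     % plant a reading above the limit

limit = 10;                               % upper bound in kWh, from the question

isZero = (iData == 0);                    % logical mask of zero readings
isHigh = (iData > limit);                 % logical mask of too-large readings

% Count per day and customer by summing over the hour dimension.
zerosPerDay = squeeze(sum(isZero, 1));    % 45 x 65
highPerDay  = squeeze(sum(isHigh, 1));    % 45 x 65

% List the (day, customer) pairs that contain at least one zero.
[dayZ, custZ] = find(zerosPerDay > 0);
fprintf('%d zero readings in total\n', nnz(isZero));
for n = 1:numel(dayZ)
    fprintf('day %d, customer %d: %d zero(s)\n', ...
            dayZ(n), custZ(n), zerosPerDay(dayZ(n), custZ(n)));
end
```

The same find call on highPerDay lists the days with readings above the limit.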
  2 comments
awda on 25 Apr 2014
I'm afraid that does not answer the question. I already tried that and some other things; it shows any data starting with 0, and so on and so forth. My question was how to replace them :)
ES on 25 Apr 2014
Edited: ES on 25 Apr 2014
The brute-force way is to write a triple nested for loop:
for i1=1:24              % hours
    for j1=1:45          % days
        for k1=1:65      % customers
            if iData(i1,j1,k1)==0 || iData(i1,j1,k1)>10
                % +7 to get the same hour one week later; j1+7 is not
                % checked against the 45 available days here
                iData(i1,j1,k1)=iData(i1,j1+7,k1);
            end
        end
    end
end
But there are lots of loopholes here (the index can exceed the matrix dimensions, and the replacement value might be out of range as well). There are definitely better methods!
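One possible way to close the two loopholes mentioned above is to work with masks instead of loops. This is only a sketch on made-up stand-in data: it borrows from day d+7 when that day exists and its reading is itself good, and otherwise leaves the bad value in place. The names donor, canFix and cleaned are hypothetical.

```matlab
% Hypothetical stand-in data: 24 hours x 45 days x 65 customers.
rng(1);
iData = 0.1 + 5*rand(24,45,65);
iData(3,10,2) = 0;        % bad: zero, and day 17 is available as a donor
iData(5,20,4) = 12.5;     % bad: above 10 kWh, and day 27 is available
iData(8,44,1) = 0;        % bad, but day 51 does not exist

bad = (iData == 0) | (iData > 10);        % mask of bad readings

nDays = size(iData, 2);
shift = 7;                                % one week later

% donor(h,d,c) holds the reading from day d+7; NaN where no such day exists.
donor = nan(size(iData));
donor(:, 1:nDays-shift, :) = iData(:, 1+shift:nDays, :);

% donorBad marks donors that are themselves bad readings.
donorBad = false(size(iData));
donorBad(:, 1:nDays-shift, :) = bad(:, 1+shift:nDays, :);

% Only replace where a donor exists and is good.
canFix = bad & ~isnan(donor) & ~donorBad;

cleaned = iData;
cleaned(canFix) = donor(canFix);

fprintf('%d bad readings, %d replaced, %d left untouched\n', ...
        nnz(bad), nnz(canFix), nnz(bad & ~canFix));
```

Readings in the last week stay as they are here; whether to borrow from the previous week instead is a separate decision.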



Jos (10584) on 25 Apr 2014
What you are after is called outlier analysis: what to do with recordings that are "bad" or out of range. Simply replacing them with another value might not be the best option in terms of the underlying statistical model.
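As one illustration of an alternative to replacement, bad readings can be marked as missing (NaN) and skipped when computing statistics. This is a sketch on made-up stand-in data, not a recommendation for any particular statistical model; the names marked, good and hourlyMean are hypothetical.

```matlab
% Hypothetical stand-in data: 24 hours x 45 days x 65 customers.
rng(2);
iData = 0.1 + 5*rand(24,45,65);
iData(3,10,2) = 0;                        % planted zero reading
iData(5,20,4) = 12.5;                     % planted reading above 10 kWh

bad = (iData == 0) | (iData > 10);

marked = iData;
marked(bad) = NaN;                        % flag bad readings as missing

% Mean consumption per hour and customer over the days, skipping NaN.
good   = ~isnan(marked);
filled = marked;
filled(~good) = 0;                        % zeros contribute nothing to the sum
hourlyMean = squeeze(sum(filled, 2) ./ sum(good, 2));   % 24 x 65
```

The division uses the count of good readings per hour and customer, so a missing day does not pull the average down.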
  1 comment
awda on 26 Apr 2014
So is it best to just remove the bad data? By the way, Chocolate warrior, that gives me a lot of errors, I'm afraid. Thank you for trying anyway. :)



Star Strider on 26 Apr 2014
Hi awda,
I've been thinking about this. If you have the Statistics Toolbox, I suggest you consider trimmean in place of mean in my earlier code, in the line:
hrmn(k1) = mean(hrmx(:)); % Mean
changing it to:
hrmn(k1) = trimmean(hrmx(:),5); % Trimmed mean
which will remove the upper and lower 2.5% of the data (the second argument is a percentage between 0 and 100).
However, I caution you that zero usage could be real data. Suppose that customer had a power outage at that time? This is a real possibility: the customer might have wanted to use electrical power but would not have been able to.
In the end, I suggest you leave your data as they are. You have a very large dataset, and small outliers (which could be real data) are not going to affect it much.
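For readers without the Statistics Toolbox, the idea behind a trimmed mean can be sketched by hand. This is a simplified version that drops the same number of readings from each end; it is not claimed to match trimmean's exact rounding behaviour, and the vector x is made up for illustration.

```matlab
% Made-up readings: 38 ordinary values, one zero and one spike.
x = [ones(1,38) 0 50];
percent = 5;                              % total percentage to discard (0 to 100)

n  = numel(x);
k  = floor(n * percent / 200);            % readings dropped from each end
xs = sort(x);
trimmed = mean(xs(k+1 : n-k));            % mean of what is left

fprintf('plain mean   = %.4f\n', mean(x));
fprintf('trimmed mean = %.4f\n', trimmed);
```

With 40 readings and percent = 5, one reading is dropped from each end, so the zero and the spike no longer influence the result.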
  2 comments
awda on 26 Apr 2014
OK, thank you. I will try this as soon as I'm home. Thank you again.
