Hello, I have two matrices with different numbers of rows. The scenario looks like this:
x = [...]; % 5800-by-16
y = [...]; % 450-by-14
% x and y have dates and times in the first six columns, in this form:
% year, month, day, hour, minute, second
% Each column represents a variable
% Each row represents a data sample
% A model predicts a variable in x after some time
...
predicted_time = X_time + some_time; % in hours
% "X_time" is the time of x
% "Y_time" is the time of y
% Match that predicted time with the time of y within a range of +/- 11 hours
for i = 1:size(x,1)
    for j = 1:size(y,1)
        if (predicted_time(i) >= Y_time(j)-11) && (Y_time(j)+11 >= predicted_time(i))
            MATCHED = [x(i,:) y(j,:) predicted_time(i)];
        end
    end
end
How can I make this work? I have tried many variations, but none of them worked properly.

10 Comments

jonas on 17 Jul 2018
Edited: jonas on 17 Jul 2018
I don't understand. You lost me at " a model to predict a variable..."
It seems that you have a set of variables in one time series and another set of variables in another time series. What exactly do you want to do with those? What does "match" mean in this context?
Albert Fan on 17 Jul 2018
If I understand your question correctly, you are asking for how to compare dates and times? If so, you can refer to this document, and you can make a datetime object by using something like:
t = datetime(Y,M,D,H,MI,S)
e.g.:
t = datetime(2003,10,24,12,45,07)
which creates a datetime object for 24 Oct 2003, 12:45:07 (24-hour format).
If you want to add or subtract time, you can simply do:
t.Hour = t.Hour + 2
which adds two hours to t, so it becomes 24 Oct 2003, 14:45:07 (24-hour format).
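As a minimal sketch of the same idea, a duration such as hours(2) can be added to a datetime directly, and the difference between two datetimes can be compared against a tolerance. The names tLater and isClose are illustrative and do not come from the thread.

```matlab
% Build a datetime from year, month, day, hour, minute, second.
t = datetime(2003, 10, 24, 12, 45, 7);

% Shift by a duration instead of editing the Hour field.
tLater = t + hours(2);

% Subtracting two datetimes gives a duration, which can be compared
% directly against a tolerance such as 11 hours.
isClose = abs(tLater - t) <= hours(11);
```

Comparing durations this way avoids converting years, months, days, and hours to a common unit by hand.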
Mohamed Nedal on 17 Jul 2018
Please check the attached image, which shows how the data is laid out. The date and time are in the first six columns, and the remaining columns are storm variables. Each row is the record of one storm, with its date, time, and properties.
  • I have another dataset with a similar layout but a different number of rows and columns.
  • The first dataset represents the storms at one location, and the second represents the storms that arrived at another location (the locations themselves don't matter here ^^).
  • The issue is that not all storms recorded in the first dataset arrived at the second location, which is why the second dataset has far fewer records than the first.
  • So I need to match each storm in the first dataset with its record in the second dataset.
  • I'm using a model to predict the arrival time of each storm, but that model isn't 100% accurate.
  • So I allow an error window of +/- 11 hours (a value specific to that model, based on research). Let me know your thoughts.
Mohamed Nedal on 17 Jul 2018
@Albert Fan. Thank you but this isn't quite what I need. Please refer to my previous comment. Regards
jonas on 17 Jul 2018
Edited: jonas on 17 Jul 2018
OK! Things are clearing up. I still don't understand what match means though. What do you want to match? This is my understanding so far:
  • Each row describes the characteristics of a single storm
  • All storms are available in data set 1 (SET1)
  • A subset of the storms in SET1 are also available in SET2
  • You want to find this subset in SET1 by matching those storms.
Is this correct so far?
If so, how do you want to match them? Do they share the same characteristics or the same time slots? If it's the latter, then you should follow Albert Fan's advice.
A wild guess: You measure different things at the two locations, and now you want to create a dataset with 450 storms and 18 variables?
Albert Fan on 17 Jul 2018
I see, you are trying to get something from, if I were to quote jonas's words, SET2 with the data you have in SET1, if it is recorded in SET2, is that correct? If so, then the question becomes how are you going to match the storms? Do they share anything in common? I've noticed that their location data is sort of different, right?
Mohamed Nedal on 17 Jul 2018
@jonas. Yes, that's correct.
  • Each storm in SET2 has a counterpart in SET1. In other words, it's the same storm, but with different properties after some time.
  • I want to match between both datasets via the time-slots.
  • Theoretically, the storms in SET1 should arrive after a specific time based on the model I use, but this isn't the case in reality. The storms arrive around the theoretical arrival time within a window +/- 11 hours.
  • So, I'm looking for the storms in SET1 that already arrived (recorded in SET2) within a window of arrival time +/- 11 hours.
  • Yes, I want to create a dataset with the storms (and their properties) and their transit times.
Mohamed Nedal on 17 Jul 2018
@Albert Fan. Yes, that's correct.
  • Yes, they share the same source. Actually, it's the same storm, observed some time later at a different location.
  • Each storm in SET2 has a counterpart in SET1. In other words, it's the same storm, but with different properties after some time.
  • I want to match between both datasets via the time-slots.
  • Theoretically, the storms in SET1 should arrive after a specific time based on the model I use, but this isn't the case in reality. The storms arrive around the theoretical arrival time within a window +/- 11 hours.
  • So, I'm looking for the storms in SET1 that already arrived (recorded in SET2) within a window of arrival time +/- 11 hours.
jonas on 17 Jul 2018
Edited: jonas on 17 Jul 2018
So, what you need to do is:
  1. Loop through each storm in SET2
  2. Calculate the corresponding time until it reaches the location of SET1
  3. Find the storm that is closest in time to this value
Right?
These are simple steps, and it seems to me that this is almost what Albert Fan proposed some comments ago. If you provide some sample data to work with, I'm sure someone will give you code now that the problem is clearly stated. I guess we also need your model though, unless you can provide the modelled time-slots.
After this discussion, the initial code actually makes some sense :)
@jonas. Yes, I guess as you said I need to loop through each storm in SET1, calculate the corresponding time until it reaches the location of SET2, and finally find the storm in SET2 that is closest in time to that value.
  • Kindly find the attached files. The following is the code I have written so far, but it still doesn't work as it should: the "matched set" contains only zeros.
tic
close all; clear; clc
%% Read Data
soho = xlsread('start.xlsx'); % initial storms data
shocks = xlsread('end.xlsx', 1); % final storms data
%% Start Time
yr1 = soho(4:end,1);
M1 = soho(4:end,2);
d1 = soho(4:end,3);
hh1 = soho(4:end,4);
mm1 = soho(4:end,5);
ss1 = soho(4:end,6);
%% End Time (start of Shocks)
yr2 = shocks(4:end,1);
M2 = shocks(4:end,2);
d2 = shocks(4:end,3);
hh2 = shocks(4:end,4);
mm2 = shocks(4:end,5);
ss2 = shocks(4:end,6);
%% Parameters
% CMEs
CPA = soho(4:end,7);
w = soho(4:end,8);
vl = soho(4:end,9);
vi = soho(4:end,10);
vf1 = soho(4:end,11);
v20Rs = soho(4:end,12);
a1 = soho(4:end,13);
mass = soho(4:end,14);
KE = soho(4:end,15);
MPA = soho(4:end,16);
% Shocks
vfinal = shocks(4:end,11);
T = shocks(4:end,13);
N = shocks(4:end,14);
%% Initial Values
AU = 149599999.99979659915; % Sun-Earth distance in km
d = 0.76 * AU; % cessation distance in km
%% Pre-Allocating Variables
a_calc = zeros(size(vl));
squareRoot = zeros(size(vl));
A = zeros(size(vl));
B = zeros(size(vl));
ts = zeros(size(vl));
t_hrs = zeros(size(vl)); % predicted transit time in hours
t_mn = zeros(size(vl)); % predicted transit time in minutes
%% G2001 Model
% calculations
for i = 1:length(vl)
    a_calc(i) = power(-10,-3) * ((0.0054*vl(i)) - 2.2); % in km/s2
    squareRoot(i) = sqrt(power(vl(i),2) + (2*a_calc(i)*d));
    A(i) = (-vl(i) + squareRoot(i)) / a_calc(i);
    B(i) = (AU - d) / squareRoot(i);
    ts(i) = A(i) + B(i); % in seconds
    t_mn(i) = ts(i) / 60; % in minutes
end
clear i;
%% Show the predicted travel time
% CME-ICME Matching
matchedSet = zeros(length(shocks), 33); % the final set of CME-ICME pairs
for n = 1:length(yr2)
    if yr2(n) == yr1(n)
        for m = 1:length(M2)
            if M2(m) == M1(m)
                for k = 1:length(d2)
                    if d2(k) == d1(k)
                        if (t_mn(k) >= (((hh2(k)*60)+mm2(k)+(ss2(k)/60))-11)) && ((((hh2(k)*60)+mm2(k)+(ss2(k)/60))+11) >= t_mn(k))
                            matchedSet(k, 1:16) = soho(k,:);
                            matchedSet(k, 18:32) = shocks(k,:);
                        end
                    end
                end
            end
        end
    end
end
clear n; clear m; clear k;
toc
I really appreciate that :)


Accepted Answer

jonas on 18 Jul 2018
Edited: jonas on 18 Jul 2018
I've made an attempt to fix your code and match the two time-vectors. I've converted your time-vectors to datetime format and fixed your matching-algorithm. The matching works by looping through the modelled time-vector, which is based on the longer time-vector (SET1), and finding the closest match in the smaller time-vector (SET2). A match is only stored if the absolute difference is smaller than 11 hours.
The output, id, is a matrix with two columns, where each row [id1 id2] holds a pair of matched indices, i.e. the row of SET1 and the corresponding row of SET2.
NOTE: id has more rows than SET2, which indicates that some rows of SET2 are matched more than once.
%% Read Data
soho = xlsread('start.xlsx'); % initial storms data
shocks = xlsread('end.xlsx', 1); % final storms data
%% Start Time
%% EDITED %%
t1 = datetime(soho(4:end,1:6));
t2 = datetime(shocks(4:end,1:6));
%% ORIGINAL CODE %%
%% Parameters
% CMEs
CPA = soho(4:end,7);
w = soho(4:end,8);
vl = soho(4:end,9);
vi = soho(4:end,10);
vf1 = soho(4:end,11);
v20Rs = soho(4:end,12);
a1 = soho(4:end,13);
mass = soho(4:end,14);
KE = soho(4:end,15);
MPA = soho(4:end,16);
% Shocks
vfinal = shocks(4:end,11);
T = shocks(4:end,13);
N = shocks(4:end,14);
%% Initial Values
AU = 149599999.99979659915; % Sun-Earth distance in km
d = 0.76 * AU; % cessation distance in km
%% Pre-Allocating Variables
a_calc = zeros(size(vl));
squareRoot = zeros(size(vl));
A = zeros(size(vl));
B = zeros(size(vl));
ts = zeros(size(vl));
t_hrs = zeros(size(vl)); % predicted transit time in hours
t_mn = zeros(size(vl)); % predicted transit time in minutes
%% G2001 Model
% calculations
for i = 1:length(vl)
    a_calc(i) = power(-10,-3) * ((0.0054*vl(i)) - 2.2); % in km/s2
    squareRoot(i) = sqrt(power(vl(i),2) + (2*a_calc(i)*d));
    A(i) = (-vl(i) + squareRoot(i)) / a_calc(i);
    B(i) = (AU - d) / squareRoot(i);
    ts(i) = A(i) + B(i); % in seconds
    t_mn(i) = ts(i) / 60; % in minutes
end
clear i;
t_model = t1 + minutes(t_mn); % predicted arrival time of each SET1 storm
%% Show the predicted travel time
% CME-ICME Matching
%% EDITED FROM HERE AND ON %%
id = nan(numel(t_model),2);
for i = 1:numel(t_model)
    [MinDiff, ind] = min(abs(t2 - t_model(i))); % closest SET2 time and its row index
    if MinDiff < hours(11)
        id(i,1:2) = [i ind];
    else
        id(i,1:2) = [i NaN];
    end
end
id(isnan(id(:,2)),:) = []; % drop SET1 rows without a match
plot(id(:,1),id(:,2),'.')
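The G2001 loop in the script can also be written without a loop. The following is only a sketch, assuming vl is a column vector of initial speeds in km/s. The three speeds are made-up values and vEnd is an illustrative name. Since power(-10,-3) evaluates to (-10)^(-3) = -0.001, the sketch uses -1e-3 as an equivalent, more readable constant.

```matlab
vl = [400; 800; 1200];                 % hypothetical initial speeds in km/s
AU = 149599999.99979659915;            % Sun-Earth distance in km (value from the thread)
d  = 0.76 * AU;                        % cessation distance in km

a    = -1e-3 * (0.0054 * vl - 2.2);    % acceleration in km/s^2
vEnd = sqrt(vl.^2 + 2 * a * d);        % speed at the cessation distance

% Time spent accelerating plus time at constant speed, in seconds.
ts    = (-vl + vEnd) ./ a + (AU - d) ./ vEnd;
t_mn  = ts / 60;                       % predicted transit time in minutes
t_hrs = ts / 3600;                     % predicted transit time in hours
```

A speed of exactly 2.2/0.0054 km/s (about 407 km/s) would make a equal to zero, so the first term would become 0/0. Real data may need a guard for that case.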

8 Comments

Could you please explain what the following lines mean?
  • 1st part:
[MinDiff ind] = min(abs(t2 - t_model(i)));
if MinDiff < hours(11)
id(i, 1:2) = [i ind];
else
id(i, 1:2) = [i NaN];
end
  • 2nd part:
id(isnan(id(:,2)), :) = [];
plot(id(:,1), id(:,2), '.')
Thank you so much for your efforts!
jonas on 18 Jul 2018
Edited: jonas on 18 Jul 2018
I wanted to find the value in t2 closest to t_model(i). This is done by subtracting t_model(i) from every element of t2, taking the absolute value, and finding the smallest difference (MinDiff) and its index (ind). If MinDiff is less than 11 hours, the pair of indices is saved in id. If not, a NaN is saved instead, meaning no match.
In the 2nd part I removed all rows with NaN (no match), as I assumed you were only interested in matches. I also plotted the matches to check that the results make sense.
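To make the explanation concrete, here is the same matching logic on a tiny made-up example. The dates below are hypothetical and are not taken from the attached files.

```matlab
% Hypothetical arrival times observed at the second location (SET2).
t2 = datetime(2018, 7, [1; 5; 9], 12, 0, 0);

% Hypothetical predicted arrival times for three SET1 storms.
t_model = datetime(2018, 7, [1; 3; 9], [18; 0; 2], 0, 0);

id = nan(numel(t_model), 2);
for i = 1:numel(t_model)
    % Smallest absolute time difference and the SET2 row where it occurs.
    [MinDiff, ind] = min(abs(t2 - t_model(i)));
    if MinDiff < hours(11)
        id(i, :) = [i, ind];
    end
end

% Rows still containing NaN had no SET2 time within 11 hours.
id(isnan(id(:, 2)), :) = [];
```

Here the first prediction is 6 hours from the first SET2 time and the third is 10 hours from the third SET2 time, so both are kept. The second prediction is 36 hours from its nearest SET2 time and is dropped.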
Mohamed Nedal on 18 Jul 2018
Sorry, I'm confused..
  • id1 is the index number of the record in SET1, and id2 is the index number of the corresponding record in SET2, is that right?
  • And what do id(:,1) and id(:,2) represent?
Please check the attached image. It's the layout of the matched dataset as I imagine it. Each row represents one storm with all its data (the initial date, time, and properties, and the final date, time, and properties, with the theoretical and the actual time differences).
jonas on 19 Jul 2018
Edited: jonas on 19 Jul 2018
Maybe I can answer that follow-up by making some timetables. Just add the following lines at the end of the script.
MatchedSet1=array2timetable(soho(id(:,1),7:end),'RowTimes',t1(id(:,1)));
MatchedSet2=array2timetable(shocks(id(:,2),7:end),'RowTimes',t2(id(:,2)));
PredictedTimeDiff=hours(minutes(t_mn(id(:,1))));
ActualTimeDiff=hours(t2(id(:,2))-t1(id(:,1)));
Error=PredictedTimeDiff-ActualTimeDiff;
As you can see, id(:,1) holds the row indices of the long dataset (SET1) that are matched with the row indices id(:,2) of the short dataset (SET2). You can easily export those to Excel, and Excel will appreciate that you use the datetime format instead of separate columns for hours, minutes, seconds, etc. :)
Note that the matching itself caps the error at 11 hours: if the actual error is more than 11 hours, the match is not recorded. The results therefore do not imply that the model's maximum error is 11 hours.
Mohamed Nedal on 19 Jul 2018
The array2timetable function isn't available in MATLAB R2015a, unfortunately. Could you please suggest an alternative that works on older versions?
In the meantime, I'll try to get a newer version.
jonas on 19 Jul 2018
Edited: jonas on 19 Jul 2018
MatchedSet1=array2table(soho(id(:,1),7:end));
MatchedSet1=addvars(MatchedSet1,t1(id(:,1)),'Before',1); % addvars returns a new table, so assign it back
MatchedSet2=array2table(shocks(id(:,2),7:end));
MatchedSet2=addvars(MatchedSet2,t2(id(:,2)),'Before',1);
PredictedTimeDiff=hours(minutes(t_mn(id(:,1))));
ActualTimeDiff=hours(t2(id(:,2))-t1(id(:,1)));
Error=PredictedTimeDiff-ActualTimeDiff;
If you want you can add all those to a single table using addvars, and print them to a single xls-file. I'll let you figure that one out yourself :)
Definitely recommend updating to a newer version if you're still running 2015a
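One way to collect everything in a single table and write it to a spreadsheet is sketched below. It uses only the table constructor and writetable rather than addvars, which I believe was introduced in R2018a and so may not exist in older releases. The times, the predicted values, the names startTime and endTime, and the file name are all placeholders, not values from the thread.

```matlab
% Hypothetical stand-ins for t1(id(:,1)) and t2(id(:,2)) from the script.
startTime = datetime(2018, 7, [1; 9], [18; 2], 0, 0);
endTime   = datetime(2018, 7, [4; 11], 12, 0, 0);

PredictedTimeDiff = [70; 60];                      % hours, placeholder values
ActualTimeDiff    = hours(endTime - startTime);    % duration converted to numeric hours
Error             = PredictedTimeDiff - ActualTimeDiff;

% Variable names in the table are taken from the workspace variable names.
Matched = table(startTime, endTime, PredictedTimeDiff, ActualTimeDiff, Error);

% Write everything to one spreadsheet; the path is a placeholder.
outFile = fullfile(tempdir, 'matched_storms.xlsx');
writetable(Matched, outFile);
```

The storm properties could be appended in the same way, for example by converting them with array2table and concatenating the tables horizontally, provided the variable names do not clash.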
Mohamed Nedal on 19 Jul 2018
Okay, I'll do that.
Thank you so much for your help.
I'll use this code in the analysis phase of my paper and I was thinking of acknowledging you if you don't mind.
If it's okay with you, please send me your information such as the first name, the last name, the organization, the specialty, and the email address.
jonas on 19 Jul 2018
That's really kind of you, I'm flattered. However, this is fairly standard stuff so there is absolutely no need for me to take up space in your paper. I'm a final year PhD student myself and, although I have zero questions asked, this forum has helped me a ton throughout my work. I'm just happy to give something back.
Btw, as a final note. Since this is for a scientific paper, don't forget that if the error is larger than 11 hours, then it's not included as a match. Also double-check why the "matched" number of entries is larger than the total number of entries in SET2 (I think it's about 550 matches compared to 450 unique entries in SET2).
Good luck in your work and let me know if you need more help!
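To double-check the repeated matches, one option is to count how often each SET2 row appears in id and, if a one-to-one pairing is wanted, to keep only the closest SET1 storm for each SET2 row. This is a sketch of one possible policy, not something prescribed in the thread. The id and err values are made up, and err, dupRows, and idUnique are illustrative names.

```matlab
% Hypothetical matches: columns are [SET1 row, SET2 row].
id  = [1 1; 2 1; 4 2; 7 3; 8 3];
% Hypothetical absolute prediction error in hours for each match.
err = [3.0; 9.5; 1.0; 10.0; 2.5];

% Count how often each SET2 row occurs among the matches.
[uniq2, ~, grp] = unique(id(:, 2));
counts  = accumarray(grp, 1);
dupRows = uniq2(counts > 1);          % SET2 rows matched more than once

% For each SET2 row, keep only the match with the smallest error.
keep = false(size(err));
for g = 1:numel(uniq2)
    members = find(grp == g);
    [~, best] = min(err(members));
    keep(members(best)) = true;
end
idUnique = id(keep, :);
```

In the real script, err could be taken from the MinDiff values by storing them alongside id inside the matching loop.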
