Fast way to perform multiple searches on a large array
Afficher commentaires plus anciens
I have a large time series array (10,000,000 elements) :
ts = [2; 1; 3; 4; 6; 7; .......]
I have a corresponding time array (same size as the above) :
times = [d1; d2; d3; d4; d5.......]
I have 2 arrays of start times and end times (also large ~ 30000 elements):
st = [dd1 dd2 dd3 ....]
en = [de1 de2 de3 ....]
I need to create a new matrix with many many finds. Logic is :
results = NaN(300, numel(st));
for i=1:numel(st);
temp = ts(find(times > st(i) & times < en(i) , 300,'first');
results(:,i) = temp;
end;
Is there any ay I do this faster (ideally without a loop) ?
- I have a 64 bit version so I can try a large in-memory solution.
Many thanks in advance, Nigel
8 commentaires
Jan
le 4 Oct 2011
I assume "time" should be "times" inside the loop.
Nigel
le 4 Oct 2011
Andrei Bobrov
le 4 Oct 2011
What size of 'st'?
Nigel
le 4 Oct 2011
Jan
le 4 Oct 2011
Do the intervals [st(i):en(i)] overlap?
Nigel
le 4 Oct 2011
Daniel Shub
le 4 Oct 2011
Just to confirm times, st and en are all sorted?
Nigel
le 4 Oct 2011
Réponse acceptée
Plus de réponses (2)
Jan
le 4 Oct 2011
Never let an array grow in each iteration! Pre-allocate the output:
results = NaN(300, numel(st));
for i = 1:numel(st) % Not size(st), which is a vector!
temp = ts(find(times > st(i) & times < en(i), 300, 'first');
if length(temp) == 300
results(:, i) = temp;
else
results(1:length(temp), i) = temp;
end
end
results = results(~isnan(results));
If st and times are sorted, it wastes a lot of time to compare all values. But for vectorizing this, a very large matrix would be needed, such that I assume it will be slower than the loop.
Can you solve the problem by using HISTC?
6 commentaires
Nigel
le 4 Oct 2011
Jan
le 4 Oct 2011
Well, then this answer wasted your and my time. Please include the pre-allocation and all other relevant details, if you post a code-snippet and ask for a speed improvement. Otherwise the trials to optimize your code start from a very unrealistic point.
Nigel
le 4 Oct 2011
Teja Muppirala
le 4 Oct 2011
Since you say you know it will always have 300 entries, then this:
find(times > st(i) & times < en(i), 300, 'first');
May as well be this:
find(times > st(i), 300, 'first');
Daniel Shub
le 4 Oct 2011
and since times and st are sorted
0:299+find(times > st(i), 1, 'first')
Nigel
le 4 Oct 2011
Nigel
le 4 Oct 2011
0 votes
2 commentaires
Bjorn Gustavsson
le 10 Oct 2011
Well then at least do the consequtive 'find's on shortened sections of times (with 'offset' as in Daniel's example):
idx = find(times(offset:end) > st(i), 1,'first');
Then you'd get the benefit from increasingly shorter arrays to search over but without loosing the data.
Daniel Shub
le 10 Oct 2011
I wonder if this would be faster. I would hope MATLAB is smart enough not to have to reallocate memory for my method. Yours is probably a little safer. I was also thinking that working from the end backwards might ultimately be the fastest.
Catégories
En savoir plus sur Operators and Elementary Operations dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!