Processing a HUGE number of timestamps

2 vues (au cours des 30 derniers jours)
Matlab2010
Matlab2010 le 13 Nov 2014
Réponse apportée : Jan le 16 Nov 2014
I have a cell array whose elements are time stamps in the format "Mon Apr 01 20:00:00 BST 2013". I have a very large number of these vectors. At the moment, I loop through each value in the vector and apply the below function. This loop taking up 99% of my processing time.
How can I remove the loop?
thanks
function myTimeOut = st_timestampConvert(myTime)
year = strtrim(myTime(end-4:end));
month = strtrim(myTime(5:8));
day = strtrim(myTime(9:10));
time = strtrim(myTime(11:19));
timezone = strtrim(myTime(20:23));
myTimeOut = convert_to_UTC(myTimeOut, timezone); %time zone conversion
myTimeOut = datenum([day '-' month '-' year ' ' time], 'dd-mmm-yyyy HH:MM:SS');
end

Réponse acceptée

Guillaume
Guillaume le 14 Nov 2014
Without R2014b datetime, you can use regexprep to rearrange the bits of the string you want before calling datenum. It's many orders of magnitude faster than a loop, cellfun, or strsplit.
s2 = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tout = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');
On my machine, to process 100k dates, the above two lines takes 2.1 seconds , most of it taken by the datenum operation. The regexp line is only 0.3 seconds.
There remains the problem of the time zone adjustment (which I believe should have come after the conversion to datenum in your example). Your convert_to_UTC is not part of matlab. Hopefully it can operate on cell arrays as well. Thus to extract the timezone:
tzones = regexp(s, '\w+(?= \d+$)', 'match', 'once');
tout = convert_to_UTC(tout, tzones); %Will this work?

Plus de réponses (2)

Peter Perkins
Peter Perkins le 13 Nov 2014
none, I don't know if you have access to R2014b. If you do, consider using the new datetime data type. On a not so fast PC, parsing 100000 strings like yours, with time zones, takes a bit over 2s. 'BST' presents a potential issue, because it might mean any number of things. In the UK, it means "British Summer Time", and the following parses the strings using that locale. Hope this helps.
% construct 100k strings
d = datetime(2013,4,1,20,0,0,'TimeZone','Europe/London') + days(randn(100000,1));
s = cellstr(d,'eee MMM dd HH:mm:ss z yyyy','en_UK');
ans =
Tue Apr 02 14:33:32 BST 2013
% parse those strings
tic
d1 = datetime(s,'Format','eee MMM dd HH:mm:ss z yyyy','TimeZone', ...
'Europe/London','Locale','en_UK');
toc
  1 commentaire
Matlab2010
Matlab2010 le 14 Nov 2014
Peter.
I am using win7 and 2014A.
I have N cell arrays, each with T elements. N = 0.5M, T = 0.25M.
I have a PC with 244GB RAM and 32 Cores. I use parfor for such jobs.
BST does mean British summer time. Though I deal with all time zone conversions myself. You can assume each N is in the same time zone, thus, the time zone conversion can be done as a single vectorized operation.
The issue is how to extract from this cell and get into datenum without using a for loop.
thank you

Connectez-vous pour commenter.


Jan
Jan le 16 Nov 2014
I prefer Guillaume's version, but it is not "magnitudes" faster than a loop approach:
function DOut = ConvertCellDate(DIn)
DOut = zeros(size(DIn));
for k = 1:numel(DIn)
Dx = double(DIn{k} - '0'); % For faster conversion of numbers
month = (strfind('JanFebMarAprMayJunJulAugSepOctNovDec', DIn{k}(5:7)) + 2) / 3;
year = Dx(25) * 1000 + Dx(26) * 100 + Dx(27) * 10 + Dx(28);
DOut(k) = datenummx(year, month, Dx(9) * 10 + Dx(10), ...
Dx(12) * 10 + Dx(13), Dx(15) * 10 + Dx(16), ...
Dx(18) * 10 + Dx(19));
end
Phew, this looks cruel and is not smart to debug. But it takes 2.3 sec on my Matlab 2011b/Win7/32 system, while Guillaume's method takes 1.7 sec.

Catégories

En savoir plus sur Dates and Time dans Help Center et File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by