MATLAB Answers

How to decipher date formats?

4 views (last 30 days)
Leon
Leon on 22 Mar 2020
Edited: dpb on 23 Mar 2020
My date columns stored in Excel can have various formats like the below:
'9/1/19'
'31-Dec-2020'
I'm reading them into Matlab using readtable and want to export them into a txt file with the format of '31-Dec-2020' for all of them.
How do I read them correctly in a holistical way?
Thanks

  2 Comments

Guillaume
Guillaume on 22 Mar 2020
I don't really understand what you mean by "Is it possible to get their datenum correctly in a holistical way?".
You haven't even explained how you get the excel data into matlab. Hopefully, you're using readtable or similar and not xlsread.
Secondly, why would you be using datenum when the much better datetime has been available since 2014b?
Leon
Leon on 22 Mar 2020
Sorry about that. I just updated the question.
Yes, I'm using readtable to load the info into Matlab. I don't have to use datenum, I just want to be able to export the info into a txt file all using the same date format.

Sign in to comment.

Accepted Answer

dpb
dpb on 22 Mar 2020
Edited: dpb on 22 Mar 2020
"Is it possible to get their [variant input and ambiguous formats - sic] datenum correctly in a holistical way?"
"Possible?" Maybe. Not easily, no
The first isn't possible to know if it stands alone whether it is Sept 1 or Jan 9 w/o worrying about century. If there are more values of the same format, one can scan until finds a field that is illegal with one interpretation and then presume the other. That's what the input routines do if not given a format string.
The second, of course, is trivial to convert once given the location of the pieces, but neither datenum nor datetime will handle mixed metaphors well automagically as both try to use the first convertible format they can decipher for the entire input array.
However, if you really can't fix the input format and can stand the overhead, datenum is somewhat more forgiving than datetime and you can try calling each in turn instead of as a group--then either datenum or datetime will try to figure out each on its own since has only the one string at a time. This, of course, will introduce a lot of overhead as the input parsing/detection logic is called every time for a random input format.
>> dstr % the sample date strings
dstr =
2×1 cell array
{'9/1/19' }
{'31-Dec-2020'}
>> datenum(dstr) % try datenum as a group -- can't deal with
Error using datenum (line 190)
DATENUM failed.
Caused by:
Error using dtstr2dtvecmx
Failed to convert from text to date number.
>> for i=1:2,datestr(datenum(dstr(i))),end % let datenum() try each in isolation...
ans =
'01-Sep-2019'
ans =
'31-Dec-2020'
>>
% Above does at least run...did it make the right guess for the first or not?
% No way to know in isolation; if it's right, still may get another wrong later or
% if wrong, another later may be right, anyways. No way to know...altho I suppose if
% read the details of the code could see what is the default assumption for each.
>> datetime(dstr) % try datetime as combination...
Warning: Successfully converted the text to datetime using the format 'MM/dd/uuuu', but the format is ambiguous and could also be 'dd/MM/uuuu'.
To create datetimes from text with a specific format call:
datetime(textinput,'InputFormat',infmt)
> In guessFormat (line 96)
In datetime (line 632)
ans =
2×1 datetime array
01-Sep-0019
NaT
>>
% Did first but used absolute year; second not same so failed...
>> for i=1:2,datetime(dstr(i)),end % try it one at a time...
Warning: Successfully converted the text to datetime using the format 'MM/dd/uuuu',
but the format is ambiguous and could also be 'dd/MM/uuuu'.
To create datetimes from text with a specific format call:
datetime(textinput,'InputFormat',infmt)
> In guessFormat (line 67)
In datetime (line 632)
ans =
datetime
01-Sep-0019
ans =
datetime
31-Dec-2020
>>
% Handles the second correctly, still uses uuuu for the year. Per the documentation,
% using 'PivotYear',2000 doesn't help because only considered if the 'infmt' string is
% specifically two-digit format...and if you specify that, the other one won't work.
And, of course, the above is only for two specific examples, all kinds of other things could happen even with datenum besides the above for other cases.
I don't know if anybody may have tried to build such a tool on File Exchange or not; would be first place I'd look, but fixing it at the input source would still be my recommendation and first choice by far.

  4 Comments

Show 1 older comment
dpb
dpb on 22 Mar 2020
If you know the format for each of a given style, then Steven's suggestion is the better by far but you'll need to parse each to know which format string to associate with each entry on the call.
The above response is mostly directed to the "holistic" approach requested which I interpreted as "let Matlab try to figure it out without user intervention".
Even if you do choose to do something such as I outlined above, I would still then recommend to wrap the result of the conversion to datenum in a call to datetime with the 'ConvertFrom' option. That could be done after the loop.
Still would be far better to fix the input source (or at least have a flag associated with it that indicates what the input format is).
dpb
dpb on 22 Mar 2020
_"you'll need to parse each to know which format string to associate with each entry on the call."-
Alternatively, if you've got to parse the fields anyway, just save the y,m,d values and call datetime with them instead of the text strings...
dpb
dpb on 23 Mar 2020
>> dstr
dstr =
2×1 cell array
{'9/1/19' }
{'31-Dec-2020'}
>> isMMM=contains(dstr,month(datetime(1,[1:12]',1),'shortname'))
isMMM =
2×1 logical array
0
1
>> pieces=split(dstr,{'/','-'})
pieces =
2×3 cell array
{'9' } {'1' } {'19' }
{'31'} {'Dec'} {'2020'}
>>
gives you the pieces needed and which need to be the two specific formats. If split the pieces apart can convert to [y,m,d] format first, then call datetime on the collection; otherwise still have to convert each individually which will be much slower for sizable arrays.

Sign in to comment.

More Answers (1)

Steven Lord
Steven Lord on 22 Mar 2020
It depends. Is that first date supposed to be:
  • the 9th of January, 2019 (day/month/year, increasing order of size of units)
  • the 1st of September, 2019 (month/day/year)
  • (less commonly) the 19th of January, either year 9 AD or year 2009 (sometimes people abbreviate the release numbers of MATLAB, referring to 9a instead of R2009a, so interpreting 9 as 2009 has some precedent) (year/month/day, decreasing order of size of units)
  • or something else?
But I agree with Guillaume. Don't try to read these in as serial date numbers. Read them in as text and convert them into date and time objects using datetime, specifying the InputFormat option. See the section of the datetime documentation page that describes the meaning of the various parts of the Format option (which also work for the InputFormatt option.) If you use something like readtable you can combine those two steps.

  1 Comment

Leon
Leon on 22 Mar 2020
Thank you for the recommendations, Steven.

Sign in to comment.

Sign in to answer this question.

Tags

Products


Release

R2019b

Translated by