Hi all, I have a directory with an unknown number of text files.
fid = dir('*.txt');
How do I fprintf the timestamps (file.date) of all of these files to a csv file? I tried the datestr(), datetime(), cell2mat() and cell2str() functions, as well as {} and [] brackets, but none of them worked; each generated errors when I tried to fprintf the result to the csv file.
Any pointers would help.
Thanks,

Accepted Answer

dpb on 12 Sep 2016
Edited: dpb on 12 Sep 2016

0 votes

fid isn't a good variable name for a directory list; it's too closely associated with a file handle.
d=dir('*.txt'); % the directory structure
fid=fopen('file.date','w'); % open an output file
cellfun(@(s) fprintf(fid,'"%s"\n',s),cellstr(char(d.date))) % and write them out quote-delimited
fid=fclose(fid); % done--close file

2 comments

Thanks! I was able to incorporate this line into another series of loops so I can print the timestamp out with other stats and file info I need from the data.
cellfun(@(s) fprintf(fid,'"%s"\n',s),cellstr(char(d.date)))
Several questions, though: 1. Why do I need to use cellfun to fprintf the "s"? 2. What makes the timestamp special, that it has to be quote-delimited in order to be printed as a string? Sorry if my questions don't make sense; I'm still a novice in cell handling.
Walter Roberson on 13 Sep 2016
You do not need to use a cellfun to handle the printing: the solution I gave using a temporary variable handles the task without using cellfun.
The timestamps include characters such as '-' and ':' and spaces, so they are not valid numbers and have to go into the csv file as text. By csv convention, text fields are enclosed in double-quotes (this is strictly required only when a field contains a comma, a double-quote, or a line break, but quoting every text field is the safe choice). The double-quotes being used there are not for the purpose of getting MATLAB to emit strings: they are there so that the strings MATLAB emits are properly formatted csv.
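As a rough illustration of the quoting convention described above, the sketch below builds one csv row that mixes text and numeric fields. The sample values and the helper name `q` are hypothetical, chosen only for the example; doubling an embedded double-quote is the usual csv escaping rule.

```matlab
% Hypothetical sample values standing in for one element of dir() output.
name   = 'results, run "A".txt';
stamp  = '27-Apr-2016 08:40:50';
nbytes = 1024;

% Hypothetical helper: wrap text in double quotes and double any
% embedded double quote, as csv readers conventionally expect.
q = @(s) ['"' strrep(s, '"', '""') '"'];

% Text fields are quoted; the numeric field is written as-is.
row = sprintf('%s,%s,%d\n', q(name), q(stamp), nbytes);
```

The resulting `row` could then be written to an open file with `fprintf(fid, '%s', row)`.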


More Answers (1)

Walter Roberson on 13 Sep 2016

1 vote

d = dir('*.txt');
dates = {d.date};
fid = fopen('file.csv', 'wt');
fprintf(fid, '"%s"\n', dates{:});
fclose(fid);
No loop is needed.
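The reason no loop is needed is that fprintf (like sprintf) reuses the format for as long as arguments remain, and `dates{:}` hands it every date as a separate argument. A small sketch with sprintf, so the result can be inspected without touching a file; the two sample dates are made up for the example:

```matlab
% Sample values standing in for {d.date}.
dates = {'27-Apr-2016 08:40:50', '24-May-2014 13:03:30'};

% dates{:} expands to two separate arguments; the format '"%s"\n'
% is applied once per argument, giving one quoted line per date.
txt = sprintf('"%s"\n', dates{:});

nLines = sum(txt == char(10));   % count the newline characters
```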

19 comments

Stephen23 on 13 Sep 2016
Edited: Stephen23 on 13 Sep 2016
+1 This is the best way to solve this task.
dpb on 13 Sep 2016
Same essentially as mine other than explicitly creating the char() array from a separate cell array instead of "on-the-fly" in a behind the scenes temporary...another place where extended dereferencing syntax would be a_good_thing (tm) to get around the difficulties of fprintf and friends not being cell-fluent.
Tammy Chen on 13 Sep 2016
Thanks for clarifying the baby steps for a novice here. Just curious: I've seen this {:} come up in many of my errors when I tried using the horzcat or vertcat functions on data sets I want to xlswrite. What exactly does this {:} do to a string?
dpb on 13 Sep 2016
Edited: dpb on 14 Sep 2016
The "curlies" are to dereference a cell array's content; it's not specific to strings, it's Matlab syntax to access cell array content. See the doc for cell arrays and links therefrom on accessing cell array data for the details.
The ":" colon operator is simply again Matlab syntax for arrays in general; it's shorthand for "all elements in the array irrespective of size and shape". Again, see the documentation under
doc colon % for specifics.
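A short sketch of the difference between the two kinds of indexing, using a made-up two-element cell array:

```matlab
c = {'abc', 'de'};    % sample 1x2 cell array of character vectors

sub    = c(1);        % parentheses: a 1x1 cell that contains 'abc'
item   = c{1};        % curly braces: the character vector 'abc' itself
joined = [c{:}];      % c{:} expands to 'abc','de', so this is ['abc','de']
n      = numel(c(:)); % c(:) is still a cell array, reshaped to a column
```

Because `c{:}` expands into separate arguments, it is the form to use when calling horzcat, vertcat, or fprintf on the contents of a cell array.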
I'd recommend opening the online documentation and working thru the "Getting Started" section tutorials to get these most basic syntax elements down first--your progress will be much faster for the investment of a little time up front learning the basics and certainly far faster than waiting for somebody here to answer every detail.
As for using cellfun vis a vis not, it's a stylistic choice; I prefer when possible to not create extra named temporaries in the workspace but use anonymous ones which will automagically be discarded. In a single-purpose function it doesn't really make a lot of difference as the temporary will also be destroyed when the function exits but in a script or a larger function it's convenient to not necessarily "clutter up" the name space with stuff that's only needed temporarily.
ADDENDUM
Some additional amplification on the expression cellstr(char(d.date)), the argument to cellfun. d.date by itself returns its multiple values as a comma-separated list (and yes, you need to look that up in the documentation, too :) ) rather than as an array; char turns that list into a character array with one row per date, and cellstr converts that into a cellstring array. cellfun then processes each of those elements in turn.
The reason for this is the difference between cell strings and character arrays; the latter is a 2D array of characters, and one must reference each string within the array as (row,:) to get all the characters in that row. OTOH, cell strings store the whole string in each cell, so referring to element (1) returns the string (as a cell); using {} as in {1} gets the underlying characters of the whole string.
Example...
>> d=dir('*.m'); % a local directory listing just for show...
>> d % show what dir returns is struct array
d =
113x1 struct array with fields:
name
date
bytes
isdir
datenum
>> d.date % what the struct array altogether returns...
ans =
27-Apr-2016 08:40:50
ans =
24-May-2014 13:03:30
ans =
24-May-2014 16:49:24
...
ans =
08-Dec-2014 10:37:18
>>
Note the multiple answers -- 113 of 'em in this case.
>> dates=char(d.date); % return the dates as char() array
>> whos dates
Name Size Bytes Class Attributes
dates 113x20 4520 char
>> dates(1,:) % note the trailing colon and that dates is 2D array
ans =
27-Apr-2016 08:40:50
>> dates(1) % if just use one subscript; get just one character
ans =
2
>> dates=cellstr(dates); % OK, that's inconvenient so convert to cellstr
>> whos dates % now see it's a cell array of 113 length
Name Size Bytes Class Attributes
dates 113x1 11300 cell
>> dates(1) % and the single subscript returns the cell string
ans =
'27-Apr-2016 08:40:50'
>> whos ans % what's the class???
Name Size Bytes Class Attributes
ans 1x1 100 cell
>>
OK, note that this is a single cell, and also note that the display is surrounded by single quotes. This is a key item to note: whenever you see that in the command window it's the clue that the object is a cell string, not a character string, and fprintf and friends need the dereferencing operator {} in order to get at its content...
>> dates{1} % OK, now use the curlies, Luke! :)
ans =
27-Apr-2016 08:40:50
>> whos ans
Name Size Bytes Class Attributes
ans 1x20 40 char
>>
Now note that the string content alone is shown without the quotes and the class is char.
Here's where we were heading from the get-go, as that's what fprintf needs as its input.
Walter Roberson on 14 Sep 2016
Yes, I used your framework but modified to not need the cellfun.
dpb on 14 Sep 2016
Which is probably more suitable to the neophyte, granted, and worth illustrating as well. I tend to try not to build temporaries when avoidable, as noted in the tutorial, and figure it's worth exposing some of the more exotic features as a pedagogical instrument as well... :)
Walter Roberson on 14 Sep 2016
Temporaries are always faster than arrayfun() or cellfun()
(except, of course, for some cases where the output size of the cellfun() is much smaller than the size of the inputs and you are on the verge of needing to swap to disk, in which case the temporary could trigger swapping where the cellfun() might not.)
dpb on 14 Sep 2016
Probably but I don't ever think of running Matlab where data sizes are likely to cause a noticeable delay...and I'd expect TMW to continue to improve the JIT optimizer w/ time as well anyways... :)
Walter Roberson on 14 Sep 2016
Even with the improved JIT, the documentation for timer() indicates that timers can interrupt between any two source lines (but not within a single source line.) This places a limitation on speed: an operation written on a single line will be faster than the same operation written as a function call, because the function call would need the extra overhead of tracking which line it is executing on.
On the other hand, having just written that, I suppose it would be theoretically possible for the data structures to have a per-atom reference to the source together with a per-atom flag indicating whether the atom is the first operation on the line or not, and the interpreter could check the flag for every atom it interprets in order to determine whether it should check for interrupts or not. Though that would still result in more interrupt checks... Mumble mumble mumble...
dpb on 14 Sep 2016
"an operation written on a single line will be faster than the same operation written as a function call, because the function call would need the extra overhead of tracking which line it is executing on."
I'm certainly no compiler writer so I'm just spouting but is there any theoretical reason TMW couldn't eventually even extend to inlining functions, etc., etc., ...???
dpb on 14 Sep 2016
But, duly chastened, I did time it at the command line and, to my surprise, the difference isn't small: cellfun is 10X slower. While 30 msec for the 113 elements isn't enough to notice interactively, it does look like it might add up more quickly than I had presumed. I guess if I ever get back to actually consulting "in anger" (and that's highly unlikely at this point) I would have to temper my dislike of temporaries... :)
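For anyone wanting to repeat that comparison, a minimal timing sketch follows. It assumes timeit is available and substitutes sprintf for fprintf so that no file is needed; the repeated sample date is made up, and the actual ratio will depend on the release and the machine.

```matlab
% 113 copies of a sample timestamp, mimicking the directory listing above.
dates = repmat({'27-Apr-2016 08:40:50'}, 113, 1);

% One formatted string per cell via cellfun and an anonymous function.
tCellfun = timeit(@() cellfun(@(s) sprintf('"%s"\n', s), dates, ...
                              'UniformOutput', false));

% A single call that receives every date through dates{:}.
tSingle = timeit(@() sprintf('"%s"\n', dates{:}));

fprintf('cellfun: %.3g s, single call: %.3g s\n', tCellfun, tSingle);
```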
Walter Roberson on 14 Sep 2016
One of the classic difficulties with inlining functions is getting breakpoints and error messages right (e.g., messages that refer to variable names). The optimizer might move statements or sub-expressions around to increase locality of reference or to better match patterns exploitable by LAPACK or BLAS, so stepping through with a debugger can get pretty confusing. It is all easier if you don't have to be able to debug ;-)
dpb on 15 Sep 2016
Well, I'd think debugging would simply turn off the optimizer, which is what happens with most compiled languages (albeit one generally has to do it oneself). But inlining specifically was just one general area; I'd think the possibilities are limited only by how long it takes to do the analysis/codegen, and with faster processors it'd seem there's a long way to go yet in that direction. Again, "jes' talkin'/guessin'"...
Walter Roberson on 15 Sep 2016
You might turn off the optimizer when there is a breakpoint (any breakpoint) set, but you still need to be able to correctly identify variables by name for error messages (and warnings), and you still need to generate correct source references in case of exceptions, and those have to work even when no debugging is explicitly taking place. try/catch is actively used by correct code for situations other than just doing nice handling of error conditions and returning to caller.
dpb on 15 Sep 2016
Sure, there are small areas that may be problematic, but the amount of code extant with try/catch compared to that without is minuscule (and even where it is used, it's generally pretty isolated), so there would still seem to be much room for expansion in the area. Anyway, that's why TMW developers "get the big bucks!" :)
I'm still surprised by a 10X penalty on the cellfun in the example here. I expected, and was willing to accept, some penalty owing to the indirect addressing implied, but that seems like a lot, and there should be ways to address it ("should" in the theoretical sense more than the immediate one, although it does seem worth a look-see if that is a general result and not something specific to this case, which doesn't seem all that unique).
While I've not tried now for a while and should, after the switch to HG2 and other implementations of more and more of the object-oriented stuff, recent implementations have been unusable on my old machine here which is max-ed out on memory and while pretty competitive at the time it was new, it's now quite long-in-the-tooth as computer hardware goes. It's clear from that that performance has taken a hit in implementing much of the new paradigm and by adding so many features; new hardware can mask some of that but there are many complaints on performance so clearly it's an area TMW needs to concentrate on (not that they aren't, I'm sure).
Walter Roberson on 15 Sep 2016
R2016b specifically addressed improving performance for object creation. R2016a improved JIT, so the cellfun might have improved by R2016a already.
R2016b also included a string data type (for reasons that are less than obvious to me); the kind of operation used here would have to be investigated to determine whether it could benefit from strings.
However, R2016b introduced something that I am not keen on at all: now for non-vector cell arrays, TheCell{:} is an error! It is now necessary to reshape the cell array to a vector before using {:} on it! Though I do not know what the result of TheCell{:,:} would be.
dpb on 16 Sep 2016
That latter would indeed seem to be a pita...
I suppose a string array variable is addressable by a single index a la cellstr excepting not needing curlies? Are they fixed-length a la Fortran CHARACTER or variable/counted? Can different lengths then live in the same array transparently?
Walter Roberson on 16 Sep 2016
They are not fixed length.
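A brief sketch of what that looks like in practice, assuming R2016b or later (where the string type was introduced); the sample values are made up:

```matlab
% Build a string array from a cellstr; the elements have different lengths.
s = string({'a', 'bcd'});

second = s(2);         % parentheses index a single element, no curlies
lens   = strlength(s); % number of characters in each element
asChar = char(second); % convert back to a character vector when needed
```

So a string array is addressed by a single index much like a cellstr, but with parentheses, and elements of different lengths share one array.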
dpb on 16 Sep 2016
I had presumed not; wondered, though, about an array then...guess they're more classes/objects overhead to deal with, then. At this stage of the game I wonder why also but would likely have been preference over the cellstr route they took first...
