What is the best way to save to .csv without losing precision and with column headers?

47 vues (au cours des 30 derniers jours)
tl;dr: What's the best way to save an m x n array from matlab to csv, with column headers that can be any text format (ie, include spaces and [ ]), without losing precision or clipping large values?
I am using Matlab (usually 2018a on Windows 7) to do various data manipulations on localizations from single molecule microscopy. These data tables are in large m x n arrays where each row is a molecule and each column is some characteristic about the molecule (for example, it's position in x y and z, its intensity, etc). These arrays are usually ~ 100,000 x 17 to >1,000,000 x 17.
In order to visualize the data and do some further processing I don't want to do in Matlab, I use another program called ThunderSTORM, a plugin for the image analysis suite FIJI. ThunderSTORM requires a .csv file with specific column headers that denote the measure and the units. Usually the headers are of the form, for example, "x [nm]", "y [nm]", "intensity [photons]", etc.
I had been using dlmwrite to write a header line to csv, then write the data underneath that, which works, except that a) dlmwrite truncates at 4 decimal plates, which is a problem, and b) dlmwrite for some reason is treating numbers greater than 9.9999 * 10^4 as just the base value without the exponent (so, 1.001 * 10^5 imports as 1.001 only). This is a problem for filtering the data as extreme values are either retained or removed.
When I used writetable instead of dlmwrite, I did not have this behavior and values >10^5 imported correctly, so it's not a ThunderSTORM thing. The only problem with using writetable is that the column names must follow the Matlab rules for variable names, so I cannot use "x [nm]" as a header. When ThunderSTORM doesn't get the column headers it expects, it sometimes tries to do mathematical conversions on that data, which I don't want (for example, it thinks the x values are in pixels, and tries to convert them to nm, when they are already in nm).
So my question is, what is the best way to save this data out to .csv, with the appropriate headers, without clipping large data (still not sure why dlmwrite is doing that) or losing precision? I was able to make a solution using fprintf, but it's clunky, and I'm wondering if there is a better way.
% what I want writetable to do, but it doesn't because the ColName variables are not allowed.
ColName = {'id' 'frame' 'x [nm]' 'y [nm]' 'z [nm]' 'sigma1 [nm]' 'sigma2 [nm]' 'intensity [photons]'...
'bkg [photons]' 'peakphot [photons]' 'crlbx [nm]' 'crlby [nm]' 'error [nm]' 'channel' 'area' 'firstrankarea' 'firstrankdensity'};
data = array2table(Tess561,'VariableNames',ColName);
% my temporary solution that seems clunky, but does do what I want.
x = mydata(:,1:17);
fid = fopen('ouputfile.csv','w');
fprintf(fid,'id,frame,x [nm],y [nm],z [nm],sigma1 [nm],sigma2 [nm],intensity [photons],bkg [photons],peakphot [photons],CRLBX [nm],CRLBY [nm],uncertainty_xy [nm],channel,area,firstrankarea,firstrankdensity\n');
for i = 1:length(x)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));

Réponse acceptée

James Tursa
James Tursa le 28 Août 2019
Modifié(e) : James Tursa le 28 Août 2019
Why are you using fprintf in a loop? Can't you do it all in one call? E.g., this
for i = 1:length(x)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));
becomes this (the transpose is to get the x elements in memory in the desired printing order)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x.');
P.S., Since you are creating a text file instead of a binary file, you should probably be using 'wt' instead of 'w' in your fopen( ) call.
  4 commentaires
James Tursa
James Tursa le 3 Sep 2019
Note that you can specify the number of digits to print. E.g., %15.7f

Connectez-vous pour commenter.

Plus de réponses (1)

Luna le 28 Août 2019
Have you tried the functions writematrix or csvwrite ?
csvwrite is not recommended right now but you can try it.
  1 commentaire
GolgiWhillikers le 28 Août 2019
I'll note that my for loop fprintf solution takes foreeevverrrr for the number of localizaitons I have (~2 million). Like minutes.
I had looked at writematrix, but it doesn't allow headers, and I can't use it in Matlab 2018, which I am stuck with at the moment. The problem with csvwrite is the same as baseline dlmwrite - it will truncate at 5 significant digits, which is I suppose what's happening with dlmwrite described above (ie, 100,000 becomes 1).
I guess I can specify some insane precision in dlmwrite, like precision 20 or something, since sometimes I have values that are like 4000.092583028582. Setting the precision super high does work, but does seem a little excessive. But if that's what I have to do, at least it does work (and is much faster than the loop, unsurprisingly).
fid = fopen(savename,'w');

Connectez-vous pour commenter.


Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by