How do I determine the number of data in a 'messy' CSV and then import it?
1 vue (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Daniel Bridges
le 22 Jan 2018
Modifié(e) : Daniel Bridges
le 23 Jan 2018
I have a CSV file that looks like the following:
"","JOHN DOE","p0123TP01, Approved"
"","Rectum (4)(Volume: 57.77)","CTV (21)(Volume: 32.39)"
"0.0","100.0","100.0"
"0.1","78.01999","100.0"
"0.2","76.2245","100.0"
"0.3","75.21035","100.0"
[skipping to the end of the file]
"58.1","","1.88277"
"58.2","","0.21338"
"58.3","","0.0"
but the number of data sets can vary, so I must write a flexible script to accomodate however many data rows and columns exist:
"","DOE, JOHN","revised blurring for Case 16, ANON, original; calculating correlation between blurred dose and toxicity","","","","","","","","","","","","","",""
"","patient (1)(Volume: 18578.06)","Prostate (2)(Volume: 19.80)","Proximal-SV (3)(Volume: 6.40)","Rectum (4)(Volume: 70.44)","Bladder (5)(Volume: 143.70)","Intestine (6)(Volume: 8.40)","CutLine (7)(Volume: 415.77)","Air (8)(Volume: 5.06)","InsideRectum (9)(Volume: 36.17)","PTV1 (10)(Volume: 76.77)","PTVoriginal (11)(Volume: 80.89)","PTV2 (12)(Volume: 56.52)","Guide1 (13)(Volume: 81.79)","Guide2 (14)(Volume: 68.50)","guide (15)(Volume: 66.91)","CTV-operator (16)(Volume: 26.15)"
"0.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.1","12.88921","100.0","100.0","69.70608","73.36377","2.54878","82.87909","16.40558","73.91824","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.2","12.06455","100.0","100.0","67.50839","69.03589","0.88936","79.92729","16.40558","71.92567","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.3","11.56745","100.0","100.0","65.82671","67.72518","0.32793","77.7594","16.40558","70.44129","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.4","11.09599","100.0","100.0","64.40629","66.56731","0.15624","75.78776","16.40558","69.10236","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.5","10.1524","100.0","100.0","63.25727","65.87648","0.07211","73.65644","16.40558","68.12462","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.6","9.48733","100.0","100.0","61.64318","64.92351","0.0103","67.57963","16.40558","65.96868","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
"0.7","9.1257","100.0","100.0","60.35885","62.43443","0.0","59.25954","16.40558","65.12147","100.0","100.0","100.0","100.0","100.0","100.0","100.0"
[... skipping to end of file ...]
"58.1","0.00498","4.57181","0.23601","","","","","","","1.20431","1.143","1.63577","1.13037","1.34968","1.38174","3.51704"
"58.2","4.9E-4","0.46111","0.0","","","","","","","0.1189","0.11285","0.1615","0.1116","0.13326","0.13642","0.34903"
"58.3","0.0","0.0","","","","","","","","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
What textscan formatSpec do I need to store this data? The following code results in a 1x1 cell array with element 0x2 empty double:
function [StudyIDcol,DistributionType,OrganSpecification,Doses,Volumes,DoseUnit,VolumeUnit] = CreateTableVariables(folderpath,type,StudyID)
% First we read the file. Code courtesy of Walter Roberson of the MathWorks community
fid = fopen([folderpath,type,'/',StudyID,'.csv'],'rt'); % rt = 'read in text mode'
if fid == -1 % added for debugging at the suggestion of Jan Hansen of the MathWorks community
error('Author:Function:OpenFile', 'Cannot open file');
end
num = 0;
% headers = cell % unfinished
while true %endless loop requiring 'break' command to exit!
H1 = fgetl(fid) ;
if feof(fid); break; end
H2 = fgetl(fid) ;
if feof(fid); break; end
datacell = textscan(fid, '%f%f', 'Delimiter', ',', 'CollectOutput', true) ; % originally '%f%f'
if isempty(datacell) || isempty(datacell{1}); break; end
% if any(isnan(datacell{1}(end,:))); datacell{1}(end,:) = []; end % commented out to verify the above command quits the loop
num = num + 1;
headers(num,:) = {H1, H2} ;
data(num) = datacell; % We store it to the data cell array only after checking that it contains data.
%fgetl(fid); %the empty line between organs
end
fclose(fid);
I am reading the textscan formatSpec documentation and more trying to solve this problem. Please help me understand how to use textscan.
I am now trying to write a script to flexibly import such files; I must learn how to scan it to count the number of rows and columns for the data beginning at "0.0","100.0","100.0" (not necessarily this number each time) ...
%function data = ReadMIMDVH(filepath)
filepath = '/home/sony/Documents/research/data/MIM Cumulative DVH/planned/1.csv';
% We write a script specifically to import MIM data.
% First we scan the file to see how many rows of data it has.
% We construct the format spec based on this number of rows and columns.
formatSpec = ['%*s %*s %*s %*s %q %q',repmat('%f',[rows,columns])];
% We read the file using textscan.
fileID = fopen(filepath);
DVH = textscan(fileID,formatSpec,'Delimiter',{',' '/n'});
%end
0 commentaires
Réponse acceptée
Guillaume
le 22 Jan 2018
t = readtable(filepath); %all done file is read and parsed.
5 commentaires
Guillaume
le 23 Jan 2018
My next task is to learn how to parse the Volume variable names to strip the underscore and all characters after it.
A regexprep should do that very easily. Explain exactly what you want.
Plus de réponses (1)
Walter Roberson
le 22 Jan 2018
opts = detectImportOptions(filepath);
nvar = length(opts.VariableNames);
opts.VariableNames = sprintfc('Var%d', 1:nvar);
test = readtable(filepath, opts);
0 commentaires
Voir également
Catégories
En savoir plus sur Data Import and Export dans Help Center et File Exchange
Produits
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!