parcing comma delimited column to multiple vectors and cell arrays

joseph Frank on 7 Jul 2012
Hi,
I am importing a series of CSV files, each with 18 columns and a different number of rows (up to 800,000), using the following code:
for i=1:135
    % Import the data
    fullFileName = sprintf('%s%d%s', 'C:\Users\Joseph\Documents\MATLAB\CS\CSV\', i, '.csv');
    fid = fopen(fullFileName, 'rt');
    M = textscan(fid, '%s', 'collectoutput', 1, 'headerlines', 0);
    fclose(fid);
    X = M{1,1};
end
The issue is that X is a cell array in which each row is a single comma-delimited string. For instance, the first two rows are the following. 1st row:
'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR'
2nd row:
'00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0'
The first row holds the column headers and the second row contains data. All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows. That is, I want to create a cell array called CUSIP_ID with the data {00846UAG6}, a vector RPTD_PR=[101.636], and so on.
is there a way to parce the data of X?
  1 comment
Jan on 8 Jul 2012
I do not understand the question. Would textscan(... 'delimiter', ',') solve the problem already?
Btw. it is called "parsing" with "s".
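As a rough illustration of that suggestion, the sketch below parses the sample data row from the question with textscan and a comma delimiter. The format string is an assumption about which columns are text and which are numeric (it is not stated in the thread), and the row is parsed from a string here only to keep the example self-contained.

```matlab
% Sample rows copied from the question.
headerLine = 'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR';
dataLine   = '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0';

% Assumed column types: %s for identifiers, dates, times and flags,
% %f for prices, yields and counts (18 columns in total).
fmt = '%s%s%s%s%s%s%s%f%f%f%s%s%s%f%f%f%f%f';

% Column names from the header row.
names = regexp(headerLine, ',', 'split');

% Parse the data row. With a file, the equivalent call would be
% textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 1).
C = textscan(dataLine, fmt, 'Delimiter', ',');

cusip = C{1};   % cell array of text
price = C{8};   % numeric column vector
```

Each element of C holds one column: text columns come back as cell arrays and numeric columns as double vectors, which matches the mix of cell and numeric variables asked for.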


Answers (1)

Walter Roberson on 8 Jul 2012
  3 comments
Jan on 8 Jul 2012
Is this really the same question as above?
C = {'CUSIP_ID', 'BOND_SYM_ID', 'COMPANY_SYMBOL'};
FileName2 = ['Issuer' num2str(UIssuer(i))];
save(FileName2, C{:});
Walter Roberson on 8 Jul 2012
Edited: Walter Roberson on 8 Jul 2012
You wrote,
All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows.
You are therefore asking to compute variable names. It is not a good idea to do that; there are many associated problems.
In your situation, I recommend using dynamic field names in a structure, and then saving with save() and the -struct flag.
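A minimal sketch of that idea, using a few names and values from the sample row in the question. The output file location is a hypothetical placeholder (a temporary file), chosen only so the example runs anywhere.

```matlab
% Names and values taken from the sample row in the question.
names  = {'CUSIP_ID', 'COMPANY_SYMBOL', 'RPTD_PR'};
values = {{'00846UAG6'}, {'A'}, 101.636};

% Build the structure with dynamic field names instead of creating
% separately named variables in the workspace.
s = struct();
for k = 1:numel(names)
    s.(names{k}) = values{k};
end

% Hypothetical output location, for illustration only.
outFile = [tempname '.mat'];

% With -struct, each field is stored as its own variable in the MAT-file.
save(outFile, '-struct', 's');

% Loading into a structure gives the fields back under the same names.
loaded = load(outFile);
```

Saved this way, the MAT-file contains variables named CUSIP_ID, COMPANY_SYMBOL and RPTD_PR, while the code itself never has to compute a variable name.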
The parsing is easy:
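fieldnames = regexp( FirstRow, ',', 'split');    % header names
fieldvals = regexp( SecondRow, ',', 'split');    % data values, still as text
tempcell = [fieldnames; fieldvals];              % 2-by-N: names over values
savestruct = struct( tempcell{:} );              % expands to name/value pairs
save( FileName, '-struct', 'savestruct');        % -struct goes before the structure name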
The step that this misses is converting numeric-looking fields to numeric values. In order to do that, you have to know ahead of time which fields must be numeric, or you have to set rules about the forms that are okay to convert to numeric. Keep in mind as you construct those rules that some strings that contain the characters 'e', 'E', 'i', 'I', '-', '+' or '.' are considered to be convertible to numeric, so you can end up surprised if something you "know" should be a text field just happened to contain "E0", which is interpretable as "0E0" which is 0.
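One way to sketch the "know ahead of time" option is to keep an explicit list of numeric fields and convert only those with str2double. The list below is an assumption based on the sample row, not something given in the thread; every other field stays as text.

```matlab
% Sample rows copied from the question.
firstRow  = 'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR';
secondRow = '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0';

names = regexp(firstRow,  ',', 'split');
vals  = regexp(secondRow, ',', 'split');

% Assumption: these fields are known in advance to be numeric.
numericFields = {'RPTD_PR', 'YLD_PT', 'DAYS_TO_STTL_CT', 'RPTD_HIGH_PR', ...
                 'HIGH_YLD_PT', 'RPTD_LOW_PR', 'LOW_YLD_PT', 'RPTD_LAST_PR'};

s = struct();
for k = 1:numel(names)
    if ismember(names{k}, numericFields)
        % str2double returns NaN when the text is not a valid number.
        s.(names{k}) = str2double(vals{k});
    else
        % Keep text fields as a 1-by-1 cell so more rows can be appended later.
        s.(names{k}) = vals(k);
    end
end
```

Because the conversion is driven by field name rather than by what the text looks like, a symbol such as COMPANY_SYMBOL is never converted, whatever characters it contains.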
