Populating matrix with data from a table
Hello. Sorry for this question, but I am not very familiar with the table function. I am attaching a csv file. I am reading this file, which basically contains stock prices for some days, and trying to populate a matrix with the data. For some reason using the table function looks really cumbersome, but that might be my own fault. Before, I was using textread, except that in general I do not know how many names I have, so that is hard (or impossible) here. I have read the documentation for table, but nothing seems to work!
So I do something like this:
opts = detectImportOptions('ii.csv');
opts.VariableNamingRule = "preserve";
P = readtable('ii.csv',opts);
nDates = height(P);            % number of rows (dates)
nBlIds = width(P);             % number of columns (identifiers)
closes = nan(nDates,nBlIds);   % preallocate with NaN
% here I write the date in a different way
trDates = P(:,1);
tradingDates = table2array(trDates);
formatOut = 'yyyymmdd';
Dates = str2num(datestr(tradingDates,formatOut));
I have no clue how to populate the matrix using P; all my attempts failed. Maybe it is hard, maybe it is not, but if it is not obvious, I fail to understand the purpose of this function if you cannot create a simple numerical matrix from it. Maybe it is useful for complicated files with weird variable types, but in this case I just need to put the (numerical) data into some matrix. In Perl or Python this is a breeze; unfortunately I need to use MATLAB in this context. Any suggestion, maybe using some other function, would save me a lot of time.
9 comments
Star Strider
on 28 Nov 2020
The file has some serious problems.
Among others, several entries are numeric strings, such as:
{'3913.0175138994…'}
in (3,709) that are essentially impossible to convert to floating-point values using str2double, str2num, or anything else, and in some instances it is not even possible to find them using normal indexing approaches, whether attempting to use vectorised indexing or kludging it and looping through them (which, given the size of the example, takes a while). I was able to convert all the 'NA' to -9999 (because using NaN makes it impossible to do any calculations with the table); however, there are remaining NaN values as well that I could neither detect nor replace.
After about 5 hours working on it, I finally just gave up.
I don't see that as a string in the original file, S-S. That's probably a symptom after reading, because the column was interpreted as 'char' first, since the first entries for that specific column were NA.
If you instead fix the import options object to tell MATLAB to read all as double except the date column, other than the sheer number of variables there doesn't seem to be any problem.
Opening the file in the editor, it looks like a normally-formatted csv file, other than that it is missing the header variable for Date in the first record, which is why it had to be fixed up manually.
ADDENDUM:
>> [i,j]=ind2sub(size(ttII{:,:}),find(ismembertol(ttII{:,:},3913.0175,.0001)))
i =
3.00
j =
708.00
>> ttII(:,708)
ans =
3×1 timetable
Date MAERSKADC
___________ _________
14-Feb-2000 NaN
15-Feb-2000 NaN
16-Feb-2000 3913.02
>>
>> format long, format compact
>> ttII(:,708)
ans =
3×1 timetable
Date MAERSKADC
___________ ________________
14-Feb-2000 NaN
15-Feb-2000 NaN
16-Feb-2000 3913.01751389947
>> ttII.MAERSKADC
ans =
1.0e+03 *
NaN
NaN
3.913017513899468
>> whos ans
Name Size Bytes Class Attributes
ans 3x1 24 double
>>
which shows that reading the file with the default variable type first is the problem.
Star Strider
on 28 Nov 2020
There are a large number of 'NA' entries as well as the numeric strings, which of course do not import as numeric and so are displayed as NaN when read as numeric, even though many of them contain floating-point values.
For example, the third row of ‘MAERSKA DC’ imports as a string variable '3913.0175138994678' that refuses to convert to a floating-point value.
I will leave you to it, and I sincerely wish you luck in creating something useful out of this file! It has defied all my attempts!
dpb
on 28 Nov 2020
>> ttIIbad=readtimetable('ii.csv'); % read as default import scan finds it...
>> ttIIbad(:,708)
ans =
3×1 timetable
Var1 MAERSKADC
____________________ ______________________
14-Feb-2000 00:00:00 {'NA' }
15-Feb-2000 00:00:00 {'NA' }
16-Feb-2000 00:00:00 {'3913.0175138994678'}
>> ans.MAERSKADC=str2double(ans.MAERSKADC)
ans =
3×1 timetable
Var1 MAERSKADC
____________________ ________________
14-Feb-2000 00:00:00 NaN
15-Feb-2000 00:00:00 NaN
16-Feb-2000 00:00:00 3913.01751389947
>>
"wish you luck in creating something useful out of this file!"
Look at the second Answer...it's really quite straightforward -- and if you haven't played with the import object much, quite a bit to be gained in seeing how to help MATLAB significantly in cases like this.
The problem is that the default scanning for variable types inside the import functions is not very in-depth, in order not to detract too much from their performance for normal use... and so when it sees the initial 'NA' it just calls that a character variable.
The detectImportOptions function is more powerful and usually gets things right, but even in this case owing to the very short series, it still didn't find the columns like the one pointed out above that should be double.
But, we know all the data are numeric except the date-time column, so if we just tell MATLAB that by fixing up the import options object first, all is well.
This is a general "trick" of much value for specially-formatted files and/or cases like this where default action isn't powerful enough.
Star Strider
on 28 Nov 2020
Nothing I tried worked.
dpb
on 28 Nov 2020
Yeah, if read in with defaults, cleaning it up afterwards would undoubtedly be a pain. I didn't really even try that route, although I'm not sure why you had such trouble with str2double; it worked on the one suspect column just fine here.
What is good is that, since "Father knows best!", we know all the data are numeric and so can force the issue. After that, "piece o' cake!"
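For the record, the after-the-fact cleanup route mentioned above can be sketched as follows; tbad is a hypothetical name for a table read with default options, where some numeric columns came in as cellstr because of 'NA' entries:

```matlab
% find the variables that imported as cell arrays of char and convert them
vn = tbad.Properties.VariableNames;
ischarcol = varfun(@iscellstr, tbad, 'OutputFormat','uniform');
for k = find(ischarcol)
    tbad.(vn{k}) = str2double(tbad.(vn{k}));   % 'NA' becomes NaN
end
```

This loops over roughly 1500 variables here, which is why fixing the import options up front is the nicer route.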
Mihai Milea
on 29 Nov 2020
Edited: dpb
on 30 Nov 2020
dpb
on 30 Nov 2020
Almost certainly didn't need to have done that.
You don't show what your next step is going to be, but to duplicate all the data you already have, and to do so in a tight double loop, is about as inefficient a use of MATLAB as is possible.
Accepted Answer
More Answers (1)
dpb
on 27 Nov 2020
WOWSERS!!! You've got something like 1500 variables!!!??? That's definitely hard to deal with by variable name no matter what the language.
But, it's trivial to get the data from the table--
tII=readtable('ii.csv'); % read the file
tII.Properties.VariableNames(1)={'Date'}; % make convenient date variable name
After this, simply
>> tII.SXXP
ans =
232.2856
227.4823
230.4354
>>
Returns the data from the first variable; not complicated at all.
You can also dereference by subscript; note that in that case you use {} instead of () to return the data as its native type; the parentheses will return a table object containing the referenced variable(s).
>> tII{:,2}
ans =
232.2856
227.4823
230.4354
>>
Given the time nature of the data, I'd suggest probably a timetable is more appropriate.
See the doc for how to reference data from table object -- there's a veritable plethora of options available depending on need--see <Access-data-in-a-table> for the details.
2 comments
Mihai Milea
on 27 Nov 2020
dpb
on 28 Nov 2020
You didn't convert the input to double for the columns that were interpreted as 'char' because of missing data.
Use the import options object to force that to happen...
>> opt=detectImportOptions('ii.csv');
>> unique(opt.VariableTypes)
ans =
1×3 cell array
{'char'} {'datetime'} {'double'}
shows that some values will be 'char' because there weren't enough samples in the input file for it to be able to determine that those columns really were supposed to be numeric. So, since we know what we want/need--
>> opt.VariableTypes=strrep(opt.VariableTypes,'char','double');
>> unique(opt.VariableTypes)
ans =
1×2 cell array
{'datetime'} {'double'}
>>
Set all of those to also be 'double' before reading. Then (I didn't look at the input file to see just why, but I noticed that the date column title wasn't recognized) we may as well fix that now, too...
>> opt.VariableNames(1)={'Date'};
Then,
tII=readtable('ii.csv',opt);
and all your data will be numeric, with no cell arrays left.
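Put together, a minimal end-to-end sketch of the steps above, which also produces the plain numeric matrix the original question asked for (the output names closes and Dates are taken from the question; this assumes the Date column is detected as datetime):

```matlab
opt = detectImportOptions('ii.csv');
opt.VariableTypes = strrep(opt.VariableTypes,'char','double'); % force numeric columns
opt.VariableNames(1) = {'Date'};                               % fix the missing header
tII = readtable('ii.csv',opt);

closes = tII{:,2:end};        % nDates-by-nSymbols double matrix ('NA' -> NaN)
Dates  = yyyymmdd(tII.Date);  % numeric yyyymmdd dates, as in the question
```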
It's tough to visualize such large datasets that aren't possible to look at on screen or even in many editors. Is there really a need for so many variables all at one time?
I also didn't look at the variable names in the file -- you'll have to choose how to treat those. If you want to use named variables to select a specific (set of) symbols, you may want to do as you did earlier and preserve those names, despite them not being valid MATLAB names and perhaps being harder to type. If not, you may have no easy way to know what MATLAB renamed them to... again, it all depends on just how you intend to/need to address so many variables.
Creating dynamic variable names programmatically from the VariableNames property may be one way, or you could build pick lists and search for where certain names are to return those.
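One such pick-list approach can be sketched as follows; the wanted list is hypothetical, and tII is the table read above:

```matlab
wanted   = {'SXXP','MAERSKADC'};  % hypothetical list of symbols to pull
[tf,loc] = ismember(wanted, tII.Properties.VariableNames);
subset   = tII{:, loc(tf)};       % numeric data for the symbols found
notfound = wanted(~tf);           % any requested names not in the file
```

This avoids typing awkward preserved names repeatedly and degrades gracefully when a symbol is missing.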
However you choose to go, having 1500 variables is a challenge no matter what system you're using for anything other than fully automated analyses--just not feasible to do by hand.