Populating matrix with data from a table
Hello. Sorry for this question, but I am not very familiar with the table function. I am attaching a csv file. I am reading this file, which basically contains stock prices for some days, and trying to populate a matrix with the data. For some reason using the table function looks really cumbersome, but that might be my own fault. Before, I was using textread, except that in general I do not know how many names I have, so that is hard (or impossible) here. I have read the documentation for table, but nothing seems to work!
So I do something like this:
opts = detectImportOptions('ii.csv');
opts.VariableNamingRule = "preserve";
P = readtable('ii.csv',opts);
nDates = height(P);            % number of rows (dates)
nBlIds = width(P);             % number of columns (identifiers)
closes = nan(nDates,nBlIds);   % preallocate with NaN
% here I write the date in a different way
trDates = P(:,1);
tradingDates = table2array(trDates);
formatOut = 'yyyymmdd';
Dates = str2num(datestr(tradingDates,formatOut));
I have no clue how to populate the matrix using P; all my attempts failed. Maybe it is hard, maybe it is not, but if it is not obvious, I fail to understand the purpose of this function if you cannot create a simple numerical matrix from it. Maybe it is useful for complicated files with weird variable types, but in this case I just need to put the (numerical) data into some matrix. In Perl or Python this is a breeze; unfortunately I need to use MATLAB in this context. Any suggestion, maybe using some other function, would save me a lot of time.
9 comments
Star Strider
on 28 Nov 2020
The file has some serious problems.
Among others, several entries are numeric strings, such as:
{'3913.0175138994…'}
in (3,709) that are essentially impossible to convert to floating-point values using str2double, str2num, or anything else, and in some instances it is not even possible to find them using normal indexing approaches, whether attempting to use vectorised indexing or kludging it and looping through them (which, given the size of the example, takes a while). I was able to convert all the 'NA' to -9999 (because using NaN makes it impossible to do any calculations with the table); however, there are remaining NaN values as well that I could neither detect nor replace.
After about 5 hours working on it, I finally just gave up.
I don't see that as a string in the original file, S-S. That's probably a symptom after reading, because the column was interpreted as 'char' first, since the first entries for that specific column were NA.
If you instead fix the import options object to tell MATLAB to read all as double except the date column, other than the sheer number of variables there doesn't seem to be any problem.
Opening the file in the editor, it looks like a normally-formatted csv file, other than that it is missing the header variable for Date in the first record, which is why it had to be fixed up manually.
ADDENDUM:
>> [i,j]=ind2sub(size(ttII{:,:}),find(ismembertol(ttII{:,:},3913.0175,.0001)))
i =
3.00
j =
708.00
>> ttII(:,708)
ans =
3×1 timetable
Date MAERSKADC
___________ _________
14-Feb-2000 NaN
15-Feb-2000 NaN
16-Feb-2000 3913.02
>>
>> format long, format compact
>> ttII(:,708)
ans =
3×1 timetable
Date MAERSKADC
___________ ________________
14-Feb-2000 NaN
15-Feb-2000 NaN
16-Feb-2000 3913.01751389947
>> ttII.MAERSKADC
ans =
1.0e+03 *
NaN
NaN
3.913017513899468
>> whos ans
Name Size Bytes Class Attributes
ans 3x1 24 double
>>
which shows that reading the file with the default variable type first is the problem.
Star Strider
on 28 Nov 2020
There are a large number of 'NA' entries as well as the numeric strings, which of course do not import as numeric and so are displayed as NaN when read as numeric, even though many of them contain floating-point values.
For example, the third row of ‘MAERSKA DC’ imports as a string variable '3913.0175138994678' that refuses to convert to a floating-point value.
I will leave you to it, and I sincerely wish you luck in creating something useful out of this file! It has defied all my attempts!
dpb
on 28 Nov 2020
>> ttIIbad=readtimetable('ii.csv'); % read as default import scan finds it...
>> ttIIbad(:,708)
ans =
3×1 timetable
Var1 MAERSKADC
____________________ ______________________
14-Feb-2000 00:00:00 {'NA' }
15-Feb-2000 00:00:00 {'NA' }
16-Feb-2000 00:00:00 {'3913.0175138994678'}
>> ans.MAERSKADC=str2double(ans.MAERSKADC)
ans =
3×1 timetable
Var1 MAERSKADC
____________________ ________________
14-Feb-2000 00:00:00 NaN
15-Feb-2000 00:00:00 NaN
16-Feb-2000 00:00:00 3913.01751389947
>>
"wish you luck in creating something useful out of this file!"
Look at the second Answer...it's really quite straightforward -- and if you haven't played with the import object much, quite a bit to be gained in seeing how to help MATLAB significantly in cases like this.
The problem is that the default scanning for variable types inside the import functions is not very in-depth, in order not to detract too much from their performance for normal use... and so when it sees the initial 'NA' it just calls that a character variable.
The detectImportOptions function is more powerful and usually gets things right, but even in this case owing to the very short series, it still didn't find the columns like the one pointed out above that should be double.
But, we know all the data are numeric except the date-time column, so if we just tell MATLAB that by fixing up the import options object first, all is well.
This is a general "trick" of much value for specially-formatted files and/or cases like this where default action isn't powerful enough.
Star Strider
on 28 Nov 2020
Nothing I tried worked.
dpb
on 28 Nov 2020
Yeah, if read in with defaults, cleaning it up afterwards would undoubtedly be a pain. I didn't really even try that route, although I'm not sure why you had such trouble with str2double; it worked on the one suspect column just fine here.
What is good is that, since "Father knows best!", we know all the data are numeric and so can force the issue. After that, "piece o' cake!"
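For the record, the after-the-fact cleanup route mentioned above can be sketched as follows; tbad is a hypothetical name for a table read with default options, where some numeric columns came in as cellstr because of 'NA' entries:

```matlab
% find the variables that imported as cell arrays of char and convert them
vn = tbad.Properties.VariableNames;
ischarcol = varfun(@iscellstr, tbad, 'OutputFormat','uniform');
for k = find(ischarcol)
    tbad.(vn{k}) = str2double(tbad.(vn{k}));   % 'NA' becomes NaN
end
```

This loops over roughly 1500 variables here, which is why fixing the import options up front is the nicer route.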
Mihai Milea
on 29 Nov 2020
Edited: dpb
on 30 Nov 2020
dpb
on 30 Nov 2020
Almost certainly didn't need to have done that.
You don't show what your next step is going to be, but to duplicate all the data you already have, and to do so in a tight double loop, is about as inefficient a use of MATLAB as is possible.
Accepted Answer
More Answers (1)
dpb
on 27 Nov 2020
WOWSERS!!! You've got something like 1500 variables!!!??? That's definitely hard to deal with by variable name no matter what the language.
But, it's trivial to get the data from the table--
tII=readtable('ii.csv'); % read the file
tII.Properties.VariableNames(1)={'Date'}; % make convenient date variable name
After this, simply
>> tII.SXXP
ans =
232.2856
227.4823
230.4354
>>
Returns the data from the first variable; not complicated at all.
You can also dereference by subscript; note that in that case you use {} instead of () to return the data as its native type; the parentheses will return a table object containing the referenced variable(s).
>> tII{:,2}
ans =
232.2856
227.4823
230.4354
>>
Given the time nature of the data, I'd suggest probably a timetable is more appropriate.
See the doc for how to reference data from table object -- there's a veritable plethora of options available depending on need--see <Access-data-in-a-table> for the details.
2 comments
Mihai Milea
on 27 Nov 2020
dpb
on 28 Nov 2020
You didn't convert the input to double for the columns that were interpreted as 'char' because of missing data.
Use the import options object to force that to happen...
>> opt=detectImportOptions('ii.csv');
>> unique(opt.VariableTypes)
ans =
1×3 cell array
{'char'} {'datetime'} {'double'}
shows that some values will be 'char' because there weren't enough samples in the input file for it to be able to determine that those columns really were supposed to be numeric. So, since we know what we want/need--
>> opt.VariableTypes=strrep(opt.VariableTypes,'char','double');
>> unique(opt.VariableTypes)
ans =
1×2 cell array
{'datetime'} {'double'}
>>
Set all of those to also be 'double' before reading. Then (I didn't look at the input file to see just why, but I noticed that the date column title wasn't recognized) we may as well fix that now, too...
>> opt.VariableNames(1)={'Date'};
Then,
tII=readtable('ii.csv',opt);
and all your data will be numeric, with no cell arrays left.
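Put together, a minimal end-to-end sketch of the steps above, which also produces the plain numeric matrix the original question asked for (the output names closes and Dates are taken from the question; this assumes the Date column is detected as datetime):

```matlab
opt = detectImportOptions('ii.csv');
opt.VariableTypes = strrep(opt.VariableTypes,'char','double'); % force numeric columns
opt.VariableNames(1) = {'Date'};                               % fix the missing header
tII = readtable('ii.csv',opt);

closes = tII{:,2:end};        % nDates-by-nSymbols double matrix ('NA' -> NaN)
Dates  = yyyymmdd(tII.Date);  % numeric yyyymmdd dates, as in the question
```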
It's tough to visualize such large datasets that aren't possible to look at on screen or even in many editors. Is there really a need for so many variables all at one time?
I also didn't look at the variable names in the file -- you'll have to choose how to treat those. If you want to use named variables to select a specific (set of) symbols, you may want to do as you did earlier and preserve those names, despite them not being valid MATLAB names and perhaps being harder to type. If not, you may have no easy way to know what MATLAB renamed them to... again, it all depends on just how you intend to/need to address so many variables.
Creating dynamic variable names programmatically from the VariableNames property may be one way, or you could build pick lists and search for where certain names are to return those.
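One such pick-list approach can be sketched as follows; the wanted list is hypothetical, and tII is the table read above:

```matlab
wanted   = {'SXXP','MAERSKADC'};  % hypothetical list of symbols to pull
[tf,loc] = ismember(wanted, tII.Properties.VariableNames);
subset   = tII{:, loc(tf)};       % numeric data for the symbols found
notfound = wanted(~tf);           % any requested names not in the file
```

This avoids typing awkward preserved names repeatedly and degrades gracefully when a symbol is missing.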
However you choose to go, having 1500 variables is a challenge no matter what system you're using for anything other than fully automated analyses--just not feasible to do by hand.