Hello,
I'd like to generate a set of indicies for my tall datastore in advance so, when I want to select rows, it will be fast.
Here is what I came up with.
I expect there is a much better way. Please help me improve it.
FYI, the datastore will be several million rows by four columns. I ran the following test on a very small file.
Thanks
Michael
Code:
ds = datastore(fname);
ds.VariableNames = {'DATE','TICKER','FIELD','VALUE'};
ds.SelectedFormats(1:3) = {'%{MM/dd/uuuu}D','%C','%C'};
tds = tall(ds);
gDATE = unique(tds.DATE);
gTICKER = unique(tds.TICKER);
gFIELD = unique(tds.FIELD);
[uniqueDates, uniqueTickers, uniqueFields] = gather(gDATE,gTICKER,gFIELD);
for iTicker = 1:length(uniqueTickers)
gTicker = find(tds.TICKER==uniqueTickers(iTicker));
idxTicker{iTicker} = gather(gTicker);
end
for iField = 1:length(uniqueFields)
gField = find(tds.FIELD==uniqueFields(iField));
idxField{iField} = gather(gField);
end
tTickerIndex = array2table([cellstr(uniqueTickers) idxTicker.'], ...
'VariableNames',{'TICKER','INDEX'});
tTickerIndex.TICKER= categorical(tTickerIndex.TICKER)
tFieldIndex = array2table([cellstr(uniqueFields) idxField.'], ...
'VariableNames',{'FIELD','INDEX'})
tFieldIndex.FIELD = categorical(tFieldIndex.FIELD)
out = tds(tFieldIndex.INDEX{tFieldIndex.FIELD=='EQY_DVD_YLD_IND'}(1:10),:);
gather(out)
Output
tTickerIndex = 1×2 table
TICKER INDEX
_______________ ________________
DFIVX US EQUITY [34112×1 double]
tFieldIndex = 6×2 table
FIELD INDEX
________________________________ _______________
DAY_TO_DAY_TOT_RETURN_GROSS_DVDS [6423×1 double]
DAY_TO_DAY_TOT_RETURN_NET_DVDS [6423×1 double]
DIVIDEND_INDICATED_YIELD [6242×1 double]
EQY_DVD_YLD_IND [6242×1 double]
EQY_DVD_YLD_IND_NET [6242×1 double]
FUND_TOTAL_ASSETS [2540×1 double]
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 2: Completed in 1.1 sec
- Pass 2 of 2: Completed in 0.98 sec
Evaluation completed in 2.8 sec
ans = 10×4 table
DATE TICKER FIELD VALUE
__________ _______________ _______________ ______
06/01/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1484
06/02/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1611
06/03/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1718
06/06/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1654
06/07/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.159
06/08/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1442
06/09/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1484
06/10/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.138
06/13/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1277
06/14/1994 DFIVX US EQUITY EQY_DVD_YLD_IND 2.1277