Deep learning with partitionable datastores on a cluster
Christopher McCausland
on 10 Mar 2023
Answered: Joss Knight
on 16 Mar 2023
Hello,
I have a datastore which contains 1000 .mat files. Each file contains an X-by-4 table in the following format (see attached), where X is typically 700-900. The tf_ridge column is my data; for this study, "sleep stage" is my label of interest.
MATLAB deep learning expects an n-by-2 table as input, so I created a custom read function that reads in the data, strips out the two extra columns, and makes my label data categorical, as shown in the custom read function below.
% Calling ds as shown
ds = fileDatastore('C:\mydata',"ReadFcn",@custom_load_FN,"FileExtensions",".mat");
I also make subsets of the data for training and test purposes:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Create a subset of the datastore for test train val purposes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% [Train, val, test] as a whole percentage i.e. [60,20,20]
split = [90,0,10];
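% ds is the only datastore defined in this post, so it is used here
split_idx = round(numel(ds.Files)*(split/100));
% Start each range one past the end of the previous one so that no file
% lands in two subsets
train_idx = 1:split_idx(1);
val_idx = split_idx(1)+1:split_idx(1)+split_idx(2);
test_idx = split_idx(1)+split_idx(2)+1:sum(split_idx);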
% Generate subset for train/test split; will inherit ds properties of
% isPartitionable
dstrain = subset(ds,train_idx);
dsval = subset(ds,val_idx);
dstest = subset(ds,test_idx);
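The index arithmetic for a split like this can be sketched on its own, without touching the datastore. This is a minimal sketch, not the exact code I run: it assumes 1000 files (as described above), and the names `numFiles`, `counts` and `edges` are introduced here for illustration only. It uses cumulative sums so the three ranges cannot overlap, and it lets the last subset absorb any rounding so that every file is used exactly once.

```matlab
numFiles = 1000;         % assumed file count, taken from the description above
split = [90, 0, 10];     % [train, val, test] as whole percentages

% Number of files per subset; the last subset absorbs any rounding error
counts = round(numFiles * split / 100);
counts(end) = numFiles - sum(counts(1:end-1));

% Cumulative boundaries: subset k covers edges(k)+1 up to edges(k+1)
edges = [0, cumsum(counts)];
train_idx = edges(1)+1 : edges(2);
val_idx   = edges(2)+1 : edges(3);   % empty when the validation share is 0
test_idx  = edges(3)+1 : edges(4);

fprintf('train: %d, val: %d, test: %d files\n', ...
    numel(train_idx), numel(val_idx), numel(test_idx));
```

With a validation share of 0 the validation range comes out empty, so it may be simplest to skip creating a validation subset in that case.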
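% Custom read function to strip out the Arousal and Epoch columns, and make
% the label categorical
function a = custom_load_FN(filename)
disp('In load function')
% Load into a struct so no variables are created implicitly; each file
% stores its table in a variable named data
s = load(filename);
data = removevars(s.data,{'Arousal','Epoch'});
% Fix the category set and order so every file yields the same classes
valSet = {'N1' 'N2' 'N3' 'W' 'R'};
data.Sleep_Stage = categorical(data.Sleep_Stage,valSet);
a = data;
end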
When I test this with:
tf = isPartitionable(ds)
MATLAB returns logical 1, so the datastore is partitionable. However, on the cluster I get the following error saying that the datastore is not partitionable:
The input datastore is not Partitionable and does not support parallel operations.
As a workaround, I have also tried using the @load handle together with a transform datastore function, which is just a rehash of my custom_load_FN, but this has been unsuccessful. I am aware of this post, and this one. However, it seems like there should be an easier solution in my case. I just don't have enough experience working with datastores to know what it is.
If anyone has advice on how to make this type of datastore into a partitionable datastore that works with the ExecutionEnvironment="parallel" option for deep learning, I would appreciate it!
options = trainingOptions("adam", ...
ExecutionEnvironment="parallel", ...
...
)
Kind regards,
Christopher
3 Comments
Joss Knight
on 16 Mar 2023
Ah yes, this is just an incorrect error message that was fixed in R2022a. I will answer now.
Accepted Answer
Joss Knight
on 16 Mar 2023
This error message is incorrect. It should say that your datastore is not PartitionableByIndex. This was fixed in R2022a.
As long as your datastore is Subsettable you can now (since R2022b) work around this issue by using this Adapter I knocked together. No promises but it's mostly worked so far.
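Before trying a workaround, it may help to check which interfaces the datastore actually reports. The following is a minimal sketch under some assumptions: it builds a throwaway fileDatastore over a temporary folder of synthetic MAT-files (the folder, file names and contents are hypothetical), it needs R2022b or later for `isSubsettable`, and it assumes the class meant by the corrected error message is `matlab.io.datastore.PartitionableByIndex`.

```matlab
% Hypothetical temporary folder with a few small synthetic MAT-files
folder = tempname;
mkdir(folder);
for k = 1:4
    data = table(rand(5,1), 'VariableNames', {'tf_ridge'});
    save(fullfile(folder, sprintf('file%d.mat', k)), 'data');
end

ds = fileDatastore(folder, "ReadFcn", @load, "FileExtensions", ".mat");

% General datastore capabilities
canPartition = isPartitionable(ds);   % supports partition(ds,n,index)
canSubset    = isSubsettable(ds);     % supports subset(ds,indices)

% Assumed to be the interface the parallel training check is looking for
byIndex = isa(ds, 'matlab.io.datastore.PartitionableByIndex');

fprintf('Partitionable: %d, Subsettable: %d, PartitionableByIndex: %d\n', ...
    canPartition, canSubset, byIndex);

rmdir(folder, 's');   % remove the temporary files
```

If the first two values are true and the last is false, that would match the situation in the question: `isPartitionable` returns 1 while parallel training still rejects the datastore.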