Datastore with overlapping read function
I have some data stored in 20 spreadsheets, each about 4000x200 in size. I want to store the data in a datastore and feed it into a temporal CNN in chunks of 100 rows. However, I want the rows to overlap. For example, the first value the datastore will return is a 100x200 array, which corresponds to rows 1:100 in the spreadsheet. The second value it should return is rows 2:101, then 3:102, etc.
The only way I have found to do this so far is to read all the spreadsheets into an 80,000x200 array in MATLAB, create a 3D array of size 79,900x100x200, and then use a for loop to copy 100x200 chunks from the 2D array into the 3D array. Finally, I put the 3D array into an arrayDatastore. However, this seems really inefficient, and I have to keep the batch size for the CNN pretty small to avoid errors.
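For concreteness, the loop-based approach described above might look like the sketch below. The variable names (data, windows, ds) are assumptions, not from the original code:

```matlab
% Sketch of the sliding-window copy described above.
% "data" is assumed to be the 80,000x200 array read from the spreadsheets.
win  = 100;                                 % rows per chunk
nWin = size(data,1) - win + 1;              % number of overlapping chunks
windows = zeros(nWin, win, size(data,2));   % preallocate the 3D array
for k = 1:nWin
    windows(k,:,:) = data(k:k+win-1, :);    % copy one 100x200 chunk
end
% Iterate over the first dimension, one chunk per read.
ds = arrayDatastore(windows, 'IterationDimension', 1);
```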
I also tried saving each of the 100x200 arrays into a grayscale png, and then creating an imageDatastore with the 79,900 images. This lets me have a larger batch size, but A) it takes about 8 hours to convert all the data into images, and B) training the CNN takes about 8-10 times longer (4-5 hours instead of 30 mins).
Is there a better way to do this?
Answers (1)
For example, the first value the datastore will return is a 100x200 array, which corresponds to rows 1:100 in the spreadsheet. The second value it should return is rows 2:101, then 3:102, etc.
An equivalent to that would be to make the CNN fully convolutional (if it isn't already) with input size 4000x200. Then, you could feed an entire spreadsheet as input at once.
11 comments
ROSEMARIE MURRAY
23 Apr 2022
It is a convolutional network, so separating the inputs into overlapping blocks is redundant. Consider the simplified example below. You can see that convolving the hypothetical weights with the whole input produces the same data as convolving the weights with separate overlapping row-blocks. Therefore, breaking your spreadsheets up into overlapping blocks would just make training repeat unnecessary computations, not to mention the extra memory requirements.
input = reshape(1:12, 4, 3);  % 4x3 test input
block1 = input(1:3, :);       % overlapping row-blocks
block2 = input(2:4, :);
w = rand(2);                  % random convolution weights
conv2(block1, w, 'valid')     % rows 1-2 of the full result
conv2(block2, w, 'valid')     % rows 2-3 of the full result
conv2(input, w, 'valid')      % total input: all rows at once
ROSEMARIE MURRAY
23 Apr 2022
In your example, what is the input size of the first layer of the CNN?
Matt J
23 Apr 2022
Well, that's just it. Convolutional layers don't have a defined input size, because they are convolutional (they do have edge-padding rules, however). That said, in my example you can, if you wish, think of the input size as 3x3 when I pass in block1 and block2 separately, and 4x3 when I pass in the whole input.
The input size of the first layer is a 100x200 array and the output of the last layer is a single categorical
If so, and if your network is fully convolutional, then you should be able to pass a 4000x200 array (containing 3901 overlapping sequences) to this same network. The output of the last layer should then be a 3901x1 vector of categoricals. The cost function calculation needs to be adjusted to sum the costs of all 3901 classification results.
ROSEMARIE MURRAY
25 Apr 2022
The way I'm imagining it, you would go back to the imageInputLayer. The flattening layer and fully connected layer, though, would be removed and replaced with a single convolution2dLayer with zero padding and a stride of 1. The number of output channels Nc should be the number of classes, and the spatial dimensions of the weights should be 100x200. If you give it a 4000x200 input image, and all goes well, the output should be a 3901(S)x1(S) image with Nc channels.
For the output layers, instead of the softmax and classification layers, I think you want a pixelClassificationLayer. We're viewing the 3901x1 conv layer output as an array of pixels, and we want to classify each one.
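As a sketch, the layer array being described might look like the following. The value of numClasses and the hidden layers are assumptions; note that in the Deep Learning Toolbox a softmaxLayer typically still precedes pixelClassificationLayer to produce per-pixel probabilities:

```matlab
numClasses = 5;  % assumed; use your actual number of classes
layers = [
    imageInputLayer([4000 200 1])
    % ... your existing hidden convolution / batchnorm / relu layers ...
    convolution2dLayer([100 200], numClasses, 'Stride', 1, 'Padding', 0)
    softmaxLayer                 % per-"pixel" class probabilities
    pixelClassificationLayer];   % classifies each of the 3901x1 outputs
```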
ROSEMARIE MURRAY
25 Apr 2022
imageInputLayers have to be a preset size, though I am somewhat curious to see what would happen if you omitted one. I speculate that, because you have that batch normalization layer, you might not need the normalization that the imageInputLayer usually applies.
One thing that concerns me a bit is that you have only one hidden convolutional layer and no pooling. Normally, in CNN classification, you have a series of convolution and pooling layers, so the feature map's spatial dimensions shrink with successive layers while the number of channels increases. That way, you don't have so many weights to train in the output layers. Currently, you have 2e4*Nc output weights. Is this specific network architecture something you got from the literature?
ROSEMARIE MURRAY
26 Apr 2022
It seems that it is not possible to create a network without an input layer. I can make the network with an imageInputLayer of size 4000x200.
Matt J
27 Apr 2022
You would have to turn off the normalization it is doing, in that case. The imageInputLayer is not a convolutional layer, so you can't get shift-invariant output if normalization is happening.
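A minimal sketch of turning that normalization off, assuming the 4000x200 input size discussed above:

```matlab
% Disable the normalization the input layer applies by default,
% so the network's output remains shift-invariant.
inLayer = imageInputLayer([4000 200 1], 'Normalization', 'none');
```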