How to provide input to a multiple-input deep neural network without a datastore?

I have used the network shown in the figure, which takes two inputs: a video input (a number of lip images) and the MFCC of the audio signal corresponding to the same images. I used fileDatastore to store the training and validation data. Could you please guide me on how to provide the training and validation data without a fileDatastore? I already have the data in 4-D arrays.
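In recent MATLAB releases (R2020b and later), in-memory arrays can be fed to a multi-input network by wrapping each array in an `arrayDatastore` and combining them, which avoids the MAT-file/`fileDatastore` round trip entirely. A sketch, assuming the 4-D arrays and labels built in the code below:

```matlab
% One observation per slice of the 4th (observation) dimension.
dsVideo  = arrayDatastore(vid_sequencesTrain,   'IterationDimension', 4);
dsAudio  = arrayDatastore(audio_sequencesTrain, 'IterationDimension', 4);
dsLabels = arrayDatastore(vid_labelsTrain');    % N-by-1 categorical responses
dsTrain  = combine(dsVideo, dsAudio, dsLabels); % each read: {video, audio, label}

net = trainNetwork(dsTrain, lgraph, options);
```

The same pattern applied to the validation arrays gives a combined datastore that can be passed to the `'ValidationData'` training option.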
My aim is to generate MFCCs from lip images. I have trained the network with lip images and the corresponding MFCCs; the outputs of the two input branches are added together and fed to a third network, as shown in the figure. I trained the network, but I am unable to find the output of the network, i.e., the generated MFCCs.
Please guide me on how to obtain the MFCCs from the network output.
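In general, `predict` returns the final output of a trained network and `activations` returns the output of any named layer. Note, though, that the layer graph in this post ends in a two-class `softmaxLayer`/`classificationLayer`, so its final output is class scores, not MFCC values; to generate MFCCs the network would need a regression output trained against MFCC targets. A hedged sketch of both calls (`validationDatastore` is an assumed name for a combined datastore built like the training one):

```matlab
% Final output (class scores here) of the two-input network:
scores = predict(net, vid_sequencesValidation, audio_sequencesValidation);

% Output of an intermediate layer, e.g. 'relu_3' from the layer graph:
act = activations(net, validationDatastore, 'relu_3');
```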
Also, I have combined the frames of all videos together and then applied the images as input. Instead of that, can I provide the input as a video signal?
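For loading one video file directly, the built-in `VideoReader` returns frames one at a time, which can be stacked into the same H-by-W-by-3-by-numFrames layout used below (a sketch; the `readVideo` helper in the posted code is assumed to do something similar):

```matlab
vr = VideoReader('AVDIGITS_S1_0_01.mp4');
numFrames = floor(vr.Duration * vr.FrameRate);   % estimate, may overshoot
frames = zeros(vr.Height, vr.Width, 3, numFrames, 'uint8');
k = 0;
while hasFrame(vr)
    k = k + 1;
    frames(:,:,:,k) = readFrame(vr);   % one RGB frame per 4th-dim slice
end
frames = frames(:,:,:,1:k);            % trim if the estimate was high
```

To treat each whole video as a single observation (rather than pooling all frames), the network would instead need a `sequenceInputLayer`-based architecture.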
clear all;
close all;
clc;
files = {'AVDIGITS_S1_0_01.mp4'; 'AVDIGITS_S1_0_02.mp4'; 'AVDIGITS_S1_0_03.mp4'; ...
    'AVDIGITS_S1_0_04.mp4'; 'AVDIGITS_S1_0_05.mp4'; ...
    'AVDIGITS_S1_1_02.mp4'; 'AVDIGITS_S1_1_03.mp4'; 'AVDIGITS_S1_1_04.mp4'; 'AVDIGITS_S1_1_05.mp4'};
mfcc_files = {'S1_0_01_mfcc.mp4.avi'; 'S1_0_02_mfcc.mp4.avi'; 'S1_0_03_mfcc.mp4.avi'; ...
    'S1_0_04_mfcc.mp4.avi'; 'S1_0_05_mfcc.mp4.avi'; ...
    'S1_1_02_mfcc.mp4.avi'; 'S1_1_03_mfcc.mp4.avi'; 'S1_1_04_mfcc.mp4.avi'; 'S1_1_05_mfcc.mp4.avi'};
numFiles = numel(files);
index2=1;
for mm = 1:numFiles
    video = readVideo(files{mm});          % H-by-W-by-3-by-numFrames
    fprintf("Reading video file %d of %d...\n", mm, numFiles)
    [v1, v2, v3, v4] = size(video);
    audio = readVideo(mfcc_files{mm});     % MFCC frames stored as a video
    fprintf("Reading audio file %d of %d...\n", mm, numFiles)
    frame_cnt(mm) = v4;
    % Stack frames of all videos along the 4th (observation) dimension.
    for ii = 1:v4
        all_vid_frames(:,:,:,index2)   = uint8(video(:,:,:,ii));
        all_audio_frames(:,:,:,index2) = audio(:,:,ii);
        index2 = index2 + 1;
    end
end
labels1=categorical([zeros(1,209) ones(1,196)]);
idxTrain =[1:121 357:405];
for kk = 1:length(idxTrain)
    ind1 = idxTrain(kk);
    vid_sequencesTrain(:,:,:,kk)   = all_vid_frames(:,:,:,ind1);
    vid_labelsTrain(kk)            = labels1(ind1);
    audio_sequencesTrain(:,:,:,kk) = all_audio_frames(:,:,:,ind1);
    audio_labelsTrain(kk)          = labels1(ind1);   % was missing the (kk) index
end
idxValidation = [122:356];
for kk = 1:length(idxValidation)
    ind2 = idxValidation(kk);
    vid_sequencesValidation(:,:,:,kk)   = all_vid_frames(:,:,:,ind2);
    vid_labelsValidation(kk)            = labels1(ind2);
    audio_sequencesValidation(:,:,:,kk) = all_audio_frames(:,:,:,ind2);
    audio_labelsValidation(kk)          = labels1(ind2);
end
%% training
[v1, v2, v3, v4] = size(vid_sequencesTrain)
[a1, a2, a3, a4] = size(audio_sequencesTrain)
imgCells    = mat2cell(vid_sequencesTrain,v1,v2,v3,ones(v4,1));
imgCells2   = reshape(imgCells,[v4 1 1]);
audioCells  = mat2cell(audio_sequencesTrain,a1,a2,a3,ones(a4,1));
audioCells2 = reshape(audioCells,[a4 1 1]);
labelCells  = arrayfun(@(x)x,vid_labelsTrain,'UniformOutput',false);
combinedCells = [imgCells2 audioCells2 labelCells'];
%% validation
[vv1 vv2 vv3 vv4]=size(vid_sequencesValidation)
[aa1 aa2 aa3 aa4]=size(audio_sequencesValidation)
imgCellsvald = mat2cell(vid_sequencesValidation,vv1,vv2,vv3,ones(vv4,1));
imgCells2vald = reshape(imgCellsvald,[vv4 1 1]);
audioCellsvald = mat2cell(audio_sequencesValidation,aa1,aa2,aa3,ones(aa4,1));
audioCells2vald = reshape(audioCellsvald,[aa4 1 1]);
labelCells2vald = arrayfun(@(x)x,audio_labelsValidation,'UniformOutput',false);
combinedCellsvald = [imgCells2vald audioCells2vald labelCells2vald'];
%
save('traingData_10April_2023.mat','combinedCells', 'combinedCellsvald');
filedatastore = fileDatastore('traingData_10April_2023.mat','ReadFcn',@load);
trainingDatastore = transform(filedatastore,@rearrangeData);
layers1 = [
    imageInputLayer([v1 v2 3],'Name','imageinput')
    convolution2dLayer(3,16,'Padding','same','Name','conv_1')
    batchNormalizationLayer('Name','BN_1')
    reluLayer('Name','relu_1')
    fullyConnectedLayer(2,'Name','fc11')
    additionLayer(2,'Name','add')
    transposedConv2dLayer(3,16,'Name','deconv1')
    batchNormalizationLayer('Name','BN_2')
    reluLayer('Name','relu_2')
    transposedConv2dLayer(3,16,'Name','deconv2')
    batchNormalizationLayer('Name','BN_3')
    reluLayer('Name','relu_3')
    averagePooling2dLayer(2,'Stride',2,'Name','avgpool')
    fullyConnectedLayer(2,'Name','fc12')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classOutput')];
lgraph = layerGraph(layers1);
layers2 = [
    imageInputLayer([a1 a2 a3],'Name','vinput')
    fullyConnectedLayer(2,'Name','fc21')];
lgraph = addLayers(lgraph,layers2);
lgraph = connectLayers(lgraph,'fc21','add/in2');   % second input joins at the addition layer
plot(lgraph)
options = trainingOptions('adam', ...
    'InitialLearnRate',0.005, ...
    'LearnRateSchedule','piecewise', ...
    'MaxEpochs',100, ...
    'MiniBatchSize',512, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'Shuffle','never', ...
    'ValidationData',trainingDatastore, ...   % note: validates on the training set
    'ValidationFrequency',1);
net = trainNetwork(trainingDatastore,lgraph,options);
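The `rearrangeData` read function used with `transform` above is not shown in the post. A minimal sketch of what it presumably does, assuming the MAT-file holds `combinedCells` as an N-by-3 cell array of {video, audio, label}:

```matlab
function dataOut = rearrangeData(s)
% s is the struct returned by load on the MAT-file; trainNetwork expects
% each read to yield an N-by-3 cell array {input1, input2, response}.
dataOut = s.combinedCells;
end
```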