- The call Y = categorical(charactersShifted) needs a value set that contains all the unique characters in your dataset: Y = categorical(charactersShifted,allUniqueCharacters).
- To make that work with the uniqueCharacters variable, you need to convert it to the same class as charactersShifted, which is a string array.
- The endOfTextCharacter needs to be included in the value set too; otherwise it becomes an <undefined> category in Y.
- Finally, the line charactersShifted = [cellstr(characters(2:end)')' endOfTextCharacter]; might prepend an empty "" when characters is only 1 character long. That gives Y a length of 2 while X has a length of 1, so you'll get a sequence length mismatch when you try to train.
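The points above could be combined roughly as follows. This is only a sketch, not the answerer's own code: the three-entry textData is a hypothetical stand-in for the poster's file (the empty middle entry mimics a blank line), allUniqueCharacters and endIdx are names assumed here, and building the shifted labels by indexing into the value set is one possible way to avoid the empty "" entry.

```matlab
% Hypothetical stand-in for the poster's documents; the empty entry mimics a blank line.
textData = ["To be"; ""; "or not"];
startOfTextCharacter = compose("\x0002");
endOfTextCharacter = compose("\x2403");
textData = startOfTextCharacter + textData;

uniqueCharacters = unique([textData{:}]);      % char row vector
numUniqueCharacters = numel(uniqueCharacters);

% Value set as a string array: one entry per character, plus the end-of-text character.
allUniqueCharacters = [string(uniqueCharacters')' endOfTextCharacter];
endIdx = numel(allUniqueCharacters);           % position of the end-of-text character

numDocuments = numel(textData);
XTrain = cell(1,numDocuments);
YTrain = cell(1,numDocuments);
for i = 1:numDocuments
    characters = textData{i};
    sequenceLength = numel(characters);

    % Get indices of characters.
    [~,idx] = ismember(characters,uniqueCharacters);

    % Convert characters to one-hot vectors.
    X = zeros(numUniqueCharacters,sequenceLength);
    for j = 1:sequenceLength
        X(idx(j),j) = 1;
    end

    % Shift by one and append end-of-text. For a one-character document,
    % idx(2:end) is empty, so the result has exactly one element.
    charactersShifted = allUniqueCharacters([idx(2:end) endIdx]);

    % Every Y shares the same value set, so no label becomes <undefined>.
    Y = categorical(charactersShifted,allUniqueCharacters);

    XTrain{i} = X;
    YTrain{i} = Y;
end
```

With a shared value set, the number of classes is numel(allUniqueCharacters), which is one more than the input size because of the end-of-text character.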
Generate Text with Deep Learning "Invalid training data. Labels must not contain undefined values" ERROR
I am using the MATLAB example "Generate Text with Deep Learning".
It works fine when I use the Shakespeare text provided in the example, but none of my own texts are accepted. I always get the error: "Invalid training data. Labels must not contain undefined values."
My text and code are provided below.
filename = 'RWE Nature.txt';
textData = fileread(filename);
textData = replace(textData," ","");
textData = split(textData,[newline]); % USE NEWLINE TO SPLIT TEXT INTO CELLS
% textData = textData(5:2:end);
textData(1:5) % 154 X 1 string array
startOfTextCharacter = compose("\x0002");
whitespaceCharacter = compose("\x00B7");
endOfTextCharacter = compose("\x2403");
newlineCharacter = compose("\x00B6");
textData = startOfTextCharacter + textData;
textData = replace(textData,[" " newline],[whitespaceCharacter newlineCharacter]);
uniqueCharacters = unique([textData{:}]); % '!'(),-.:;?ABCDEFGHIJKLMNOPRSTUVWYabcdefghijklmnopqrstuvwxyz¶·'
numUniqueCharacters = numel(uniqueCharacters); % 62
%
numDocuments = numel(textData); % 154 SONNETS, 89 PARAGRAPHS IN MAYER
XTrain = cell(1,numDocuments);
YTrain = cell(1,numDocuments);
for i = 1:numel(textData)
    characters = textData{i};
    sequenceLength = numel(characters);
    % Get indices of characters.
    [~,idx] = ismember(characters,uniqueCharacters);
    % Convert characters to vectors.
    X = zeros(numUniqueCharacters,sequenceLength);
    for j = 1:sequenceLength
        X(idx(j),j) = 1;
    end
    % Create vector of categorical responses with end of text character.
    charactersShifted = [cellstr(characters(2:end)')' endOfTextCharacter];
    Y = categorical(charactersShifted);
    XTrain{i} = X;
    YTrain{i} = Y;
end
% textData{1}
inputSize = size(XTrain{1},1);
numHiddenUnits = 200;
numClasses = numel(categories([YTrain{:}]));
layers = [
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
options = trainingOptions('adam', ...
    'MaxEpochs',500, ...
    'InitialLearnRate',0.01, ...
    'GradientThreshold',2, ...
    'MiniBatchSize',77,...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Verbose',false);
% Train the network.
'a'
net = trainNetwork(XTrain,YTrain,layers,options);
'b'
% Generate text using the trained network.
generatedText = generateText(net,uniqueCharacters,startOfTextCharacter,newlineCharacter,whitespaceCharacter,endOfTextCharacter)
'end'
function generatedText = generateText(net,uniqueCharacters,startOfTextCharacter,newlineCharacter,whitespaceCharacter,endOfTextCharacter)
    numUniqueCharacters = numel(uniqueCharacters);
    X = zeros(numUniqueCharacters,1);
    idx = strfind(uniqueCharacters,startOfTextCharacter);
    X(idx) = 1;
    generatedText = "";
    vocabulary = string(net.Layers(end).Classes);
    maxLength = 500;
    while strlength(generatedText) < maxLength
        % Predict the next character scores.
        [net,characterScores] = predictAndUpdateState(net,X,'ExecutionEnvironment','cpu');
        % Sample the next character.
        newCharacter = datasample(vocabulary,1,'Weights',characterScores);
        % Stop predicting at the end of text.
        if newCharacter == endOfTextCharacter
            break
        end
        % Add the character to the generated text.
        generatedText = generatedText + newCharacter;
        % Create a new vector for the next input.
        X(:) = 0;
        idx = strfind(uniqueCharacters,newCharacter);
        X(idx) = 1;
    end
    generatedText = replace(generatedText,[newlineCharacter whitespaceCharacter],[newline " "]);
end
Answers (1)
Ben
on 28 Nov 2022
There are a few issues to fix (see the list at the top of the page):
I think training should work once you resolve these things. Hope that helps.
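Before calling trainNetwork, a quick check along these lines can show which documents still carry undefined labels or a length mismatch. It is a minimal sketch with hypothetical stand-in data: the second label sequence imitates a one-character document whose shifted labels start with an empty "", which categorical turns into <undefined>. The names hasUndefined, lengthMismatch and badDocuments are made up for illustration.

```matlab
% Hypothetical stand-in data; document 2 mimics a one-character document.
YTrain = {categorical(["a" "b"]), categorical(["" "c"])};
XTrain = {zeros(3,2), zeros(3,1)};

% Flag documents whose labels contain <undefined> values.
hasUndefined = cellfun(@(y) any(isundefined(y)),YTrain);

% Flag documents whose predictor and label sequences differ in length.
lengthMismatch = cellfun(@(x,y) size(x,2) ~= numel(y),XTrain,YTrain);

% Indices of documents that would be rejected or cause a mismatch.
badDocuments = find(hasUndefined | lengthMismatch)
```

On the poster's own data, the same two cellfun lines can be run on XTrain and YTrain as built in the loop.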