Multi action agent programming in reinforcement learning

Question

0 votes

How can I program or represent a multi-action agent in reinforcement learning (DQN)? I was able to construct the agent, but I do not know how to represent its action in the step function. At every learning step the action consists of three decisions: charging the battery, operating the first generator, and operating the second generator. The first part of the code below shows how I construct the environment and the agent; in the second part I ask how I can add these actions to my step function.

Thank you in advance.

First part

clc

% Observation: 4-by-1 vector [T; SOC; SOF; Temp]
ObservationInfo = rlNumericSpec([4 1]);
ObservationInfo.Name = 'EnergSolar States';
ObservationInfo.Description = 'T,SOC,SOF,Temp';

% Action: one of 12 vectors [battery, first generator, second generator]
ActionInfo = rlFiniteSetSpec({[-1 0 0],[-1 1 0],[-1 0 1],[-1 1 1],[0 0 0],[0 1 0],[0 0 1],[0 1 1],[1 0 0],[1 1 0],[1 0 1],[1 1 1]});
ActionInfo.Name = 'EnergSolar Action';

% Custom environment built from user-defined step and reset functions
env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunctionfuel','myResetFunctionfuel');

obsInfo = getObservationInfo(env);
numObservations = obsInfo.Dimension(1);
actInfo = getActionInfo(env);

statePath = [
    imageInputLayer([4 1 1], 'Normalization', 'none', 'Name', 'state')
    fullyConnectedLayer(200, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticRelu1')
    fullyConnectedLayer(200, 'Name', 'CriticStateFC2')];

actionPath = [
    imageInputLayer([1 3 1], 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(200, 'Name', 'CriticActionFC1')];

commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'output')];

criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'CriticStateFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'CriticActionFC1', 'add/in2');

criticOpts = rlRepresentationOptions('LearnRate',0.002,'GradientThreshold',1);

critic = rlRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},criticOpts);

agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',false, ...
    'TargetUpdateMethod',"periodic", ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',1000); % 500 to 1000

agent = rlDQNAgent(critic,agentOpts);

trainOpts = rlTrainingOptions(...
    'MaxEpisodes', 1000, ...
    'MaxStepsPerEpisode', 500, ...
    'Verbose', false, ...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeReward',...
    'StopTrainingValue',0,...
    'ScoreAveragingWindowLength',5);

trainingStats = train(agent,env,trainOpts);
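The twelve vectors passed to rlFiniteSetSpec above are every combination of three battery values (-1, 0, 1) and two on/off values for each generator. As a minimal sketch, the same set could be generated rather than typed by hand; the variable names below are illustrative and not from the original post, and the resulting order differs from the hand-written list.

```matlab
% Candidate values for each decision, taken from the ActionInfo list above.
batteryVals = [-1 0 1];   % battery decision
gen1Vals    = [0 1];      % first generator off/on
gen2Vals    = [0 1];      % second generator off/on

% ndgrid enumerates every combination of the three decisions.
[b, g1, g2] = ndgrid(batteryVals, gen1Vals, gen2Vals);
combos = [b(:), g1(:), g2(:)];       % 12-by-3, one action vector per row

% Cell array of 1-by-3 row vectors, the form used for rlFiniteSetSpec above.
actionSet = num2cell(combos, 2);     % 12-by-1 cell

% With Reinforcement Learning Toolbox installed, this could then be used as:
% ActionInfo = rlFiniteSetSpec(actionSet);
```

Generating the set this way keeps the list consistent if a decision later gains more levels, at the cost of a less readable ordering.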

Second part

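% Balance equation
Pg = PL - Ppv - bpr*(Action1);

if (Pg > Z)
    if (Pg - Z <= 150)
        % Small deficit: first generator only
        PDG1 = Pg - Z;
        PDG2 = 0;
        F = A*PDG1 + B*Pr;
        Pg = Z;
    else
        if (Pg - Z < 350)
            % Medium deficit: second generator only
            PDG2 = Pg - Z;
            F = A*PDG2 + B*Pr2;
            PDG1 = 0;
            Pg = Z;
        elseif (Pg - Z < 500)
            % Large deficit: second generator at 350, first generator if switched on
            PDG2 = 350;
            PDG1 = (Pg - Z - PDG2)*Action2;
            F = A*(PDG1 + PDG2) + B*(Pr1*Action2 + Pr2*Action3);
            Pg = Pg - Z - PDG1 - PDG2;
        end
    end
end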


Answer 1

Emmanouil Tzorakoleftherakis on 13 Jul 2020

0 votes

This example shows how to create an environment with multiple discrete actions. Hope that helps.


Emmanouil Tzorakoleftherakis on 14 Jul 2020

All the elements are in ActionInfo.Elements. Is that what you need?
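As a rough illustration of that property, the sketch below builds a shortened action spec and reads one allowed action back from it. It requires Reinforcement Learning Toolbox and assumes Elements is returned as a cell array when the spec is created from one; the variable names are illustrative.

```matlab
% Shortened version of the action list from the question.
actInfo = rlFiniteSetSpec({[-1 0 0], [-1 1 0], [-1 0 1], [-1 1 1]});

% Elements holds every allowed action vector.
numActions  = numel(actInfo.Elements);   % how many action vectors are allowed
thirdAction = actInfo.Elements{3};       % third allowed action vector

% Ordinary indexing then picks out a single decision.
batteryDecision = thirdAction(1);
```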

Nabil Jalil Aklo on 14 Jul 2020

Let me explain what I need with this example.

Suppose the action is a vector of three elements at each time step:

ActionInfo = rlFiniteSetSpec({[-1 0 0],[-1 1 0],[-1 0 1],[-1 1 1],[0 0 0],[0 1 0],[0 0 1],[0 1 1],[1 0 0],[1 1 0],[1 0 1],[1 1 1]});

At some time step, let the action vector be Action = [-1 0 1]. Its elements represent three decisions: battery charging control, first generator control, and second generator control. I want to apply the first element of this vector in the equation below:

SOC = SOC + 200*(first element of the action vector)

My question is: how can I extract the first element from the vector?

Thank you in advance.
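One possible way to do this is plain indexing inside the step function, since the chosen action is passed in as its first argument. The sketch below is not the asker's actual myStepFunctionfuel: the function name myStepSketch, the LoggedSignals.State layout (following the 'T,SOC,SOF,Temp' order from the observation description), the LastGenCmd field, the demo values and the placeholder reward are all assumptions for illustration.

```matlab
% Demo call with hypothetical values; state layout is [T; SOC; SOF; Temp].
LoggedSignals.State = [1; 500; 0; 25];
Action = [-1 0 1];
[NextObs, Reward, IsDone, LoggedSignals] = myStepSketch(Action, LoggedSignals);

function [NextObs, Reward, IsDone, LoggedSignals] = myStepSketch(Action, LoggedSignals)
    % Defensive unwrap in case the action arrives wrapped in a cell (assumption).
    if iscell(Action)
        Action = Action{1};
    end

    % Extract the three decisions by indexing into the action vector.
    batteryCmd = Action(1);   % first element: battery decision
    gen1Cmd    = Action(2);   % second element: first generator
    gen2Cmd    = Action(3);   % third element: second generator

    % Read the current state and apply the SOC equation from the question.
    state = LoggedSignals.State;
    SOC   = state(2);
    SOC   = SOC + 200*batteryCmd;
    state(2) = SOC;

    % Keep the generator commands available for the dispatch logic.
    LoggedSignals.LastGenCmd = [gen1Cmd gen2Cmd];
    LoggedSignals.State = state;
    NextObs = state;

    % Placeholders: the real reward and termination rule are not given in the post.
    Reward = 0;
    IsDone = false;
end
```

With the demo values, Action(1) is -1, so the stored SOC drops from 500 to 300. The same indexing would supply Action1, Action2 and Action3 in the balance-equation code from the question.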
