How do I get the value of the value function in a soft actor-critic (SAC) agent?
I want to know how to get the value of the value function (the Q-value).
I am using a soft actor-critic (SAC) agent.
Can someone tell me how to do this?
% Soft-actor-critic
clear all;
close all;
Length = 1;
Mass = 1;
Ts = 0.01;
Theta_Initial = -pi;
AngularVelocity_Initial = 0;
SimplePendulum = classPendulum(Length, Mass, Theta_Initial, AngularVelocity_Initial, Ts);
ObservationInfo = rlNumericSpec([2 1]);
ObservationInfo.Name = 'States';
ObservationInfo.Description = 'Theta, AngularVelocity';
ActionInfo = rlNumericSpec([1 1],'LowerLimit',-100,'UpperLimit',-5);
ActionInfo.Name = 'Action';
ActionInfo.Description = 'F';
ResetHandle = @()myResetFunction(SimplePendulum);
StepHandle = @(Action,LoggedSignals) myStepfunction(Action,LoggedSignals,SimplePendulum);
env = rlFunctionEnv(ObservationInfo, ActionInfo, StepHandle, ResetHandle);
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numAct = numel(actInfo);
device = 'gpu';
% CRITIC
statePath1 = [
featureInputLayer(numObs,'Normalization','none','Name','observation')
fullyConnectedLayer(400,'Name','CriticStateFC1')
reluLayer('Name','CriticStateRelu1')
fullyConnectedLayer(300,'Name','CriticStateFC2')
];
actionPath1 = [
featureInputLayer(numAct,'Normalization','none','Name','action')
fullyConnectedLayer(300,'Name','CriticActionFC1')
];
commonPath1 = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu1')
fullyConnectedLayer(1,'Name','CriticOutput')
];
criticNet = layerGraph(statePath1);
criticNet = addLayers(criticNet,actionPath1);
criticNet = addLayers(criticNet,commonPath1);
criticNet = connectLayers(criticNet,'CriticStateFC2','add/in1');
criticNet = connectLayers(criticNet,'CriticActionFC1','add/in2');
criticOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,...
'GradientThreshold',1,'L2RegularizationFactor',2e-4,'UseDevice',device);
critic1 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'action'},criticOptions);
critic2 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'action'},criticOptions);
%ACTOR
statePath = [
featureInputLayer(numObs,'Normalization','none','Name','observation')
fullyConnectedLayer(400, 'Name','commonFC1')
reluLayer('Name','CommonRelu')];
meanPath = [
fullyConnectedLayer(300,'Name','MeanFC1')
reluLayer('Name','MeanRelu')
fullyConnectedLayer(numAct,'Name','Mean')
];
stdPath = [
fullyConnectedLayer(300,'Name','StdFC1')
reluLayer('Name','StdRelu')
fullyConnectedLayer(numAct,'Name','StdFC2')
softplusLayer('Name','StandardDeviation')];
concatPath = concatenationLayer(1,2,'Name','GaussianParameters');
actorNetwork = layerGraph(statePath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'CommonRelu','MeanFC1/in');
actorNetwork = connectLayers(actorNetwork,'CommonRelu','StdFC1/in');
actorNetwork = connectLayers(actorNetwork,'Mean','GaussianParameters/in1');
actorNetwork = connectLayers(actorNetwork,'StandardDeviation','GaussianParameters/in2');
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,...
'GradientThreshold',1,'L2RegularizationFactor',1e-5,'UseDevice',device);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,actorOptions,...
'Observation',{'observation'});
agentOptions = rlSACAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.MiniBatchSize = 32;
agent = rlSACAgent(actor,[critic1 critic2],agentOptions);
getAction(agent,{rand(obsInfo(1).Dimension)});
maxepisodes = 10;
maxsteps = 2;
trainingOptions = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'StopOnError','on',...
'Verbose',true,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',Inf,...
'ScoreAveragingWindowLength',10);
trainingStats = train(agent,env,trainingOptions);
% Play the game with the trained agent
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
% Q-value: here I want to get the value of the value function (the Q-value)
% Is this the correct way?
batchobs = rand(2,1,64);
batchact = rand(1,1,64,1);
qvalue = getValue(critic2,{batchobs},{batchact});
%v = getValue(critic2,{rand(2,1)},{rand(1,1)})
%save("kyori30Agent.mat","States")
2 comments
Martin Forsberg Lie
on 8 Nov 2021
Edited: Martin Forsberg Lie
on 8 Nov 2021
A SAC agent is implemented with two critics, so you must get them from the agent and choose which one to evaluate:
critic = getCritic(agent);
value = getValue(critic(1),{obs},action);
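As a slightly fuller sketch of the same idea, the snippet below evaluates both critics held by the agent, for a single observation/action pair and for a batch. It assumes that `agent`, `obsInfo`, and `actInfo` from the script in the question are still in the workspace, and that the action is passed as a cell array, as in the question's own `getValue` call. The names `obs`, `act`, `obsBatch`, `actBatch`, and the `q*` variables are placeholders introduced here, and the inputs are arbitrary test values rather than data from the pendulum.

```matlab
% Assumption: agent, obsInfo and actInfo come from the script in the question.
% Critics taken from the agent reflect training, unlike the original
% critic1/critic2 variables that were created before train() was called.
critics = getCritic(agent);                 % two critics for a SAC agent

% Single observation/action pair (placeholder values)
obs = rand(obsInfo.Dimension);              % 2x1 observation
act = -10;                                  % inside the action limits [-100, -5]
q1 = getValue(critics(1),{obs},{act});
q2 = getValue(critics(2),{obs},{act});
qMin = min(q1,q2);                          % the more conservative of the two estimates

% Batch of 64 pairs: the batch index is the third dimension
obsBatch = rand(2,1,64);
actBatch = -5 - 95*rand(1,1,64);            % random actions in [-100, -5]
qBatch1 = getValue(critics(1),{obsBatch},{actBatch});
qBatch2 = getValue(critics(2),{obsBatch},{actBatch});
qBatchMin = min(qBatch1,qBatch2);
```

Taking the minimum mirrors how SAC combines its two critics when it forms training targets. If only one estimate is needed, either `q1` or `q2` can be used on its own, as in the comment above.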
Answers (0)