How to give a single reward after one simulation?
I have a simple pendulum that throws a ball, and I am using reinforcement learning to make the error between the ball's landing point and the target point small.
The reward is the error, i.e. minus the absolute difference between the target point and the point where the ball lands.
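Written as a formula, with $d_t$ the landing distance returned by Ballfunction for the state at step $t$ (Ball_Distance in the code) and the target $d_{\text{target}} = 20$ (Ball_Target in the code), the per-state reward is:

$$R_t = -\left| d_t - d_{\text{target}} \right|$$

so $R_t \le 0$, and $R_t = 0$ exactly when the ball lands on the target.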
Instead of giving a reward (error) in every state, I want to give only the one with the smallest error (the error closest to 0).
In other words: run one simulation → calculate the error (reward) of a throw from every observed state → give only the value with the smallest error as the reward.
Is there a way to do this?
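A minimal sketch of the idea, assuming the environment is built with rlFunctionEnv (which calls the step function once per agent step and passes LoggedSignals from one call to the next) and that the running best is stored in a LoggedSignals field. The field name BestR and the values in stepR are hypothetical, for illustration only. The loop stands in for successive step-function calls: intermediate steps return 0, and the terminal step returns the largest R seen in the episode, which is the one with the smallest error.

```matlab
% Sketch: give one reward per episode, equal to the best per-step reward.
% stepR is hypothetical data standing in for R = -abs(Ball_Distance - Ball_Target)
% computed at successive calls of the step function.
stepR = [-7.5, -3.2, -0.4, -1.9];

% What a reset function might return (BestR is a hypothetical field name)
LoggedSignals = struct('BestR', -Inf);
Rewards = zeros(size(stepR));           % reward returned by each step call

for k = 1:numel(stepR)
    R = stepR(k);
    IsDone = (k == numel(stepR));       % here the episode simply ends at the last step

    % Keep the largest R seen so far (= the smallest absolute error)
    LoggedSignals.BestR = max(LoggedSignals.BestR, R);

    if IsDone
        Rewards(k) = LoggedSignals.BestR;   % pay out the best value once, at the end
    else
        Rewards(k) = 0;                     % no reward on intermediate steps
    end
end

disp(Rewards)
```

Keeping the running best in LoggedSignals rather than in a global such as RR means it starts fresh each episode, provided the reset function initializes it (for example to -Inf).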
% Definition of the step function
function [NextObservation, Reward, IsDone, LoggedSignals] = myStepfunction(Action, LoggedSignals, SimplePendulum)
    global RR  % errors (rewards) collected over the simulation

    for i = 1:200
        % state before the update
        statePre = [SimplePendulum.Theta; SimplePendulum.AngularVelocity];
        IsDone = false;

        % update the states
        SimplePendulum.pstep(Action);
        state = [SimplePendulum.Theta; SimplePendulum.AngularVelocity];

        % calculate the error (reward) of a throw from this state
        Ball_Target = 20;
        Ball_Distance = Ballfunction(SimplePendulum);
        R = -abs(Ball_Distance - Ball_Target);
        teststep(R)  % get the error (reward) in all observed states

        % episode ends when the angular velocity becomes positive
        % or Y_Position drops below 0
        if (state(2) > 0) || (SimplePendulum.Y_Position < 0) %|| (abs(state(2)) > 10)
            IsDone = true;
            [InitialObservation, LoggedSignal] = myResetFunction(SimplePendulum);
            LoggedSignal.State = [-pi; 0];
            InitialObservation = LoggedSignal.State;
            state = InitialObservation;
            SimplePendulum.Theta = -2*pi/3;
            SimplePendulum.AngularVelocity = 0;
        end

        if IsDone == true
            [M, I] = max(RR)  % display the best reward and its index
        end

        LoggedSignals.State = state;
        NextObservation = LoggedSignals.State;
    end

    Reward = max(RR);  % the largest R, i.e. the smallest error
end