How do I input an action in a reinforcement learning template environment?

I have modified the template environment to adapt it to my scenario. My current action consists of two vectors. The action configuration is as follows.
function this = EdgeEnvironment()
% Initialize Observation settings
ObservationInfo(1) = rlNumericSpec([1 10]);
ObservationInfo(1).Name = 'schedule';
ObservationInfo(1).Description = 'schedule';
ObservationInfo(2) = rlNumericSpec([1 20]);
ObservationInfo(2).Name = 'ppath';
ObservationInfo(2).Description = 'ppath';
ObservationInfo(3) = rlNumericSpec([1 1]);
ObservationInfo(3).Name = 'completionTime';
ObservationInfo(3).Description = 'completionTime';
ObservationInfo(4) = rlNumericSpec([1 1]);
ObservationInfo(4).Name = 'computeDuring';
ObservationInfo(4).Description = 'computeDuring';
% Initialize Action settings
ActionInfo(1) = rlNumericSpec([1 10]);
ActionInfo(1).Name = 'schedule';
ActionInfo(2) = rlNumericSpec([1 20]);
ActionInfo(2).Name = 'ppath';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo, ActionInfo);
end
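For reference, the environment can be constructed and its channels inspected with the standard toolbox calls (EdgeEnvironment is the class defined above):
env = EdgeEnvironment();
getObservationInfo(env)   % the four rlNumericSpec observation channels
getActionInfo(env)        % the two rlNumericSpec action channels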
The step function is designed as follows.
function [Observation,Reward,IsDone,LoggedSignals] = step(this, Action)
LoggedSignals = [];
% distance
node_distance = zeros(this.device_count, this.device_count);
distance = getDistance(this, node_distance);
% parameter list
parameter_list = getstruct(this, distance);
% the parameter list of device
device_list = get_device_list(this);
% Extract action
[schedule_act, ppath_act]=get_act(Action);
% schedule_act = Action{1,1};
% ppath_act = Action{1,2};
% Unpack state vector
last_schedule = schedule_act;
last_ppath = ppath_act;
last_completionTime = this.State{1,3};
last_computeDuring = this.State{1,4};
% Update system states
[schedule, stay_node_list, completionTime] = ComScheduling(last_completionTime,...
last_schedule, last_ppath, device_list, parameter_list);
[ppath, stay_node_list, completionTime, computeDuring] = PathPlanning(last_completionTime,...
last_ppath, schedule, stay_node_list, device_list, parameter_list);
% Probability of accepting the newly computed solution
prob = 1 / (1 + exp((completionTime - last_completionTime)/parameter_list.omega));
dice = rand(1);
completionTime_iter = [];   % local log; store as a property to keep history across steps
last_stay_node_list = stay_node_list;   % default so the reject branch below is defined
if dice <= prob
% Accept the new solution
last_ppath = ppath;
last_schedule = schedule;
last_completionTime = completionTime;
last_computeDuring = computeDuring;
completionTime_iter(end + 1) = completionTime;
else
% Reject: keep the previous solution
completionTime_iter(end + 1) = last_computeDuring;
end
ppath = last_ppath;
schedule = last_schedule;
stay_node_list = last_stay_node_list;
completionTime = last_completionTime;
computeDuring = last_computeDuring;
Observation = {schedule, ppath, completionTime, computeDuring};
this.State = Observation;
% Check terminal condition
completionTime = Observation{3};   % brace indexing: Observation is a cell array
computeDuring = Observation{4};
IsDone = completionTime < this.completionTime_threshold || computeDuring < this.computeDuring_threshold;
this.IsDone = IsDone;
% Get reward
Reward = -completionTime;
end
We calculate the action values with the following function.
function [schedule_act, ppath_act] = get_act(action)
schedule_act = action{1,1};
ppath_act = action{1,2};
end
When I run the validateEnvironment function, I get an error (the message points at the indexing in get_act). I want to know how to fix it.
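For reference, the validation that triggers the error is run like this (assuming the class file is EdgeEnvironment.m):
env = EdgeEnvironment();
validateEnvironment(env)   % calls reset() and step() with sampled actions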

Accepted Answer

The easiest thing you can do is add a breakpoint and display what the "action" variable is. It's obviously not a cell array, so you cannot access it with braces {} in the "get_act" function. That's why you are getting the error.
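For example, a quick check at the top of get_act (a debugging sketch; if class(action) prints anything other than 'cell', the brace indexing fails):
function [schedule_act, ppath_act] = get_act(action)
disp(class(action))   % not 'cell' here, hence the indexing error below
disp(size(action))
schedule_act = action{1,1};
ppath_act = action{1,2};
end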

8 comments

Thanks, Emmanouil. I have checked the "action" variable. It seems that I haven't initialized some of its configuration. The elements of "action{1,1}" and "action{1,2}" should be limited to the values 1-10, like the following.
action{1,1}=[1,2,3,4,5,6,7,8,9,10];
action{1,2}=[1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10];
I want to know how to configure "action" like the above.
What are you seeing currently when you display the action?
Yang Chen on 7 Mar 2023 (edited 8 Mar 2023)
I have changed ActionInfo as follows.
ActionInfo(1) = rlFiniteSetSpec({[1 2 3 4 5 6 7 8 9 10],[10 9 1 2 3 4 5 6 7 8]});
ActionInfo(1).Name = 'schedule';
ActionInfo(2) = rlFiniteSetSpec({[1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10]});
ActionInfo(2).Name = 'ppath';
The action then displays as one of the configured vectors.
However, in my scenario, Action{1,1} should range over all 10-digit sequences containing each of 1-10 exactly once (all permutations), and Action{1,2} should range over all 20-digit sequences containing each of 1-10 exactly twice. My current ActionInfo only lists a couple of fixed options, not all possible ones.
I think the main question here is how to define the action space for multiple combinations of discrete actions, in which case this doc example will help.
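The pattern looks like this: each cell element of an rlFiniteSetSpec is one complete action vector the agent can select (a sketch with two candidate schedules):
scheduleOptions = {[1 2 3 4 5 6 7 8 9 10], [10 9 1 2 3 4 5 6 7 8]};
ActionInfo(1) = rlFiniteSetSpec(scheduleOptions);
ActionInfo(1).Name = 'schedule';
usample(ActionInfo(1))   % draws one of the candidate vectors at random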
I have checked the document before. The 10-digit case makes sense. However, I find the 20-digit case is oversized. The detailed code is as follows.
ActionInfo(1) = rlFiniteSetSpec(num2cell(perms([1 2 3 4 5 6 7 8 9 10]),2)');
ActionInfo(1).Name = 'schedule';
ActionInfo(2) = rlFiniteSetSpec(num2cell(perms([1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10]),2)');
ActionInfo(2).Name = 'ppath';
Apologies, I am not able to follow what you are asking. The initial question was about the error you were getting when trying to access a variable that was not a cell array; then you realized the action space was not set up properly since you had a discrete action space. Now you are saying that part of your action space is "oversized". What exactly do you mean by oversized? What should it look like?
It is about the size of my discrete action space. For example, an action space like {[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]} covers every ordering of 1-3. When the number of elements increases to 20, the data size exceeds the system limits.
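A quick check of the sizes involved (note that perms does not deduplicate, so the 20-digit call above would try to build a matrix with 20! rows):
factorial(3)           % 6 orderings of 1-3, as in the example
factorial(10)          % 3628800 rows for perms(1:10) -- large but feasible
factorial(20)          % about 2.4e18 rows for the 20-digit case
factorial(20) / 2^10   % about 2.4e15 distinct arrangements even after
                       % accounting for each digit appearing twice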
Thanks for clarifying. This is the curse of dimensionality; unfortunately there is not much you can do about it other than using a continuous action space.
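One way to set that up (a sketch; the sort-based decoding is an illustrative suggestion, not something prescribed in this thread): let the agent output continuous scores and decode their rank order into a valid sequence inside step, so no permutation ever has to be enumerated.
% Continuous action channels: 10 + 20 real-valued scores
ActionInfo(1) = rlNumericSpec([1 10], 'LowerLimit', -1, 'UpperLimit', 1);
ActionInfo(1).Name = 'schedule_scores';
ActionInfo(2) = rlNumericSpec([1 20], 'LowerLimit', -1, 'UpperLimit', 1);
ActionInfo(2).Name = 'ppath_scores';
% Inside step(), assuming Action arrives as a cell array of the two channels:
[~, schedule_act] = sort(Action{1});   % a permutation of 1..10
base = [1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10];
[~, order] = sort(Action{2});
ppath_act = base(order);               % each of 1..10 appears exactly twice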
