
I am using reinforcement learning for control in MATLAB. The last layer of the actor network is a tanhLayer, so the output range should be -1 to 1, but the output values are not within that range.

17 views (last 30 days)
guiyang
guiyang on 5 Jun 2024
Commented: guiyang on 13 Jun 2024
actorNet = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numAct)
    tanhLayer(Name="ActionOutLyr")
    ];
The attached figure shows the actor output. It has 6 dimensions, and none of them stay within that range.

Answers (1)

Krishna
Krishna on 6 Jun 2024
Hi Guiyang,
If the output of a tanh layer in your network is not within the expected range of -1 to 1, consider the following points:
  1. Minor deviations from the expected range might be due to floating-point precision limits. These are typically negligible.
  2. Check whether any scaling or modification is applied after the tanh output that might alter its range (see the sketch after this list).
  3. Ensure that the tanh layer is indeed the final layer in your network, with no additional operations post-tanh.
  4. Verify that the method used for logging or visualizing the outputs is accurate and is not introducing errors or rescaling the tanh output.
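On points 2 and 4, one common post-tanh modification lies outside the network itself: during training, a TD3 agent adds exploration noise on top of the actor output, and the noisy action is clipped only to the limits given in the action specification. A minimal sketch, assuming the 6-dimensional action from the question, of giving the specification explicit bounds:
% Sketch (assumed bounds): with explicit limits the TD3 agent clips the
% noisy action to [-1, 1]; with the default unbounded limits it does not.
numAct = 6;
actInfo = rlNumericSpec([numAct 1], ...
    LowerLimit=-1, ...
    UpperLimit=1);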
You can also follow these troubleshooting steps:
  1. Test the tanh function with known inputs to confirm its correct behavior (see the sketch below).
  2. Double-check the network architecture for unintended layers or operations after the tanh.
These steps should help identify and resolve the issue with the tanh layer output.
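To make step 1 concrete, here is a minimal sketch (reusing the network layout from the question) that pushes random observations through the actor and checks the range of the tanhLayer output:
% Sketch: forward random observations through the actor network from the
% question and confirm the final tanhLayer keeps every output in [-1, 1].
numObs = 10;
numAct = 6;
actorNet = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numAct)
    tanhLayer(Name="ActionOutLyr")
    ];
net = dlnetwork(actorNet);
obs = dlarray(randn(numObs, 1000), "CB");         % 1000 random observations
act = extractdata(predict(net, obs));             % raw network output
disp([min(act, [], "all"), max(act, [], "all")])  % should stay within [-1, 1]
If this check passes but the values you see in Simulink do not, the discrepancy is introduced after the network, for example by exploration noise or by the signal you are actually logging.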
Also, please follow this documentation on asking questions effectively so that you get quick answers.
Hope this helps.
  1 comment
guiyang
guiyang on 13 Jun 2024
I still cannot find the cause of the error. I have posted my code below; could you help me check it?
clc
clear
%%
% Parameter settings
dataType = 'double';
%%
% Model parameters
Ts = 1e-5;
T = 0.001;
T1 = 0;
w = 2*pi*50;
Un = 1770;
Rn = 0.145;
Ln = 5.4e-3;
Cd = 9e-3;
Udc = 3600;
Rd = 25;
Kesogi = 1;
Kisogi = 1;
Kppll = 0.7;
Kipll = 25;
Kpv = 0.5;
Kiv = 5;
Kpi = 2;
Kii = 50;
fp = 1:0.01:85;
%%
% Create the environment interface
mdl = "deepl_rectifier_model1";
open_system(mdl)
numObs = 10;
obsInfo = rlNumericSpec( ...
    [numObs 1], ...
    DataType=dataType);
obsInfo.Name = "observations";
obsInfo.Description = "Error and reference signal";
% Create the action specification
numAct = 6;
actInfo = rlNumericSpec([numAct 1], "DataType", dataType);
actInfo.Name = "vqdRef";
agentblk = "deepl_rectifier_model1/RL Agent";
env = rlSimulinkEnv(mdl, agentblk, obsInfo, actInfo);
actInfo = getActionInfo(env);
env.ResetFcn = @resetReCT;
%%
% Build the agent
% State input path
statePath = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64, Name="fc1")
    ];
% Action input path
actionPath = [
    featureInputLayer(numAct, Name="ActionInLyr")
    fullyConnectedLayer(64, Name="fc2")
    ];
% Common output path
commonPath = [
    additionLayer(2, Name="add")
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(16)
    fullyConnectedLayer(1, Name="QValueOutLyr")
    ];
% Add the paths to a layer graph object
criticNet = layerGraph();
criticNet = addLayers(criticNet, statePath);
criticNet = addLayers(criticNet, actionPath);
criticNet = addLayers(criticNet, commonPath);
% Connect the layers
criticNet = connectLayers(criticNet, "fc1", "add/in1");
criticNet = connectLayers(criticNet, "fc2", "add/in2");
% Create the critic dlnetwork (initialization deferred)
criticDLNet = dlnetwork(criticNet, Initialize=false);
% Fix the random seed
rng(0)
% Build the critics
critic1 = rlQValueFunction(initialize(criticDLNet), obsInfo, actInfo);
critic2 = rlQValueFunction(initialize(criticDLNet), obsInfo, actInfo);
% Build the actor network
actorNet = [
    featureInputLayer(numObs, Name="StateInLyr")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(numAct)
    sigmoidLayer(Name="ActionOutLyr")
    ];
% Create the actor dlnetwork
actordlNet = dlnetwork(actorNet);
% summary(actordlNet)
% plot(actordlNet)
% Construct the actor
actor = rlContinuousDeterministicActor(actordlNet, obsInfo, actInfo);
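% Optional check (illustrative sketch): sample the untrained actor and print
% its raw output; with the sigmoidLayer above, every value should lie in (0, 1).
sampleAct = getAction(actor, {randn(numObs, 1)});
disp(sampleAct{1}')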
% Set the agent options
Ts_agent = 0.001;
agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts_agent, ...
    DiscountFactor=0.995, ...
    ExperienceBufferLength=2e6, ...
    MiniBatchSize=256, ...
    NumStepsToLookAhead=1, ...
    TargetSmoothFactor=0.005, ...
    TargetUpdateFrequency=10);
% Critic optimizer options
for idx = 1:2
    agentOpts.CriticOptimizerOptions(idx).LearnRate = 1e-4;
    agentOpts.CriticOptimizerOptions(idx).GradientThreshold = 1;
    agentOpts.CriticOptimizerOptions(idx).L2RegularizationFactor = 1e-3;
end
% Actor optimizer options
agentOpts.ActorOptimizerOptions.LearnRate = 1e-3;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
agentOpts.ActorOptimizerOptions.L2RegularizationFactor = 1e-3;
% Exploration noise: variance and decay rate
agentOpts.ExplorationModel.Variance = 0.05;
agentOpts.ExplorationModel.VarianceDecayRate = 2e-4;
agentOpts.ExplorationModel.VarianceMin = 0.001;
% Gaussian action noise model to smooth the target policy updates
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
% Create the agent with the specified actor, critics, and options
agent = rlTD3Agent(actor, [critic1, critic2], agentOpts);
%%
% Train the agent
T2 = 2;
maxepisodes = 1000;
maxsteps = ceil(T2/Ts_agent);
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=maxepisodes, ...
    MaxStepsPerEpisode=maxsteps, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=-190, ...
    ScoreAveragingWindowLength=100);
doTraining = true;
if doTraining
    trainResult = train(agent, env, trainOpts);
else
    load("rlPMSMAgent.mat", "agent")
end
%%
% Simulate the agent
sim(mdl);
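One way to see the actions the agent actually sends to the model, rather than the raw actor output, is to simulate through the environment interface and inspect the logged experience. A minimal sketch, assuming the agent and env built above:
% Sketch: simulate through the RL environment and check the range of the
% logged actions (these include any clipping to the actInfo limits).
simOpts = rlSimulationOptions(MaxSteps=maxsteps);
experience = sim(env, agent, simOpts);
actLog = squeeze(experience.Action.vqdRef.Data);   % field named after actInfo.Name
disp([min(actLog, [], "all"), max(actLog, [], "all")])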

