Game Theory (No-Regret Learning Algorithm)

Question


Please, I would like sample code for implementing the no-regret learning algorithm in MATLAB, ideally with reference to the following paper: Han, Z., Pandana, C., & Liu, K. J. K. (2007). Distributive opportunistic spectrum access for cognitive radio using correlated equilibrium and no-regret learning. In IEEE Wireless Communications and Networking Conference, WCNC (pp. 11–15). https://doi.org/10.1109/WCNC.2007.8

2 comments

Chinedu Olebu on 25 May 2019

Edited: Chinedu Olebu on 26 May 2019


Okay, no answers yet? Well, I've been able to come up with a simple code to implement a no-regret algorithm. However, I don't get the expected curve when I plot the joint probabilities against the number of iterations. Could someone please walk me through adjusting the code so that the joint probabilities come out as expected?

clear all;
close all
clc;
%% Defining the actions for each player(player 1 and Player 2)
r1 = 0.5; r1_prime = 1; r_1 = 0.5; r_1_prime = 1;
RT1 = []; R_T1 = [];
miu = 5; % set miu to 5
iterations = 1:500;
P11 = [];
P21 = [];
%% Iterating through the time here
A1_Regret = [];
A2_Regret = [];
p1_r1 = 0.1;
for t = 1:500 %t represents the period of transmission
    D = Ut(r1,r_1) - Ut(r1,r_1_prime)+ Ut(r1,r_1) - Ut(r1_prime,r_1)+ Ut(r1,r_1) - Ut(r1_prime,r_1_prime);
    Regret = (1/t)*(D);
    A1_Regret = [A1_Regret Regret];
    p1_r1_prime = (1/5)*(Regret);
    p1_r1 = 1 - p1_r1_prime;
    P11 = [P11 p1_r1_prime];
end
%% Iterating through the time here
p1_r1 = 0.8;
for t = 1:500 %t represents the period of transmission
    D = Ut(r1_prime,r_1) - Ut(r1_prime,r_1_prime)+ Ut(r1_prime,r_1) - Ut(r1,r_1)+ Ut(r1_prime,r_1) - Ut(r1,r_1_prime);
    Regret = (1/t)*(D);
    A2_Regret = [A2_Regret Regret];
    p1_r1_prime = (1/5)*(Regret);
    p1_r1 = 1 - p1_r1_prime;
    P21 = [P21 p1_r1_prime];
end
%% Iterating through the time here
p1_r1 = 0.8;
P12 = [];
for t = 1:500 %t represents the period of transmission
    % compare (r1, r_1_prime) against each of the three other joint actions
    D = Ut(r1,r_1_prime) - Ut(r1,r_1)+ Ut(r1,r_1_prime) - Ut(r1_prime,r_1)+ Ut(r1,r_1_prime) - Ut(r1_prime,r_1_prime);
    Regret = (1/t)*(D);
    A2_Regret = [A2_Regret Regret];
    p1_r1_prime = (1/5)*(Regret);
    p1_r1 = 1 - p1_r1_prime;
    P12 = [P12 p1_r1_prime];
end
P22 = [];
for t = 1:500 %t represents the period of transmission
    D = Ut(r1_prime,r_1_prime) - Ut(r1_prime,r_1)+ Ut(r1_prime,r_1_prime) - Ut(r1,r_1)+ Ut(r1_prime,r_1_prime) - Ut(r1,r_1_prime);
    Regret = (1/t)*(D);
    A2_Regret = [A2_Regret Regret];
    p1_r1_prime = (1/5)*(Regret);
    p1_r1 = 1 - p1_r1_prime;
    P22 = [P22 p1_r1_prime];
end
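%% This function computes the utility of a player (the regret is built from it)
function u = Ut(r, r_minus)
% r       : rate chosen by the player in question
% r_minus : rate chosen by the other player
Go = 2.8;
G1 = r + r_minus;   % sum of both players' rates
thau = 0.1;
u = 0;              % default so u is always assigned (zero for G1 >= Go is an assumption)
if (G1 < Go)
    m = G1*(1+G1+thau*G1*(1+G1+(thau*G1/2)))*exp(-G1*(1+2*thau));
    n = (G1*(1+2*thau)-(1-exp(-thau*G1))+(1+thau*G1)*exp(-G1*(1+thau)));
    S1 = m/n;
    u = r*S1/(r+r_minus);
end
end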

The function Ut calculates the utility of the player in question for a given strategy combination. Each strategy combination represents the actions chosen by the players. I took p1_r1_prime and p1_r1 as the player's probability distribution over its two actions, derived from the regrets.
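
For comparison, here is a minimal sketch of one common form of no-regret learning, regret matching in the style of Hart and Mas-Colell. Whether this matches the exact update in the paper is an assumption on my part. In the loops above the regret is a constant divided by t, so the probabilities can only decay like 1/t; in the sketch each player actually samples an action every period, accumulates the gain it would have had from the other action, and switches with probability equal to the positive average regret divided by miu. The names rates, U, S, p, counts and joint are made up for illustration, the utility is the same formula as Ut rewritten as anonymous functions, and returning zero when the summed rate reaches Go is an assumption.

```matlab
% Regret-matching sketch for two players with two rates each.
% Assumption: Hart & Mas-Colell style update; not confirmed against the paper.
rng(0);                          % fixed seed so runs are repeatable
rates = [0.5 1];                 % the two rates from the post (r and r_prime)
mu    = 5;                       % normalisation constant (miu in the post)
T     = 500;                     % number of transmission periods
G0    = 2.8;  tau = 0.1;         % constants from the post (Go, thau)

% Same utility formula as Ut, written as anonymous functions.
% Assumption: utility is zero once the summed rate reaches G0.
m  = @(G) G.*(1 + G + tau*G.*(1 + G + tau*G/2)).*exp(-G*(1 + 2*tau));
n  = @(G) G*(1 + 2*tau) - (1 - exp(-tau*G)) + (1 + tau*G).*exp(-G*(1 + tau));
Ut = @(r, q) (r + q < G0) .* r .* m(r + q) ./ (n(r + q) .* (r + q));

% U(a, b): utility of a player using rates(a) against an opponent using rates(b).
U = zeros(2, 2);
for a = 1:2
    for b = 1:2
        U(a, b) = Ut(rates(a), rates(b));
    end
end

S      = zeros(2, 2, 2);         % S(j,k,i): summed gain of player i for k over j
p      = 0.5 * ones(2, 2);       % p(i,:): current mixed strategy of player i
counts = zeros(2, 2);            % visits to each joint action (a1, a2)
joint  = zeros(T, 4);            % empirical joint distribution after each period

for t = 1:T
    act = 1 + (rand(1, 2) > p(:, 1)');        % sample one action per player
    counts(act(1), act(2)) = counts(act(1), act(2)) + 1;
    joint(t, :) = counts(:)' / t;             % column-major order of (a1, a2)
    for i = 1:2
        j = act(i);  k = 3 - j;  opp = act(3 - i);
        S(j, k, i) = S(j, k, i) + U(k, opp) - U(j, opp);
        regret  = max(S(j, k, i) / t, 0);     % positive average regret
        p(i, k) = regret / mu;                % switch to k with this probability
        p(i, j) = 1 - p(i, k);                % otherwise repeat action j
    end
end

plot(1:T, joint);
xlabel('Iteration');  ylabel('Empirical joint probability');
legend('(r, r)', '(r'', r)', '(r, r'')', '(r'', r'')');
```

The four curves are the empirical frequencies of the joint actions, which is the quantity regret matching is known to drive toward the set of correlated equilibria. miu has to be large enough that regret/mu stays below 1; with utilities this small, 5 is comfortably enough.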