Minimize the sum of squared errors between the experimental and predicted data in order to estimate two optimum parameters

In my research work, I want to do Vehicle survival fraction, survival rates were calculated using the log-logistic survival function, in equation two unknown parameters a, b is to obtain by minimize the sum of squared errors between the experimental and modeled.
equation of the model is:
S = [1 + (t/a)^b]^(-1)
age of the vehicle t = [0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20];
registered vehicles of respective age t, N = [6801342;6364669;6104616;5849372;5426898;4969076;4549439;4117610;3714272;3324896;2980512;2652830;2320935;2041282;1818527;1594733;1335572;1053197;800212;566590;376620];
experimental value u_exp = [406183941.2;270789294.1;812367882.4;541578588.2;406183941.2;1353946471;406183941.2;135394647.1;135394647.1;135394647.1;676973235.3;270789294.1;406183941.2;0;270789294.1;406183941.2;0;135394647.1;0;0;135394647.1];
u_mod = @(P) ((1+(t./P(1)).^P(2)).^(-1)).*N;
modeled value = S*N
and I want to calculate the parameters “a” and “b” by minimizing the sum of squared errors between “n exp” and “n model”.
Someone here can help me please?
Thank you already for your help!

9 commentaires

Could you kindly indicate the equation number in the article "Log-logistic distribution for survival data analysis using MCMC" that represents the function you wish to fit into the data?
I deleted my answer because I cannot make sense of this.
To ensure clarity, it would be helpful if you could provide specific details regarding the very specific mathematical function you are referring to (Use LaTeX).
Additionally, it would greatly assist us if you could plot the experimental data, allowing us to visualize the expected appearance on the figure. Since we NOT experts in your field of survival research, we can only offer assistance with the code and suggest relevant MATLAB functions, provided that the technical aspects you explain align with our understanding.
in log-logistic distribution : numerator is 1 and denominator is [1 + (t/a)^b]
and to caculate the modeled values this equation is multiplied by the N = [6801342;6364669;6104616;5849372;5426898;4969076;4549439;4117610;3714272;3324896;2980512;2652830;2320935;2041282;1818527;1594733;1335572;1053197;800212;566590;376620];
so the u_mod = {1/[1+(t/a)^b}*N;
and the parameters a and b are to be estimated by minimizing the squared error between u_exp and u_mod
Take a look here about the difference between curve fitting and distribution fitting:
I have plotted the "Log-Logistic Distribution (LLD)" model for you below based on the formula given in Wikipedia:
, where and
It's now your turn to plot your experimental data so that we can assess whether the LLD model can be fitted to the data. I kindly request you to share the MATLAB-plotted figure.
t = linspace(0, 20, 2001);
a = 2.5*[1:8]; % alpha: 8 intersection points
b = 2.^[1/2 1 3/2 2 5/2 3 3.5]; % beta : 7 curve characteristics
for i = 1:numel(a)
for j = 1:numel(b)
LLD = 1./(1 + 1./((t/a(i)).^b(j)));
plot(t, LLD), hold on
end
end
hold off, grid on
xlabel('t')
ylabel('LLD')
title('Log-Logistic Survival Function')
here is the plot of experimental data
z = [0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20];
u_exp = [4061839.412;2707892.941;8123678.824;5415785.882;4061839.412;13539464.71;4061839.412;1353946.471;1353946.471;1353946.471;6769732.353;2707892.941;4061839.412;0;2707892.941;4061839.412;0;1353946.471;0;0;1353946.471];
hold on
plot(z,u_exp,'o')
hold off
grid on
Below, I have overlaid the Log-Logistic Distribution (LLD) with your normalized experimental data. It is worth noting that it is possible to find a unique LLD that corresponds to each non-zero data point. Given that there are 4 zeros out of the total 21 data points, it implies that you can determine 17 or fewer unique LLDs. I would appreciate it if you could clarify whether this is your desired outcome from a purely mathematical perspective.
t = linspace(0, 20, 2001);
a = 2.5*[1:8]; % alpha: 8 intersection points
b = 2.^[1/2 1 3/2 2 5/2 3 3.5]; % beta : 7 curve characteristics
for i = 1:numel(a)
for j = 1:numel(b)
LLD = 1./(1 + 1./((t/a(i)).^b(j)));
plot(t, LLD), hold on
end
end
z = [0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20];
u_exp = [4061839.412;2707892.941;8123678.824;5415785.882;4061839.412;13539464.71;4061839.412;1353946.471;1353946.471;1353946.471;6769732.353;2707892.941;4061839.412;0;2707892.941;4061839.412;0;1353946.471;0;0;1353946.471];
stem(z, u_exp/max(u_exp), 'k', 'linewidth', 1.5)
hold off, grid on
xlabel('t')
ylabel('LLD')
title('Overlay Plots')

Connectez-vous pour commenter.

Réponses (2)

If you have the Statistics and Machine Learning Toolbox, you can use fitnlm to fit this function.
Unless I'm making a mistake in my thinking, though, that is not a very good function to fit your data. As t approaches zero, S approaches 1, regardless of the parameters.
rng default
% Define the data to be fit
t = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20];
N = [3; 2; 6; 4; 3; 10; 3; 1; 1; 1; 5; 2; 3; 0; 2; 3; 0; 1; 0; 0; 1];
% Tabulate the data
tbl = table(t,N);
% Define function that will be used to fit data
% (p is a vector of fitting parameters)
S = @(p,t) (1 + (t./p(1)).^p(2)).^(-1);
% Fit the model
beta0 = [5 5]; % Initial parameter guess
mdl = fitnlm(tbl,S,beta0)
mdl =
Nonlinear regression model: N ~ (1 + (t/p1)^p2)^( - 1) Estimated Coefficients: Estimate SE tStat pValue ________ ______ _______ _________ p1 17.359 5.2563 3.3024 0.0037447 p2 31.338 257.96 0.12148 0.90459 Number of observations: 21, Error degrees of freedom: 19 Root Mean Squared Error: 2.88 R-Squared: -0.364, Adjusted R-Squared -0.436 F-statistic vs. zero model: 4.96, p-value = 0.0185
% Calculate the model values at the empirical x
y_predicted = predict(mdl,t);
% Plot the data and fit
figure
plot(t,N,'*',t,y_predicted,'g');
legend('data','fit')

1 commentaire

This solution was written prior to a significant re-write of the question, and is no longer applicable as a specific solution to the question.
I'm only leaving it here for OP's later reference, if they wish.

Connectez-vous pour commenter.

Taking into account the information provided, this solution is likely the most suitable option I can propose.
%% Experimental Data
t = [0:20];
uex = [4061839.412; 2707892.941; 8123678.824; 5415785.882; 4061839.412; 13539464.71; 4061839.412; 1353946.471; 1353946.471; 1353946.471; 6769732.353; 2707892.941; 4061839.412; 0; 2707892.941; 4061839.412; 0; 1353946.471; 0; 0; 1353946.471]';
y = uex/max(uex); % normalized data
%% Determine the parameters
idx1= find(y == 1);
idx2= find(y == 0);
tt = t; tt(1) = 1e-6; % data processing in t
yy = y; yy(idx1) = 1 - 1e-6; yy(idx2) = 1e-6; % data processing in y
b = 3; % fix beta
a = ((- (tt.^b).*(yy - 1)).^(1/b))./(yy.^(1/b)); % find alpha
%% Log-Logistic Distribution model
LLD = 1./(1 + 1./((tt./a).^b));
%% Sum of squared errors
Err = y - LLD;
sse = sum(Err.^2)
sse = 5.0000e-12
%% Plot result
stem(t , y), axis([-1 21 0 1.1]), hold on
stem(tt, LLD, 'markersize', 12), hold off, grid on
xlabel('t')
legend('Normalized Data', 'Prediction', 'fontsize', 12)
title('Prediction by Log-Logistic Distribution Model')

Catégories

Modifié(e) :

le 2 Avr 2024

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by