How can I scale CDF normal distribution values to match actual data? Calculating R^2?

Macy on 15 Feb 2023
Hi everyone, how can I calculate R^2 between the actual data and the fitted normal distribution? The problem is that my fitted normal CDF values are on a scale of 0 to 1, and I would like to scale them so that they match the scale of the actual data (0 to 2310), because in the third-to-last step I must take the difference between the actual and the normal-predicted data.
Table = readtable("practice3.xlsx");
actual_values = Table.values;
actual_values = sort(actual_values)
actual_values = 10×1
50 80 350 370 450 700 1060 1100 2000 2310
hold on
cdfplot(actual_values); % Plot the empirical CDF
normalfit = fitdist(actual_values,'Normal'); % fit the normal distribution to the data
cdf_normal = cdf('Normal', actual_values, normalfit.mu, normalfit.sigma); % generate CDF values for each of the fitted distributions
plot(actual_values,cdf_normal) % plot the normal distribution
hold off
grid on
predicted_values = cdf_normal %HERE IS THE PROBLEM: cdf_normal ranges from 0 to 1, how can I scale cdf_normal to match the scale of the actual data, which has a max of 2310?
predicted_values = 10×1
0.1530 0.1623 0.2616 0.2701 0.3051 0.4251 0.6078 0.6274 0.9307 0.9699
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - actual_values).^2;
TSS = sum(((actual_values - mean(actual_values)).^2));
Rsquared = 1 - SSR/TSS % Results in incorrect R value (R should be less than 1)
Rsquared = -12.1334

Answers (1)

Oguz Kaan Hancioglu on 15 Feb 2023
I think the problem is in your calculation: it subtracts the x values of the actual data from the F(x) values of the fitted distribution, and these are on different scales.
cdfplot(actual_values); % Plot the empirical CDF
cdfplot draws the empirical CDF over your x-axis values. Through the handle returned by cdfplot you can access the F(x) values of your data. Change the call to:
[h,stats] = cdfplot(actual_values); % Plot the empirical CDF
% keep the cdfplot figure open so that the handle h stays valid
Fx = h.YData;
After that, you can use Fx in your calculation.
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - Fx).^2;
TSS = sum(((Fx - mean(Fx)).^2));
Rsquared = 1 - SSR/TSS % R^2 of the normal fit against the empirical CDF
2 comments
Macy on 15 Feb 2023
I could not get this to work; I am getting an array of 22 Rsquared values.
Oguz Kaan Hancioglu on 15 Feb 2023
That is caused by cdfplot. When you pass actual_values to cdfplot, it builds its own XData from them. If you examine h.XData, you will see that each element of actual_values appears twice and that -Inf and +Inf are added at the ends.
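The count of 22 can also be reproduced without cdfplot. A minimal sketch, assuming h.YData comes back as a 1×22 row vector (as line-object data usually does) while predicted_values is a 10×1 column; the numbers here are placeholders with the right shapes, not the data from the question:

```matlab
% Placeholder inputs with the same shapes as in the thread (values are made up)
predicted_values = linspace(0.1, 1, 10)';   % 10x1 column, like cdf_normal
Fx = linspace(0, 1, 22);                    % 1x22 row, standing in for h.YData

% Column minus row triggers implicit expansion: the result is a 10x22 matrix
D = predicted_values - Fx;
disp(size(D))                               % 10 22

% sum works down each column, so SSR (and then Rsquared) gets 22 entries
SSR = sum(D).^2;                            % same expression as in the thread
disp(size(SSR))                             % 1 22

% Forcing both operands to equal-length columns gives one residual per point
Fxx = Fx(2:2:20);                           % 1x10 row
d = predicted_values(:) - Fxx(:);
disp(size(d))                               % 10 1
```

Writing (:) on both operands is a cheap guard against mixing row and column vectors by accident.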
You can pick out your values by manual indexing.
Fxx = Fx(2:2:20);
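Why every second element works: a staircase plot needs each data value twice, once at the bottom and once at the top of its vertical jump. The sketch below builds such a staircase by hand for the ten values in the question. It mimics the layout described above and is not cdfplot's actual source; it assumes the data are sorted and contain no repeated values, and the names xStairs, yStairs, and steps are illustrative only.

```matlab
x = [50 80 350 370 450 700 1060 1100 2000 2310]';  % sorted data from the question
n = numel(x);

idx   = repelem((1:n)', 2);         % 1 1 2 2 ... n n
steps = (0:n)'/n;                   % step heights 0, 1/n, ..., 1

xStairs = [-Inf; x(idx); Inf];      % each x twice, plus -Inf and +Inf: 2n+2 points
yStairs = [0; 0; steps(1 + idx)];   % height before and after each jump

% Every second element, starting at 2, is the height just BEFORE the jump at x(k)
Fxx = yStairs(2:2:2*n);             % 0, 0.1, ..., 0.9 for n = 10
disp([x Fxx])
```

Indexing with 2:2:2*n instead of the hard-coded 2:2:20 keeps the selection valid for other sample sizes, as long as all values are distinct.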
The vectors now have the same length and correspond to actual_values, so you can calculate R^2 as follows.
Fxx = Fx(2:2:20);
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - Fxx).^2;
TSS = sum(((Fxx - mean(Fxx)).^2));
Rsquared = 1 - SSR/TSS % R^2 of the normal fit against the empirical CDF
I got 0.9450, so it worked. However, I have no idea why cdfplot uses each element twice.
Best regards
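A note on the SSR line: sum(predicted_values - Fxx).^2 adds the residuals first and then squares the total, so positive and negative residuals cancel each other. With the ten predicted values printed in the question and the step heights 0, 0.1, ..., 0.9, that expression gives the 0.9450 reported above. The sketch below instead squares each residual before summing, which is what the "sum of squared residuals" in the code comment describes. The inputs are typed in from the question's printed output (rounded to four decimals), so the results are approximate.

```matlab
% Fitted normal CDF values as printed in the question (rounded)
predicted_values = [0.1530 0.1623 0.2616 0.2701 0.3051 ...
                    0.4251 0.6078 0.6274 0.9307 0.9699]';
% Empirical CDF heights selected by Fx(2:2:20), assuming ten distinct data points
Fxx = (0:9)'/10;

residuals = predicted_values - Fxx;    % both are 10x1 columns
SSR = sum(residuals.^2);               % square each residual, then sum
TSS = sum((Fxx - mean(Fxx)).^2);
Rsquared = 1 - SSR/TSS                 % about 0.910

% For comparison: squaring the sum of the residuals, as in the thread
Rsquared_thread = 1 - sum(residuals).^2/TSS   % about 0.945
```

Both quantities compared here are probabilities between 0 and 1, so no rescaling to the 0 to 2310 range of the data is needed.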


Version

R2022b
