# How to use chi2gof within CUPID

Sim on 22 June 2023
Commented: Sim on 26 June 2023
[The same question on the CUPID GitHub]
Here are two examples of how MATLAB's chi-square goodness-of-fit test function, chi2gof, can be used:
First example (comparing two frequency distributions):
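```matlab
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
% Pool the sparse tail bins 4..8 into a single bin
Population2 = [996, 749, 370, sum(Population(4:8))];
Sample2 = [647, 486, 100, sum(Sample(4:8))];
% Expected counts must sum to the sample size: scale the population proportions
Expected2 = sum(Sample2) * Population2 / sum(Population2);
% Expand the counts into one observation per sampled item (value = bin index)
x = repelem(1:length(Sample2), Sample2);
% Bin edges halfway between the integer bin indices
edges = .5+(0:length(Sample2));
[h,p,k] = chi2gof(x,'Expected',Expected2,'Edges',edges)
```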
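Here is a hedged sketch of another way to run the first test. It requires the Statistics and Machine Learning Toolbox, and the names `Expected2`, `hA`/`hB` and `stA`/`stB` are my own. Instead of expanding the counts into `x`, you pass the bin centres with `'Ctrs'` and the counts with `'Frequency'`, exactly as the second example does. The expected counts are the population proportions scaled to the sample size, so they sum to the number of sampled items. Both calls should produce the same statistic.

```matlab
% Data from the first example, with bins 4..8 pooled into one bin
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample     = [647, 486, 100, 22, 0, 0, 0, 0];
Population2 = [Population(1:3), sum(Population(4:8))];
Sample2     = [Sample(1:3),     sum(Sample(4:8))];

% Expected counts under H0: the sample follows the population proportions
Expected2 = sum(Sample2) * Population2 / sum(Population2);

% (a) Expanded form: one observation per item, binned with half-integer edges
x = repelem(1:numel(Sample2), Sample2);
[hA, pA, stA] = chi2gof(x, 'Expected', Expected2, ...
                        'Edges', 0.5 + (0:numel(Sample2)));

% (b) Compact form: bin centres plus their observed frequencies
ctrs = 1:numel(Sample2);
[hB, pB, stB] = chi2gof(ctrs, 'Ctrs', ctrs, 'Frequency', Sample2, ...
                        'Expected', Expected2);

fprintf('chi2 = %.3f vs %.3f, df = %d, p = %.3g\n', ...
        stA.chi2stat, stB.chi2stat, stB.df, pB);
```

No parameters are estimated from the sample here, so `'NParams'` stays at its default of 0 and df is the number of bins minus one.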
Second example (fitting a distribution to data):
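```matlab
% Observed frequencies of the values 0..5
bins = 0:5;
obsCounts = [6 16 10 12 4 2];
n = sum(obsCounts);
% Fit a Poisson distribution to the binned data
pd = fitdist(bins','Poisson','Frequency',obsCounts');
% Expected counts: total count times the fitted probability of each value
expCounts = n * pdf(pd,bins);
% 'NParams',1 because one parameter (lambda) was estimated from the data
[h,p,st] = chi2gof(bins,'Ctrs',bins,...
    'Frequency',obsCounts, ...
    'Expected',expCounts,...
    'NParams',1)
```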
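The sketch below shows how chi2gof turns the pooled counts into its decision. It reruns the second example and recomputes the statistic, the degrees of freedom and the p-value from the returned `st` structure. It assumes the documented rule that df equals the number of bins after pooling, minus 1, minus `'NParams'`. The names `chi2manual`, `dfManual` and `pManual` are illustrative.

```matlab
bins = 0:5;
obsCounts = [6 16 10 12 4 2];
n = sum(obsCounts);
pd = fitdist(bins', 'Poisson', 'Frequency', obsCounts');
expCounts = n * pdf(pd, bins);
[h, p, st] = chi2gof(bins, 'Ctrs', bins, 'Frequency', obsCounts, ...
                     'Expected', expCounts, 'NParams', 1);

% st.O and st.E hold the counts AFTER chi2gof has pooled bins with small expected counts
chi2manual = sum((st.O - st.E).^2 ./ st.E);
% One degree of freedom is lost to the fixed total, one per estimated parameter
dfManual = numel(st.O) - 1 - 1;
% Upper-tail probability of the chi-square distribution
pManual = chi2cdf(chi2manual, dfManual, 'upper');

fprintf('chi2: %.4f vs %.4f, df: %d vs %d, p: %.4f vs %.4f\n', ...
        chi2manual, st.chi2stat, dfManual, st.df, pManual, p);
```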
But how can I use the chi2gof function with CUPID?
Below is an example where I would like to use MATLAB's chi2gof function:
```matlab
% (1) Create a truncated dataset
pd = makedist('Weibull','a',3,'b',5);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
% (2) Fit a distribution (here CUPID's Weibull2) to the truncated dataset
fittedDist = TruncatedXlow(Weibull2(2,2),3);
% (3) Estimate the Weibull parameters by maximum likelihood, allowing for the truncation
fittedDist.EstML(data_trunc);
% (4) Plot the truncated dataset (as a histogram) and the fitted distribution
% (here Weibull2 with the parameters estimated by maximum likelihood)
figure
xgrid = linspace(0,100,1000)';
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
xlim([2.5 6])
```
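Here is a sketch of how the expected-count recipe could be applied to this Weibull example. It uses only Statistics and Machine Learning Toolbox functions, so it runs without CUPID.

- The truncated negative log-likelihood minimised with `fminsearch` is my stand-in for `fittedDist.EstML`. Its estimates may differ slightly from CUPID's.
- The hypothetical helper `truncCDF` plays the role of `fittedDist.CDF`. With CUPID, the analogous line would presumably be `numel(data_trunc) * diff(fittedDist.CDF(bin_edges))`, as in the answer further down.
- The two changes relative to a default call are passing `'Edges'` explicitly and declaring the two fitted parameters via `'NParams'`.

```matlab
rng(0);                                   % reproducible example
a0 = 3;                                   % lower truncation point (from the question)
t = truncate(makedist('Weibull', 'a', 3, 'b', 5), a0, inf);
data_trunc = random(t, 10000, 1);
N = numel(data_trunc);

% Hypothetical helper: CDF of Weibull(scale, shape) truncated below at a0,
% F_T(x) = (F(x) - F(a0)) / (1 - F(a0))
truncCDF = @(x, ab) (wblcdf(x, ab(1), ab(2)) - wblcdf(a0, ab(1), ab(2))) ...
                    ./ (1 - wblcdf(a0, ab(1), ab(2)));

% Stand-in for EstML: minimise the truncated negative log-likelihood
% (parameters optimised on the log scale to keep them positive)
nll = @(logab) -sum(log(wblpdf(data_trunc, exp(logab(1)), exp(logab(2))))) ...
               + N * log(1 - wblcdf(a0, exp(logab(1)), exp(logab(2))));
abHat = exp(fminsearch(nll, log([mean(data_trunc), 2])));

% Expected counts: N times the CDF differences over the bin edges
num_bins = 50;
bin_edges = linspace(min(data_trunc), max(data_trunc), num_bins + 1);
expected_values = N * diff(truncCDF(bin_edges, abHat));

% Bin the observed data with the SAME edges; 2 parameters were estimated
[h, p, st] = chi2gof(data_trunc, 'Edges', bin_edges, ...
                     'Expected', expected_values, 'NParams', 2);
fprintf('scale = %.3f, shape = %.3f, chi2 = %.2f, df = %d, p = %.3f\n', ...
        abHat(1), abHat(2), st.chi2stat, st.df, p);
```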

### Accepted Answer

Jeff Miller on 23 June 2023
Yes, that is correct. The successive bin probabilities are the differences of the successive CDF values, and the expected number is the total N times the bin probability, just as you have computed it.
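Written out, the rule in this answer is the standard chi-square recipe. The notation is mine, not the thread's:

- $N$ is the number of observations and $e_0 < e_1 < \dots < e_k$ are the bin edges.
- $O_i$ and $E_i$ are the observed and expected counts in bin $i$.
- $F$ is the fitted CDF, and $p$ is the number of parameters estimated from the data (chi2gof's `'NParams'`).
- $k$ is the number of bins that remain after chi2gof pools bins whose expected count is below its `'EMin'` threshold (5 by default).

For a distribution $G$ truncated below at $a$, $F$ is the truncated CDF shown on the last line.

$$
\begin{aligned}
E_i &= N\,\bigl[F(e_i) - F(e_{i-1})\bigr], \qquad i = 1, \dots, k,\\
\chi^2 &= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \mathrm{df} = k - 1 - p,\\
F(x) &= \frac{G(x) - G(a)}{1 - G(a)}, \qquad x \ge a.
\end{aligned}
$$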
Sim on 23 June 2023
Thanks a lot @Jeff Miller, very kind!! :-)
Sim on 26 June 2023
I accepted @Jeff Miller's answer because it confirms what I showed in my own answer (see my two examples, "Test 1" and "Test 2").

### More Answers (1)

Sim on 22 June 2023
Edited: Sim on 22 June 2023
I may have found a solution that makes sense to me and gives the results I would expect, although I am not 100% sure it is correct. Perhaps experts on CUPID and chi2gof could tell me whether it is:
Test 1: I generate an artificial dataset from distribution (A) and fit it with the same distribution (A).
```matlab
% (1) Create a truncated dataset
pd = makedist('Exponential','mu',1); % <-- dataset following distribution (A)
whereToTruncate = 2;
t = truncate(pd,whereToTruncate,inf);
data_trunc = random(t,10000,1);
% (2) Fit a distribution to the truncated dataset
fittedDist = TruncatedXlow(Exponential(1),whereToTruncate); % <-- fitting distribution (A)
% (3) Estimate the distribution parameters by maximum likelihood, allowing for the truncation
fittedDist.EstML(data_trunc);
% (4) Plot the truncated dataset (as a histogram) and the fitted distribution
figure
xgrid = linspace(0,10,1000)';
num_bins = 50;
hold on
histogram(data_trunc,num_bins,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
hold off
xlim([0 7])
% (5) Chi-square goodness-of-fit test (chi2gof)
% Expected count per bin = N * (difference of the fitted CDF at successive bin edges)
bin_edges = linspace(min(data_trunc), max(data_trunc), num_bins+1);
expected_values = numel(data_trunc) * diff(fittedDist.CDF(bin_edges));
% bin_edges is not passed to chi2gof, which bins the data itself (compare st.edges)
[h,p,st] = chi2gof(data_trunc, 'Expected', expected_values)
```
Output of Test 1:

```text
h =
     0
p =
   0.55248
st =
  struct with fields:
    chi2stat: 21.469
          df: 23
       edges: [2.0001 2.2661 2.5321 2.7982 3.0642 3.3302 3.5963 3.8623 4.1283 4.3944 4.6604 4.9264 5.1925 5.4585 5.7245 5.9906 ]
           O: [2368 1798 1344 1107 810 594 442 333 294 212 165 116 113 68 53 37 33 28 15 15 18 11 5 21]
           E: [2348.7 1797.1 1375 1052 804.95 615.89 471.24 360.56 275.87 211.08 161.5 123.57 94.548 72.341 55.351 42.35 32.404 ]
```
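For comparison, here is a sketch of Test 1 that runs without CUPID. It relies on a closed-form result: for an exponential distribution truncated below at $a$, the maximum-likelihood estimate of the rate is `1/mean(x - a)`, because the exponential is memoryless. This should agree closely with what `EstML` finds, but that agreement is my assumption.

Two changes relative to the code above are my suggestions rather than anything confirmed in this thread:

- `'Edges'` is passed explicitly, so the observed and expected counts are guaranteed to refer to the same bins.
- `'NParams',1` subtracts the estimated rate from the degrees of freedom. The output above shows df = 23 for 24 pooled bins, so no parameter was subtracted there.

The helper `truncCDF` is hypothetical and stands in for `fittedDist.CDF`.

```matlab
rng(1);                                   % reproducible example
whereToTruncate = 2;
t = truncate(makedist('Exponential', 'mu', 1), whereToTruncate, inf);
data_trunc = random(t, 10000, 1);
N = numel(data_trunc);

% Closed-form ML estimate of the rate for a left-truncated exponential
rateHat = 1 / mean(data_trunc - whereToTruncate);

% Hypothetical helper: CDF of the fitted truncated exponential, x >= whereToTruncate
truncCDF = @(x) 1 - exp(-rateHat * (x - whereToTruncate));

num_bins = 50;
bin_edges = linspace(min(data_trunc), max(data_trunc), num_bins + 1);
expected_values = N * diff(truncCDF(bin_edges));

% Same edges for observed and expected counts; one parameter was estimated
[h, p, st] = chi2gof(data_trunc, 'Edges', bin_edges, ...
                     'Expected', expected_values, 'NParams', 1);
fprintf('rate = %.4f, chi2 = %.2f, df = %d, p = %.3f\n', ...
        rateHat, st.chi2stat, st.df, p);
```

The same applies to Test 2: its output shows df = 26 for 27 pooled bins. A fitted truncated normal has two estimated parameters (mean and standard deviation), so the analogous call would use `'NParams',2`.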
Test 2: I generate an artificial dataset from distribution (A) and fit it with a different distribution (B).
```matlab
% (1) Create a truncated dataset
pd = makedist('Exponential','mu',1); % <-- dataset following distribution (A)
whereToTruncate = 2;
t = truncate(pd,whereToTruncate,inf);
data_trunc = random(t,10000,1);
% (2) Fit a distribution to the truncated dataset
fittedDist = TruncatedXlow(Normal(0,1),whereToTruncate); % <-- fitting distribution (B)
% (3) Estimate the distribution parameters by maximum likelihood, allowing for the truncation
fittedDist.EstML(data_trunc);
% (4) Plot the truncated dataset (as a histogram) and the fitted distribution
figure
xgrid = linspace(0,10,1000)';
num_bins = 50;
hold on
histogram(data_trunc,num_bins,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
hold off
xlim([0 7])
% (5) Chi-square goodness-of-fit test (chi2gof)
% Expected count per bin = N * (difference of the fitted CDF at successive bin edges)
bin_edges = linspace(min(data_trunc), max(data_trunc), num_bins+1);
expected_values = numel(data_trunc) * diff(fittedDist.CDF(bin_edges));
% bin_edges is not passed to chi2gof, which bins the data itself (compare st.edges)
[h,p,st] = chi2gof(data_trunc, 'Expected', expected_values)
```
Output of Test 2:

```text
h =
     1
p =
   6.4417e-116
st =
  struct with fields:
    chi2stat: 628.59
          df: 26
       edges: [2.0001 2.1895 2.3789 2.5682 2.7576 2.947 3.1364 3.3258 3.5152 3.7046 3.8939 4.0833 4.2727 4.4621 4.6515 4.8409 ]
           O: [1742 1409 1198 959 798 699 561 463 391 295 266 205 162 135 114 102 86 73 56 51 39 30 22 18 16 20 90]
           E: [1386.2 1248.4 1114.2 985.49 863.77 750.27 645.8 550.88 465.67 390.1 323.84 266.42 217.2 175.48 140.5 111.47 87.65 ]
```