How do I fit a histogram properly?

I have a vector of data and I need to build a histogram and fit a normal distribution to it (the data are supposed to be normal). The fit looks good, but the chi-square test keeps failing.
This is what I tried, after loading the data from DATA into the variable e:
%first fit
fit=e;
media=mean(fit)
sig=std(fit)
w=sig/3;
nbin=round((max(fit)-min(fit))/(w))
% rebin (if the fit is bad, try removing the data outside 3 sigma)
% clear fit
% fit=e(e>=(media-3*sig) & e<=(media+3*sig));
% media=mean(fit)
% sig=std(fit)
% w=sig/3;
% nbin=round((max(fit)-min(fit))/(w))
figure('Name','both eyes')
histfit(fit, nbin); %draw the histogram and overlay the fitted Gaussian
fitBoth=fitdist(fit,'Normal'); %make the proper fit to get the parameters
%not sure if fitdist uses the nbin provided or how to pass the value
mu=fitBoth.mu; %get the fit parameters
sigma=fitBoth.sigma;
str= ['\mu=' num2str(mu) newline '\sigma=' num2str(sigma)];
annotation('textbox', [0.785773044110552 0.757296497913367 0.108809663250367 0.141321044546851],'String',str,'FitBoxToText','on', 'FontSize', 18,'EdgeColor','red');
[h,p,st]=chi2gof(fit, 'NBins',nbin, 'CDF',fitBoth) %should use the expected value from the fitdist, right?
The resulting mu and sigma are compatible with an older study in which the data were normal. However, the chi-square test keeps rejecting the hypothesis.
The code shown is my latest attempt. I also tried doing it "manually", getting the bin counts with histcounts, but I got stuck trying to get the "expected" values from the fit.
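One way the "manual" route could look is sketched below. It assumes the expected count of a bin is the sample size times the probability mass that the fitted normal assigns to that bin. The sample x, the seed, and the bin count of 20 are made up for illustration and are not the DATA file from this post.

```matlab
% Sketch only: x is synthetic stand-in data, not the DATA file from the post.
rng(1);                                   % reproducible example
x = 3.3 + 0.54*randn(4000,1);             % hypothetical sample

pd   = fitdist(x,'Normal');               % fitted normal, 2 estimated parameters
nbin = 20;                                % illustrative bin count
[obs, edges] = histcounts(x, nbin);       % observed counts per bin

% Expected counts: N times the probability mass of each bin under the fit.
% The outer edges are pushed to +/-Inf so the expected counts sum to N.
e = edges;  e(1) = -Inf;  e(end) = Inf;
expd = numel(x) * diff(cdf(pd, e));

chi2stat = sum((obs - expd).^2 ./ expd);  % Pearson statistic
dof = nbin - 1 - 2;                       % bins - 1 - estimated parameters
p   = chi2cdf(chi2stat, dof, 'upper');    % upper-tail p-value
```

This sketch does not merge bins with small expected counts. chi2gof pools such bins (see its 'EMin' option), so the two results can differ in the tails.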
Lastly, the mu and sigma from the fit are exactly the same as the values I got from the mean and std functions, which is suspicious, and once again I don't understand how such a "good" fit could make the test fail.
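The agreement with mean and std is probably expected rather than suspicious. As far as I know, fitdist works on the raw data and never sees the histogram bins. For an uncensored normal fit it reports the sample mean and the square root of the unbiased variance estimate, while mle returns the maximum likelihood estimate, which normalizes by n. A small check on synthetic data (the sample x is hypothetical):

```matlab
% Sketch only: x is synthetic stand-in data, not the DATA file from the post.
rng(2);
x = 3.3 + 0.54*randn(500,1);              % hypothetical sample

pd   = fitdist(x,'Normal');               % no binning involved
dMu  = abs(pd.mu    - mean(x));           % expected to be ~0
dSig = abs(pd.sigma - std(x));            % expected to be ~0 (n-1 normalization)

phat = mle(x,'Distribution','Normal');    % [mu sigma], maximum likelihood
dML  = abs(phat(2) - std(x,1));           % expected to be ~0 (n normalization)
```

If that holds, identical values from fitdist and mean/std say nothing about how well the normal shape matches the data. They only show that both routes compute the same two statistics.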
Thank you in advance

Answers (1)

Using chi2gof to assess curve fitting of a regression may not be appropriate.
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/924499/DATA.txt', 'VariableNamingRule','preserve')
T1 = 3852×2 table
     Var1     Var2
    ______    _____
    2.9383    {' '}
     2.835    {' '}
    2.8468    {' '}
    2.8405    {' '}
    2.8718    {' '}
     2.844    {' '}
    2.8777    {' '}
    2.9787    {' '}
    3.0433    {' '}
     3.107    {' '}
    3.1335    {' '}
    3.1597    {' '}
     3.236    {' '}
    3.3902    {' '}
    3.5122    {' '}
    3.6265    {' '}
Var2_NotEmpty = nnz(~ismember(T1{:,2},{' '}))
Var2_NotEmpty = 0
[h,p,stats] = chi2gof(T1{:,1})
h = 1
p = 8.8933e-25
stats = struct with fields:
    chi2stat: 129.2784
          df: 7
       edges: [1.5065 1.8651 2.2238 2.5824 2.9410 3.2996 3.6583 4.0169 4.3755 4.7341 5.0928]
           O: [26 39 192 626 986 948 673 234 73 55]
           E: [12.1793 62.8378 236.6779 580.9630 929.9377 970.9949 661.3803 293.7915 85.0633 18.1742]
This appears to me to confirm that the data are normally distributed.

3 Comments

Andrea Carobbi on 12 Mar 2022
I'm sorry, but isn't the null hypothesis that the data are normally distributed, so that h=1 means the test rejects that hypothesis?
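That reading matches the chi2gof documentation: h = 1 means the null hypothesis is rejected at the default 5% level. The numbers in the answer can be rebuilt by hand from the O and E vectors it printed, which also shows which bins drive the statistic. This is a sketch that assumes 2 estimated parameters, consistent with the reported df = 7 for 10 bins.

```matlab
% Observed and expected counts copied from the stats output in the answer.
O = [26 39 192 626 986 948 673 234 73 55];
E = [12.1793 62.8378 236.6779 580.9630 929.9377 970.9949 661.3803 293.7915 85.0633 18.1742];

contrib  = (O - E).^2 ./ E;              % per-bin contribution
chi2stat = sum(contrib);                 % about 129.28, as in stats.chi2stat
df = numel(O) - 1 - 2;                   % 10 bins, 2 estimated parameters -> 7
p  = chi2cdf(chi2stat, df, 'upper');     % upper-tail p-value
h  = p < 0.05;                           % true, i.e. reject at the 5% level

[~, worst] = max(contrib);               % bin with the largest contribution
```

The last bin (55 observed against about 18 expected) contributes more than half of the total, and the first bin is the next largest term. The rejection therefore seems to come mostly from the tails rather than from the central part that looks fine in the histogram.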
I do not understand rejecting the hypothesis that the data are normally distributed. Every other analysis I can think of indicates that assuming the data are normally-distributed is appropriate.
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/924499/DATA.txt', 'VariableNamingRule','preserve')
T1 = 3852×2 table
     Var1     Var2
    ______    _____
    2.9383    {' '}
     2.835    {' '}
    2.8468    {' '}
    2.8405    {' '}
    2.8718    {' '}
     2.844    {' '}
    2.8777    {' '}
    2.9787    {' '}
    3.0433    {' '}
     3.107    {' '}
    3.1335    {' '}
    3.1597    {' '}
     3.236    {' '}
    3.3902    {' '}
    3.5122    {' '}
    3.6265    {' '}
Var2_NotEmpty = nnz(~ismember(T1{:,2},{' '}))
Var2_NotEmpty = 0
figure
histfit(T1{:,1})
[h,p,stats] = chi2gof(T1{:,1})
h = 1
p = 8.8933e-25
stats = struct with fields:
    chi2stat: 129.2784
          df: 7
       edges: [1.5065 1.8651 2.2238 2.5824 2.9410 3.2996 3.6583 4.0169 4.3755 4.7341 5.0928]
           O: [26 39 192 626 986 948 673 234 73 55]
           E: [12.1793 62.8378 236.6779 580.9630 929.9377 970.9949 661.3803 293.7915 85.0633 18.1742]
pd = fitdist(T1{:,1},'Normal')
pd =
  NormalDistribution

  Normal distribution
       mu =   3.3359   [3.31888, 3.35291]
    sigma = 0.538644   [0.526879, 0.550949]
figure
probplot(T1{:,1});
Andrea Carobbi on 14 Mar 2022
Well, I reject the hypothesis because the test told me to. I agree with you that the data look normal, even by eye, but if the test says they are not, I can't present the results claiming that they are.
My main question is whether I'm doing something wrong, because I can't tell how these functions work, even when I look at their code. I build the histogram with my own binning, fine, but which binning does fitdist use for the fit? (I know histfit uses fitdist, but how are the parameters returned by fitdist calculated?) From the help I gather that, without data censoring, the function computes the mean and sigma and applies them to the data, which may explain why my results look so suspiciously good. Can I make fitdist use another method, such as maximum likelihood or chi-square minimization? Lastly, I can specify a whole set of parameters for the chi-square test function, but not the rule and method used by the function I want to check with it.
I asked a friend to do the same fit with ROOT, and the results are a little better than the ones I get from MATLAB, so I'm really starting to wonder whether I'm doing everything wrong here. If what I did is correct, I can accept the result, say the data are not normal, and move on. I don't need the data to be normal at any cost (the study that found them normal had far less data to work with); I need to understand whether I'm missing something.
I'm really sorry to ask so much, but I've been stuck on this part of the analysis for three weeks now.
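On chi-square minimization: I am not aware of a fitdist option for it, but a binned fit can be sketched by hand with histcounts and fminsearch. The code below is an illustration under assumptions, not what fitdist or histfit do internally. The sample x, the 20 bins, and the helper names expFun and chi2Fun are all made up for the example.

```matlab
% Sketch only: x is synthetic stand-in data, not the DATA file from the post.
rng(3);
x = 3.3 + 0.54*randn(4000,1);             % hypothetical sample
N = numel(x);

[obs, edges] = histcounts(x, 20);         % the binning now enters the fit
e = edges;  e(1) = -Inf;  e(end) = Inf;   % outer bins absorb the tails

% Expected counts for q = [mu, log(sigma)]; the log keeps sigma positive.
expFun  = @(q) N * diff(normcdf(e, q(1), exp(q(2))));
% Pearson chi-square as a function of q (eps guards against division by 0).
chi2Fun = @(q) sum((obs - expFun(q)).^2 ./ max(expFun(q), eps));

q0   = [mean(x), log(std(x))];            % start from the sample estimates
qHat = fminsearch(chi2Fun, q0);           % minimum chi-square estimate

muHat    = qHat(1);
sigmaHat = exp(qHat(2));
chi2Min  = chi2Fun(qHat);                 % never larger than chi2Fun(q0)
```

A fit of this kind depends on the chosen bins, unlike fitdist. That difference may be part of why a histogram fit in ROOT gives slightly different numbers, assuming the ROOT fit was done on the binned counts.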
