Calculating mean squared error or maybe MISE

Neuropragmatist on 25 Jul 2019
Commented: Neuropragmatist on 8 Aug 2019
Hi all,
I'm interested in comparing different bivariate histograms to an underlying 2D probability density function.
Additional info that you can skip for time:
My aim is to find the bin size and smoothing that make the histogram best represent the known density function. In my field this is a common problem without a clear solution: there are many ways to estimate the optimal bin size, but I can't find any that also take smoothing into account. Furthermore, the histogram I want to compare is actually the ratio of two histograms generated with the same parameters but from very different underlying distributions, and I have not found any method for optimising parameters in that situation either. My ultimate aim is to generate histograms using a variety of approaches and smoothing levels to find the 'best' one, or at least the best one for different scenarios.
My first approach was to generate the histogram and then correlate the result with the PDF sampled at the same points (i.e. the histogram bin centers). After reading more of the literature, I think I want to use the mean squared error (MSE) instead, but I'm not sure whether this is a) appropriate or b) meaningful. Also, the Wikipedia page for MSE lists two equations, and I'm not sure which one applies here. I'm also worried that I should be calculating the mean integrated squared error (MISE) instead, but I don't know how to do that for a discrete histogram versus a continuous PDF, both of which are 2D. I have MATLAB R2018b and all the toolboxes.
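One way to relate the bin-wise MSE to the ISE/MISE is sketched below. It assumes square bins of side h (here h = 5), K bins with centres c_k, N points with n_k of them in bin B_k, the histogram read as a piecewise-constant density, and the true PDF f approximated by its value at each bin centre (midpoint rule). The first MSE line is the "predictor" form from the Wikipedia page (an average of squared differences over the bins); MISE is the expectation of ISE over repeated samples, which mirrors the "estimator" form.

```latex
\begin{aligned}
p_k &:= f(\mathbf{c}_k)\,h^2 \approx \int_{B_k} f(\mathbf{x})\,d\mathbf{x}, \qquad \hat p_k = \frac{n_k}{N},\\
\mathrm{MSE} &= \frac{1}{K}\sum_{k=1}^{K}\left(p_k - \hat p_k\right)^2,\\
\hat f(\mathbf{x}) &= \frac{\hat p_k}{h^2} \quad \text{for } \mathbf{x}\in B_k,\\
\mathrm{ISE} &= \int \left(f(\mathbf{x}) - \hat f(\mathbf{x})\right)^2 d\mathbf{x}
  \approx \sum_{k=1}^{K} h^2\left(f(\mathbf{c}_k) - \frac{\hat p_k}{h^2}\right)^2
  = \frac{1}{h^2}\sum_{k=1}^{K}\left(p_k - \hat p_k\right)^2
  = \frac{K}{h^2}\,\mathrm{MSE},\\
\mathrm{MISE} &= \mathbb{E}\left[\mathrm{ISE}\right].
\end{aligned}
```

With the grid in the code (K = 41^2 = 1681, h = 5) the factor is K/h^2 = 67.24. Because K and h change together when the bin width changes, the raw MSE is not directly comparable across bin widths, while the ISE stays on the density scale. The midpoint step also ignores how f varies inside each bin, which is part of the histogram's bias, so evaluating the integral on a finer common grid is safer when comparing bin widths.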
Here is the code I have so far:
% generate distribution of points, make a histogram of these and get the actual PDF underlying them
mu = [100 100];
sigma = [60 50;50 80];
num = 100;
pos1 = mvnrnd(mu,sigma,num); % the points
% in this example we will just have one distribution, but in the real data there are multiple such distributions all summed together,
% which makes fitting a continuous function to the real data nearly impossible
bcx = 0:5:200; % bin centres in x
bcy = 0:5:200; % bin centres in y
[x,y] = meshgrid(bcx,bcy); % the grid over which to generate the histogram or evaluate the PDF (rows = y, columns = x)
bcents = [x(:) y(:)];
map_pdf = mvnpdf(bcents,mu,sigma); % the PDF
map_pdf = reshape(map_pdf,size(x));
map_hist = hist3(pos1,'Ctrs',{bcx(:) bcy(:)})'; % the histogram; hist3 puts x along rows, so transpose to match meshgrid
% plot all three
figure
subplot(1,3,1)
plot(pos1(:,1),pos1(:,2),'k.') % the points
axis([0 200 0 200])
axis square xy
subplot(1,3,2)
imagesc(bcx,bcy,map_pdf) % the PDF
axis square xy
subplot(1,3,3)
imagesc(bcx,bcy,map_hist) % the histogram
axis square xy
% calculate MSE
map_pdf = map_pdf .* 25; % multiply by bin area (5 x 5) so the sum is ~1 (Riemann sum -> probability per bin)
map_hist = map_hist ./ sum(map_hist(:)); % scale so the sum is 1 (probability per bin)
mse = mean((map_pdf(:) - map_hist(:)).^2)
cor = corr(map_pdf(:),map_hist(:),'rows','pairwise')
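Since MISE is the expectation of the ISE over repeated datasets, and the true PDF is known here, one way to estimate it is by simulation: draw many datasets from the known distribution, compute the ISE of each (optionally smoothed) histogram, and average. The sketch below does this over a small grid of bin widths and smoothing levels. The candidate values, `nRep`, the fine evaluation grid, and the use of `imgaussfilt` (Image Processing Toolbox) as the smoothing step are all illustrative assumptions, not the poster's method; the ratio-of-histograms case is not covered.

```matlab
% Monte Carlo estimate of MISE = E[ISE] for a (optionally smoothed) histogram,
% over a small grid of bin widths and smoothing levels.
% Candidate values, nRep and the fine grid spacing are illustrative assumptions.
mu    = [100 100];
sigma = [60 50; 50 80];
num   = 100;                 % points per simulated dataset, as in the post
nRep  = 100;                 % number of simulated datasets (assumption)
binWs = [2.5 5 10];          % candidate bin widths (assumption)
sds   = [0 0.5 1 2];         % Gaussian smoothing SDs in bins, 0 = none (assumption)

% True PDF on a fine grid; the ISE integral is approximated on this grid
dF   = 1;
fine = 0:dF:200;
[xf, yf] = meshgrid(fine, fine);                          % rows = y, columns = x
fTrue = reshape(mvnpdf([xf(:) yf(:)], mu, sigma), size(xf));

rng(1);                                                    % reproducible run
mise = zeros(numel(binWs), numel(sds));
for b = 1:numel(binWs)
    h   = binWs(b);
    bc  = 0:h:200;                                         % bin centres
    idx = min(round(fine / h) + 1, numel(bc));             % fine-grid point -> bin index
    for s = 1:numel(sds)
        ise = zeros(nRep, 1);
        for r = 1:nRep
            pts    = mvnrnd(mu, sigma, num);
            counts = hist3(pts, 'Ctrs', {bc bc})';         % transpose so rows = y
            if sds(s) > 0
                counts = imgaussfilt(counts, sds(s));      % Image Processing Toolbox
            end
            fHat     = counts / (sum(counts(:)) * h^2);    % counts -> density (integrates to 1)
            fHatFine = fHat(idx, idx);                     % piecewise-constant upsampling
            ise(r)   = sum((fHatFine(:) - fTrue(:)).^2) * dF^2;
        end
        mise(b, s) = mean(ise);
    end
end

fprintf('Rows: bin widths %s; columns: smoothing SDs %s\n', mat2str(binWs), mat2str(sds));
disp(mise)
```

The (bin width, smoothing) pair with the smallest entry in `mise` is the best of the candidates tried, for this distribution and sample size. Evaluating on a fixed fine grid keeps the ISE comparable across bin widths. The same loop structure could in principle be applied to the ratio maps, provided the true ratio can be evaluated on the fine grid.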

Answers (1)

Ganesh Regoti on 8 Aug 2019
Refer to ksdensity, which might serve your purpose. Here is the link.
1 comment
Neuropragmatist on 8 Aug 2019
I don't think that's really relevant: I already have a PDF generated by mvnpdf and a histogram generated by histcounts2; the question is how to compare the two distributions.
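The ksdensity suggestion can still be scored against the known PDF with the same ISE, and its bandwidth plays the role of the smoothing parameter. The sketch below assumes bivariate `ksdensity` (available in R2018b), a 1-unit evaluation grid, and a few hypothetical bandwidths; it scores a single simulated dataset, so it gives an ISE rather than a MISE.

```matlab
% Score a bivariate kernel density estimate (ksdensity) against the known PDF
% for a few bandwidths; the bandwidth acts as the smoothing parameter.
% The bandwidth values and the evaluation grid spacing are assumptions.
mu    = [100 100];
sigma = [60 50; 50 80];
rng(2);                                        % reproducible run
pts   = mvnrnd(mu, sigma, 100);

dG = 1;                                        % evaluation grid spacing
g  = 0:dG:200;
[x, y] = meshgrid(g, g);
gpts  = [x(:) y(:)];                           % one evaluation point per row
fTrue = mvnpdf(gpts, mu, sigma);

bws    = [2 4 8];                              % candidate bandwidths, same in x and y
iseKde = zeros(size(bws));
for k = 1:numel(bws)
    fKde = ksdensity(pts, gpts, 'Bandwidth', [bws(k) bws(k)]);
    iseKde(k) = sum((fKde(:) - fTrue(:)).^2) * dG^2;   % Riemann-sum ISE
end
disp([bws; iseKde])
```

Averaging `iseKde` over many simulated datasets, as in a Monte Carlo loop, would turn it into a MISE estimate comparable with the histogram's.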
