Generating random samples from a 2D space matching the probability density function estimated from a discrete set of data

I have a set of around 12k points in the {a,e} space that looks like the figure below (yes, there are 12 thousand points, but most of them are concentrated in the bottom left). I need to draw around 600k random points from this space, and I want the resulting set to follow the underlying 2D probability density function that generated the initial 12k points.

ksdensity can estimate the 2D pdf, and I can then draw random samples from it with randsample, just as this thread suggests. The problem is that the samples are limited to the discrete points of the mesh on which I evaluate the density. I could use, for instance, a 5000x5000 mesh, but this is quite CPU-intensive and I think it leads to considerable overfitting.

Is there an analytical alternative that is simpler and lets me work in the continuum? This thread suggests something, but to be honest I don't think it is well justified. Thanks in advance!
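For reference, a bivariate Gaussian kernel density estimate with one bandwidth per coordinate ($h_a$, $h_e$) is a product-kernel mixture of $N$ Gaussians centred on the data points $(a_i, e_i)$. As far as I understand its documentation, this is the form ksdensity uses for two-column data with its default normal kernel, but treat that as an assumption. A mixture can be sampled exactly in the continuum: pick one component uniformly at random, then draw from that component.

$$
\hat f(a,e) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{h_a h_e}\,\varphi\!\left(\frac{a-a_i}{h_a}\right)\varphi\!\left(\frac{e-e_i}{h_e}\right),
\qquad \varphi(z) = \frac{\exp(-z^2/2)}{\sqrt{2\pi}}
$$

$$
I \sim \mathcal{U}\{1,\dots,N\},\qquad Z_1, Z_2 \overset{\text{iid}}{\sim} \mathcal{N}(0,1),\qquad
(a^\ast, e^\ast) = \left(a_I + h_a Z_1,\; e_I + h_e Z_2\right)
$$

Conditioned on $I = i$, $(a^\ast, e^\ast)$ has density $\frac{1}{h_a h_e}\varphi\!\left(\frac{a-a_i}{h_a}\right)\varphi\!\left(\frac{e-e_i}{h_e}\right)$, and averaging over the $N$ equally likely values of $i$ gives exactly $\hat f$. A mesh is therefore only needed for plotting $\hat f$, not for sampling from it.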

Answers (1)

Vinayak on 17 May 2024
Hi Lluc-Ramon,
If you need to generate many random points that follow a 2D probability density function (pdf) estimated from an initial set of points, I suggest using Kernel Density Estimation (KDE). KDE builds a smooth, continuous approximation of the density, and a dedicated 2D KDE routine can evaluate it on a grid considerably faster than calling ksdensity on a very fine mesh before using randsample.
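Because a Gaussian KDE is a mixture of kernels centred on the data points, it can also be sampled directly in the continuum, with no grid at all (often called a smoothed bootstrap): choose a data point uniformly at random and add Gaussian noise whose standard deviation equals the bandwidth. The sketch below is not the kde2d route used in this answer. Its assumptions: Silverman's rule-of-thumb bandwidths with a MAD-based scale (similar in spirit to ksdensity's default, but not guaranteed identical), the kernels are placed in log10(a) because a spans several decades, and e >= 0 is enforced by reflecting negative draws. The synthetic data mimic the example below and stand in for your real a_values and e_values.

```matlab
% Smoothed bootstrap: exact draws from a Gaussian product-kernel KDE, no grid.
% Assumptions: Silverman-type bandwidths, kernels placed in (log10(a), e),
% and reflection at e = 0 (assumes e cannot be negative).

% Synthetic stand-in for the real data
N = 12000;
a_values = [logspace(0, 4, round(0.85 * N)), logspace(4, 5, round(0.15 * N))]';
e_values = max(0.9 * rand(N, 1) .* (log10(a_values) - 0.5), 0) / 4;

Z = [log10(a_values), e_values];                 % kernel space: (log10(a), e)
[n, d] = size(Z);
sigma = median(abs(Z - median(Z))) / 0.6745;     % robust per-column scale (MAD)
h = sigma * (4 / ((d + 2) * n))^(1 / (d + 4));   % Silverman's rule, one width per column

num_samples = 600000;
idx = randi(n, num_samples, 1);                  % 1) pick a kernel centre uniformly
Zs = Z(idx, :) + randn(num_samples, d) .* h;     % 2) add Gaussian noise of width h
Zs(:, 2) = abs(Zs(:, 2));                        % reflect draws below e = 0

a_samples = 10 .^ Zs(:, 1);                      % back to the original a scale
e_samples = Zs(:, 2);
```

The cost is one randi and one randn call, so 600k samples take well under a second and never touch a mesh. To check the result visually, use scatter(a_samples, e_samples, 1, 'filled') with set(gca, 'XScale', 'log'). If you prefer to smooth in a rather than log10(a), drop the log10 and 10.^ steps. In that case the kernel width is the same everywhere in a, which tends to oversmooth the dense bottom-left region.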
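I used kde2d from the File Exchange. It is not a built-in MATLAB function, so download it and add it to your path before running the code below.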
% Synthetic data resembling your figure (generated with ChatGPT's help)
N = 12000;
a_values = [logspace(0, 4, round(0.85 * N)), logspace(4, 5, round(0.15 * N))]';
% Generate corresponding `e_values` with some correlation
e_values = 0.9 * rand(N, 1) .* (log10(a_values) - 0.5);
e_values(e_values < 0) = 0; % Filter out negative values
e_values = e_values / 4; % Rescale `e_values` to roughly [0, 1]
% Plot the original data
figure;
scatter(a_values, e_values);
axis xy;
xlabel('a');
ylabel('e');
% Perform KDE with kde2d (File Exchange). It must be on the MATLAB path,
% otherwise you get "Unrecognized function or variable 'kde2d'".
[bandwidth, density, X, Y] = kde2d([a_values, e_values]);
% Plot the estimated density on the kde2d grid
figure;
imagesc(X(1, :), Y(:, 1), density);
axis xy;
colorbar;
xlabel('a');
ylabel('e');
title('Estimated 2D PDF');
num_samples = 600000; % Number of samples to generate
samples = sample_kde(X, Y, density, num_samples);
% Extract and plot sampled points
a_samples = samples(:, 1);
e_samples = samples(:, 2);
figure;
scatter(a_samples, e_samples, 1, 'filled');
xlabel('a');
ylabel('e');
title('Random Samples from Estimated 2D PDF');

% Local functions must come at the end of a script file
function samples = sample_kde(X, Y, density, num_samples)
% Draws grid nodes with probability proportional to the estimated density
w = max(density(:), 0);            % clip any tiny negative density values
c = cumsum(w);
cdf = c / c(end);                  % normalized CDF over grid nodes, ends exactly at 1
random_values = rand(num_samples, 1);
% Vectorized inverse-CDF lookup (arrayfun + find is very slow for 600k draws)
[~, sample_indices] = histc(random_values, [0; cdf]);
[row, col] = ind2sub(size(density), sample_indices);
a_samples = X(1, col)';
e_samples = Y(row, 1);
samples = [a_samples, e_samples];
end
This approach ensures that the samples closely follow the estimated density, up to the resolution of the grid on which kde2d evaluates it: every sample lies on a grid node.
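If you prefer to keep the kde2d grid, one cheap way to leave the grid nodes is to add a uniform random offset of up to half a cell in each direction after sampling a node. This treats the density as constant within each cell, so the approximation error shrinks with the cell size. In the sketch below, the Gaussian bump on a 256-by-256 meshgrid is a hypothetical stand-in for the kde2d outputs X, Y and density; with real data, use the variables kde2d returns. The sketch assumes the grid is uniformly spaced, as a meshgrid of equally spaced vectors is.

```matlab
% Hypothetical stand-in for the kde2d outputs: a Gaussian bump on a uniform grid
n = 256;
[X, Y] = meshgrid(linspace(0, 5, n), linspace(0, 1, n));
density = exp(-((X - 1).^2 / 0.5 + (Y - 0.2).^2 / 0.02));

num_samples = 600000;
w = max(density(:), 0);                          % node weights (clip negatives)
c = cumsum(w);
cdf = c / c(end);                                % nondecreasing, last value exactly 1
[~, k] = histc(rand(num_samples, 1), [0; cdf]);  % vectorized inverse-CDF lookup
[row, col] = ind2sub(size(density), k);          % sampled node -> (row, column)

% Uniform jitter within the cell around each sampled node (assumes uniform spacing)
dx = X(1, 2) - X(1, 1);
dy = Y(2, 1) - Y(1, 1);
a_samples = X(1, col)' + (rand(num_samples, 1) - 0.5) * dx;
e_samples = Y(row, 1) + (rand(num_samples, 1) - 0.5) * dy;
```

Near the edges of the grid, for example at e = 0, the offset can push a sample up to half a cell outside the grid. Clip or reflect those samples if the variable has a hard bound.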

Version

R2021a