Density Preserving Sampling (DPS) - deterministic crossvalidation

Version 1.1.0.0 (2.62 KB) by Marcin
MATLAB implementation of the DPS method, which can save computation compared to cross-validation
599 downloads
Updated 11 Dec 2012


The DPS method aims to produce splits of the data that are representative in terms of PDF (probability density function) similarity. DPS is deterministic, so the resulting split is always the same for the same input. The method can serve as a more computationally efficient alternative to CV-based performance estimation.

For a CV estimate to be reliable, the whole CV procedure should be repeated multiple times. For example, rather than performing a single run of 8-fold CV, which can produce a very unreliable estimate (see the reference below), a number of runs should be performed (typically 10). This means that to obtain the performance estimate you need to train and test your model 10x8=80 times. With DPS you would only need to do this 8 times. Assuming quadratic computational complexity of a typical learning algorithm, this can save a considerable amount of computation time. The saving is particularly useful when the performance estimation procedure (10x8 models in the above example) has to be repeated many times in the course of parameter optimization (e.g. selecting the optimal order of a polynomial or the number of principal components to use).
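The arithmetic above can be sketched in a few lines of Python. This is only an illustrative cost model, not part of the DPS code: the function names, the dataset size of 8000 samples and the 5 candidate parameter settings are hypothetical. It also assumes that each model is trained on the same number of samples under both schemes, and it ignores the cost of computing the DPS split itself.

```python
def count_fits(folds, repeats=1, candidates=1):
    """Number of train/test cycles needed for one performance-estimation study."""
    return folds * repeats * candidates


def total_cost(n_samples, folds, repeats=1, candidates=1, exponent=2.0):
    """Relative training cost, assuming cost grows as n_train ** exponent.

    exponent=2.0 mirrors the quadratic-complexity assumption in the text.
    """
    n_train = n_samples * (folds - 1) // folds  # samples left after holding out one fold
    return count_fits(folds, repeats, candidates) * float(n_train) ** exponent


# Hypothetical setting: 8000 samples, 8 folds, 10 repetitions of CV.
cv_fits = count_fits(folds=8, repeats=10)   # repeated 8-fold CV
dps_fits = count_fits(folds=8, repeats=1)   # a single deterministic 8-fold split

cv_cost = total_cost(8000, folds=8, repeats=10)
dps_cost = total_cost(8000, folds=8, repeats=1)
speedup = cv_cost / dps_cost

# Parameter optimization over 5 hypothetical candidate settings.
cv_fits_tuning = count_fits(folds=8, repeats=10, candidates=5)
dps_fits_tuning = count_fits(folds=8, repeats=1, candidates=5)

print(cv_fits, dps_fits, speedup)           # 80 8 10.0
print(cv_fits_tuning, dps_fits_tuning)      # 400 40
```

Under these assumptions the ratio between the two schemes is simply the number of CV repetitions. The complexity exponent affects how much absolute time each fit takes, and therefore how much time the factor of 10 is worth in practice.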

The attached figure shows the decision boundaries of a simple parametric classifier, each trained on a single fold obtained from 8-fold CV (red) or 8-fold DPS (black), superimposed on a scatter plot of the cone-torus dataset. Note how stable the decision boundaries are in the case of DPS and how much they vary between folds in the case of CV.

The method has been extensively tested on datasets from the UCI Machine Learning Repository. It was also part of the winning solution in the ISMIS 2011 competition and has been included in the current release of the PRTools toolbox. I would, however, welcome any feedback on the performance of DPS when applied to other problems you might be working on.

More information on DPS can be found in the following paper:
Budka, M. and Gabrys, B., 2012.
Density Preserving Sampling: Robust and Efficient Alternative to Cross-validation for Error Estimation.
IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2012.2222925.

Cite As

Marcin (2024). Density Preserving Sampling (DPS) - deterministic crossvalidation (https://www.mathworks.com/matlabcentral/fileexchange/39390-density-preserving-sampling-dps-deterministic-crossvalidation), MATLAB Central File Exchange.

MATLAB Release Compatibility
Created with R2007b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version History
1.1.0.0: Cleaned up the code a little bit.
1.0.0.0