Generate artificial datasets that illustrate the assumptions and characteristics of different methods.

Datasets should ideally be two-dimensional. Among other dataset properties, you can experiment with:
• number of cases
• number of classes
• proportion of classes
• distribution of points within each class (shape of point clouds)
• shape of the border between the class regions, from linear to arbitrarily complex
• level of noise
• level of overlap between the classes
Consider the methods: logistic regression, LDA, QDA, Decision Tree without pruning, Decision Tree with a
maximum depth of 2, SVM linear, SVM RBF
For each of the listed methods, find a dataset where that method's assumptions are met and the assumptions
of the other methods are not met (if possible) — in other words, a dataset where that method is hard to
beat under cross-validation. Explain why the dataset is appropriate for the method. Suggestion:
use datasets with 2 predictors and 2 classes so they can also be visualized. This is not mandatory.

Answers (1)

SOUMNATH PAUL
SOUMNATH PAUL on 7 May 2024
Hi @Gabriel,
Below, I'll outline how to generate datasets for the methods listed (Logistic Regression, LDA, QDA, Decision Trees, and SVMs) and explain why each dataset is particularly suited to its method. In each scenario, the method's assumptions are met in a way specific to that model.
  • Logistic Regression: Best when data is linearly separable.
% Linearly separable dataset: two Gaussian blobs with well-separated means
rng(1); % For reproducibility
X = [randn(100,2)*0.75+ones(100,2); randn(100,2)*0.75-ones(100,2)];
Y = [ones(100,1); zeros(100,1)];
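To check that such a dataset really favors the model, one could fit it and estimate the cross-validated error; a minimal sketch (assuming the Statistics and Machine Learning Toolbox is available):

```matlab
% Fit a logistic model with 10-fold cross-validation on (X, Y) from above.
% For a linearly separable dataset, the k-fold loss should be near zero.
mdl = fitclinear(X, Y, 'Learner', 'logistic', 'KFold', 10);
cvErr = kfoldLoss(mdl);
```

The same pattern (fit, then compare `kfoldLoss`) can be repeated for each method below to verify it is hard to beat on its own dataset.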
  • LDA: Ideal when the classes share an identical covariance matrix but differ in their means
% Dataset with identical covariance matrices
rng(2);
X = [mvnrnd([1 2], [1 0.5; 0.5 1], 100); mvnrnd([-1 -2], [1 0.5; 0.5 1], 100)];
Y = [ones(100,1); zeros(100,1)];
  • QDA: Works best when the classes have distinct covariance matrices
% Dataset with distinct covariance matrices
rng(3);
X = [mvnrnd([1 2], [1 0.5; 0.5 1], 100); mvnrnd([-1 -2], [2 -1; -1 2], 100)];
Y = [ones(100,1); zeros(100,1)];
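On this distinct-covariance dataset, QDA's quadratic boundary should beat LDA's linear one. A hedged sketch of that comparison (again assuming the Statistics and Machine Learning Toolbox):

```matlab
% Compare LDA and QDA by 10-fold cross-validation on (X, Y) from above;
% QDA is expected to achieve the lower error here.
lda = fitcdiscr(X, Y, 'DiscrimType', 'linear');
qda = fitcdiscr(X, Y, 'DiscrimType', 'quadratic');
errLDA = kfoldLoss(crossval(lda, 'KFold', 10));
errQDA = kfoldLoss(crossval(qda, 'KFold', 10));
```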
  • Decision Tree (maximum depth 2): an XOR pattern is perfectly captured by two levels of axis-aligned splits, while linear methods (logistic regression, LDA, linear SVM) cannot separate it at all
% XOR dataset: the class depends on the interaction of the two predictors
[g1, g2] = meshgrid(linspace(-2, 2, 20), linspace(-2, 2, 20));
X = [g1(:), g2(:)];
Y = double(xor(X(:,1) > 0, X(:,2) > 0));
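The question also lists Decision Tree without pruning, SVM linear, and SVM RBF, which are not covered above. The following are hedged sketches of datasets suited to each (assuming the Statistics and Machine Learning Toolbox; parameter choices are illustrative, not definitive):

```matlab
% Decision Tree without pruning: a checkerboard of axis-aligned blocks.
% Many splits are needed, so an unpruned tree fits it while a depth-2
% tree and all linear methods fail.
rng(4);
X = 4*rand(400,2) - 2;                      % uniform on [-2,2]^2
Y = double(mod(floor(X(:,1)) + floor(X(:,2)), 2) == 0);
treeMdl = fitctree(X, Y);                   % default: grown without depth limit

% SVM RBF: concentric rings. The boundary is circular, so neither a linear
% model nor axis-aligned tree splits capture it well.
rng(5);
r = [0.5*rand(100,1); 1.5 + 0.5*rand(100,1)];  % inner disc, outer ring
t = 2*pi*rand(200,1);
X = [r.*cos(t), r.*sin(t)];
Y = [zeros(100,1); ones(100,1)];
rbfMdl = fitcsvm(X, Y, 'KernelFunction', 'rbf');

% SVM linear: two well-separated blobs with a wide margin. The max-margin
% hyperplane depends only on points near the boundary, so it is robust
% where logistic regression can be pulled by distant points.
rng(6);
X = [randn(100,2)*0.5 + [2 2]; randn(100,2)*0.5 - [2 2]];
Y = [ones(100,1); zeros(100,1)];
linMdl = fitcsvm(X, Y, 'KernelFunction', 'linear');
```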
Hope it helps!
Regards, Soumnath
