Generate artificial datasets that illustrate the assumptions and characteristics of different methods. Ideally,
the datasets are two-dimensional. Dataset properties you can experiment with include, among others:
• number of cases
• number of classes
• proportion of classes
• distribution of points within each class (shape of point clouds)
• shape of the border between the class regions, from linear to highly nonlinear
• level of noise
• level of overlap between the classes
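As an illustration of how several of these properties can be turned into adjustable knobs, here is a minimal sketch using NumPy. The helper name make_two_class_data and all of its parameters are hypothetical, chosen only for this example: spread sets the width of each Gaussian cloud, separation (relative to spread) controls the overlap, and flip_rate adds label noise. It covers only two Gaussian classes with a linear border; other shapes would need a different generator.

```python
import numpy as np

def make_two_class_data(n_cases=200, prop_class1=0.5, separation=2.0,
                        spread=1.0, flip_rate=0.0, seed=0):
    """Sample two spherical Gaussian point clouds in 2D (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Number of cases per class, from the requested class proportion.
    n1 = int(round(n_cases * prop_class1))
    n0 = n_cases - n1
    # Class 0 is centered at the origin, class 1 is shifted along the x-axis.
    X0 = rng.normal(loc=[0.0, 0.0], scale=spread, size=(n0, 2))
    X1 = rng.normal(loc=[separation, 0.0], scale=spread, size=(n1, 2))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    # Label noise: flip each label independently with probability flip_rate.
    flip = rng.random(n_cases) < flip_rate
    y[flip] = 1 - y[flip]
    return X, y

# Example: 300 cases, 30% in class 1, moderately overlapping clouds.
X, y = make_two_class_data(n_cases=300, prop_class1=0.3, separation=1.5)
print(X.shape, np.bincount(y))
```

Lowering separation or raising spread increases the overlap between the classes, while flip_rate adds noise that no classifier can explain.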
Consider the following methods: logistic regression, LDA, QDA, decision tree without pruning, decision tree
with a maximum depth of 2, SVM with a linear kernel, and SVM with an RBF kernel.
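One possible way to set up these seven methods is sketched below, assuming scikit-learn is the library in use (the assignment does not prescribe one). The dictionary name models and its keys are illustrative, and all hyperparameters other than max_depth and the kernel are left at the library defaults, which is itself an assumption.

```python
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One estimator per method; names are illustrative labels only.
models = {
    "logistic regression": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    # No depth limit and no cost-complexity pruning (ccp_alpha defaults to 0).
    "tree (unpruned)": DecisionTreeClassifier(random_state=0),
    "tree (max depth 2)": DecisionTreeClassifier(max_depth=2, random_state=0),
    "SVM linear": SVC(kernel="linear"),
    "SVM RBF": SVC(kernel="rbf"),
}
```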
For each of the listed methods, find a dataset where the assumptions of that method are met and the
assumptions of the other methods are not (if possible). In other words, find a dataset on which that method
is hard to beat, as measured by cross-validation. Explain why the dataset is appropriate for the method.
Suggestion: use datasets with 2 predictors and 2 classes, so that they can also be visualized. This is not mandatory.
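A comparison by cross-validation could look like the following sketch, again assuming scikit-learn. The helper compare_by_cv is hypothetical, and the concentric-circles dataset from make_circles is only one example of a nonlinear class border, on which one would expect an RBF SVM to outperform a linear method. It is not a worked answer to the exercise, and only two of the methods are included to keep the example short.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def compare_by_cv(models, X, y, n_splits=5, seed=0):
    """Return the mean cross-validated accuracy of each model (hypothetical helper)."""
    # Stratified folds keep the class proportions similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {name: cross_val_score(model, X, y, cv=cv).mean()
            for name, model in models.items()}

# Two classes arranged as an inner and an outer ring: no straight line separates them.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

scores = compare_by_cv(
    {"logistic regression": LogisticRegression(), "SVM RBF": SVC(kernel="rbf")},
    X, y,
)
# Print the methods from best to worst mean accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Any dictionary of scikit-learn classifiers can be passed to compare_by_cv, so the same call can be reused for each candidate dataset.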