Feature Selection in TreeBagger
Afficher commentaires plus anciens
Hello MathWorks community
I'm currently working with the TreeBagger class to generate some classification tree esembles. Now I would like to know, how it decides wich features are used for splitting the data. If I create for example an esemble of tree stumps with 5000 trees and use it to classify a dataset with two features (e.g. VRQL-Value and maximum frequency), and then check which feature was selected for splitting for every single tree like this:
cellArray={};
for y=1:length(Random_Forest_Model.Trees)
cellArray{y}=Random_Forest_Model.Trees{y}.CutPredictor{1};
end
It happens in some cases, that only one feature was selected for all 5000 trees and the other feature was selected in not a single case (i.e. cellArray looks like this: {'x2', 'x2', 'x2', ..., 'x2', }). This can also happen with multiple features: only one feature is selected, the others are ignored.
Maybe important things to mention about the dataset:
-One feature achieves Values from 1 to 100, the other one from about 200 to 1200
-The classes are imbalanced (class 1: 52 entries, class 2: over 300 entries)
-only the greater class contains the NaNs
-both features contain NaNs
My question now is: how can I achieve, that the TreeBagger uses all features for classification and not only one or how can I in genreal achieve a more balanced selection of features.
Réponse acceptée
Plus de réponses (0)
Catégories
En savoir plus sur Classification Ensembles dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!