How does TreeBagger handle missing values?
I've seen bits and pieces of an answer to this, suggesting that NaNs get ignored in TreeBagger, but nothing explicit. How are the NaNs being ignored? Does the entire row or column containing a NaN get removed? Or, if an observation in the training data for an individual tree is missing a variable, is that variable simply not used in that tree but still used in other trees in the random forest? Or do the missing values get imputed? If so, with what?
If anyone could give me a definitive answer on what the TreeBagger function does with them, that would be amazing.
Answers (1)
Matlab
on 25 Nov 2017
A random forest is built from decision trees, so I think the answer comes down to how the underlying tree fitter (fitctree for classification, fitrtree for regression) handles missing values. The question splits into two parts: training and prediction.

Training: by default, when a node is split on a variable, samples whose value for that variable is missing are ignored in the impurity computation for that split. Alternatively, surrogate decision splits can be used to deal with missing values. The details are explained in the help documentation.

Prediction: here the sample is missing the attribute that a node tests. I'm not sure about this part. I believe the sample is turned into several copies, and each copy goes down one branch with the corresponding probability. The main idea is from the paper "Induction of Decision Trees" by Quinlan.
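To make the two ideas in this answer concrete, here is a minimal Python sketch. It is not MATLAB's implementation, and whether TreeBagger actually uses the weighted-copy scheme at prediction time is not confirmed in this thread. The function names, the dict-based tree layout, and the `left_fraction` field are all hypothetical, chosen only for illustration.

```python
import math


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gain_ignoring_missing(values, labels, threshold):
    """Impurity decrease for the split `value < threshold`.

    Rows whose value is NaN are left out of the computation,
    which is the default training behaviour described above.
    """
    known = [(v, y) for v, y in zip(values, labels) if not math.isnan(v)]
    if not known:
        return 0.0
    left = [y for v, y in known if v < threshold]
    right = [y for v, y in known if v >= threshold]
    n = len(known)
    parent = gini([y for _, y in known])
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return parent - children


def predict_proba(node, x):
    """Class probabilities for sample x from a hypothetical dict-based tree.

    If the tested feature is NaN, the sample is sent down both branches
    and the results are blended by `left_fraction` (assumed here to be
    the share of training rows that went left), as in Quinlan's scheme.
    """
    if "probs" in node:  # leaf node
        return dict(node["probs"])
    v = x[node["feature"]]
    if not math.isnan(v):
        child = node["left"] if v < node["threshold"] else node["right"]
        return predict_proba(child, x)
    w = node["left_fraction"]
    left = predict_proba(node["left"], x)
    right = predict_proba(node["right"], x)
    classes = set(left) | set(right)
    return {c: w * left.get(c, 0.0) + (1 - w) * right.get(c, 0.0) for c in classes}
```

For example, with a one-split tree whose `left_fraction` is 0.25, a sample with a missing value at the split feature gets 25% of the left leaf's class probabilities and 75% of the right leaf's. Surrogate splits, the other option mentioned above, are not shown here.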