ClassificationTree with unequal costs

Wes on 15 Sep 2011
Hello,
I have a question about the predict method of a classification tree when the misclassification costs are unequal. With unequal costs, the resulting decision tree contains leaves whose node class is not the class with the maximum probability; instead, the node class is chosen to minimize the cost. If I use this tree to predict outcomes for a data set, it should return the node class, which accounts for the unequal costs, right? Below is a simple example that illustrates the problem (I am using MATLAB R2011a). Does predict return only the class with the maximum posterior probability, without taking the costs into account?
Thanks, Wes
% simple example
load fisheriris
% unequal cost function for illustration
costMat = [0, 1, 1; 1, 0, 10; 1, 1, 0;];
tree = ClassificationTree.fit(meas,species,'Cost', ...
costMat, 'ClassNames', {'setosa','versicolor','virginica'});
view(tree, 'mode', 'graph');
% look at node 8 (should be the rightmost node labeled 'versicolor')
tree.ClassProb(8,:)
tree.NodeClass(8)
% Note that class prob indicates that virginica is the most likely
% class, but the NodeClass is actually versicolor, because of the
% costs, so far so good!
% Use the tree to predict the results
[l,s,n,c] = predict(tree, meas);
% Look at the labels for examples that ended in node 8
% We expect versicolor based on the label for this node,
% however, they all show virginica
l(n==8)

Answers (1)

Ilya on 15 Sep 2011
Yes, ClassificationTree always predicts class labels based on posterior probabilities. In this respect, ClassificationTree/predict differs from the classregtree/eval method. Unfortunately, the documentation does not explain this in sufficient detail.
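To make the distinction concrete, here is a minimal sketch of the two labeling rules applied to a single node. The probabilities in p are hypothetical (they are not the actual values of tree.ClassProb(8,:)), and the expected-cost formula is the textbook rule; it is assumed, not confirmed, to match what ClassificationTree computes internally.

```matlab
% Cost matrix from the question: costMat(i,j) is the cost of predicting
% class j when the true class is i.
costMat = [0 1 1; 1 0 10; 1 1 0];
classNames = {'setosa', 'versicolor', 'virginica'};

% Hypothetical class probabilities at one leaf (assumption, for illustration).
p = [0 0.4 0.6];

% Rule 1: maximum posterior probability.
[~, iMaxPost] = max(p);

% Rule 2: minimum expected cost. Entry j is sum_i p(i)*costMat(i,j).
expectedCost = p * costMat;
[~, iMinCost] = min(expectedCost);

fprintf('Max posterior:     %s\n', classNames{iMaxPost});
fprintf('Min expected cost: %s\n', classNames{iMinCost});
```

With these assumed probabilities the two rules disagree: virginica has the highest probability, but predicting virginica risks the cost of 10 for a true versicolor, so versicolor has the lowest expected cost.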
If you want to predict labels based on costs, you can look up the node class of the node each observation lands in, as you suggested:
[~,~,n] = predict(tree, meas);
tree.NodeClass(n)
Alternatively, you can apply the "average cost" correction to the priors yourself before growing the tree. This is the same correction that a tree grown with costs uses. Compare:
t1 = ClassificationTree.fit(meas,species,'Cost',costMat,'ClassNames',{'setosa' 'versicolor' 'virginica'})
and
t2 = ClassificationTree.fit(meas,species,'prior',sum(costMat,2),'ClassNames',{'setosa' 'versicolor' 'virginica'})
The trees are identical, but the second tree predicts 'versicolor' for observations that land in node 8.
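The arithmetic behind the "average cost" correction can be sketched as follows. This assumes the usual CART form of the correction (each class prior is scaled by the total cost of misclassifying that class, then renormalized); the leaf probabilities in p are hypothetical and are not taken from the actual tree.

```matlab
% Cost matrix from the question: costMat(i,j) is the cost of predicting
% class j when the true class is i.
costMat = [0 1 1; 1 0 10; 1 1 0];

% fisheriris has 50 observations per class, so the empirical priors are equal.
prior = [1 1 1] / 3;

% Average-cost correction (assumed form): scale each prior by the row sum
% of the cost matrix, then renormalize.
rowCost = sum(costMat, 2);             % [2; 11; 2]
adjPrior = prior(:) .* rowCost;
adjPrior = adjPrior / sum(adjPrior);   % [2; 11; 2] / 15

% Effect on a leaf with hypothetical class probabilities p: scaling the
% priors rescales the class probabilities by the same per-class factors.
p = [0 0.4 0.6];
pAdj = p(:) .* rowCost;
pAdj = pAdj / sum(pAdj);
[~, iAdj] = max(pAdj);                 % index 2 corresponds to versicolor

disp(adjPrior');
disp(pAdj');
```

Because the priors are equal here, the corrected priors are proportional to sum(costMat,2), which is the vector passed as 'prior' when fitting t2. Under the corrected priors, the maximum-probability class at this hypothetical leaf becomes versicolor, which would explain why predict on t2 agrees with the cost-based node class.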
