Adjust classifier performance (sensitivity & specificity)

Diogo Tecelão on 7 Aug 2020
Answered: Ayush Aniket on 22 Jan 2025 at 9:36
Hi all,
I'm trying to build a classifier for my highly imbalanced binary data, where I have the following stats:
Value    Count     Percent
0        133412    97.62%
1        3247      2.38%
My dataset has 119 features. My question is: how can I balance my classifier sensitivity and specificity results (see more details below)?
To deal with my imbalanced data, I'm using an ensemble classifier with the RUSBoost method and assessing its performance, as shown in the code below:
%% Set cross validation - holdout
part = cvpartition(classes, 'Holdout', 0.5);
istrain = training(part); % Data for fitting
istest = test(part); % Data for quality assessment
holdout_train_features = features(istrain,:);
holdout_train_classes = classes(istrain);
holdout_test_features = features(istest,:);
holdout_test_classes = classes(istest);
%% Set classifier
% Set template tree
max_num_splits = round(sum(istrain)/2);
t = templateTree('MaxNumSplits', max_num_splits);
classifier = fitcensemble(holdout_train_features, holdout_train_classes, 'Method','RUSBoost', ...
'NumLearningCycles', 1000, 'Learners', t,'LearnRate', 0.1);
%% Test performance
% Get common classification indicators
[obtained_classes, scores] = predict(classifier, holdout_test_features);
holdout_validation_results = confusionchart(holdout_test_classes, obtained_classes);
TN = holdout_validation_results.NormalizedValues(1,1);
TP = holdout_validation_results.NormalizedValues(2,2);
FP = holdout_validation_results.NormalizedValues(1,2);
FN = holdout_validation_results.NormalizedValues(2,1);
accuracy = (TP + TN)/(TP + TN + FP + FN); % 0.99406
sensitivity = TP/(TP + FN); % 0.86445
specificity = TN/(TN + FP); % 0.99721
PPV = TP/(TP + FP); % 0.88295
NPV = TN/(TN + FN); % 0.9967
% Compute ROC curve
positiveClassIdx = find(classifier.ClassNames == 1);
[X,Y,T,AUC, OPTROCPT] = perfcurve(holdout_test_classes, scores(:,positiveClassIdx), 1);
hold on
scatter(1-OPTROCPT(1),OPTROCPT(2), 'filled')
Which gets me the following:
As you can see, specificity is very high while sensitivity is comparatively low. My question is: how can I adjust my classifier to balance sensitivity and specificity (and PPV and NPV, of course) so that it matches my desired operating point (e.g., the one I show on the ROC curve: 0.97 specificity and 0.961 sensitivity)?
Many thanks for your attention,

Answers (1)

Ayush Aniket on 22 Jan 2025 at 9:36
To balance the sensitivity and specificity of your classifier, especially in the context of imbalanced binary data, you can try the following methods:
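1. By default, classifiers apply a fixed decision rule (often a threshold of 0.5 on a probability-like score). Applying your own threshold to the positive-class score lets you trade specificity for sensitivity. The scale of ensemble scores depends on the ensemble method, so take candidate values from the thresholds returned by perfcurve (T in your code) rather than assuming the scores are probabilities.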
threshold = 0.4; % Adjust this value based on ROC analysis
predicted_classes = scores(:, positiveClassIdx) >= threshold;
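Rather than guessing the threshold, one option is to read it off the ROC output. The sketch below is a minimal illustration under stated assumptions: the labels and scores are synthetic stand-ins (the names `labels` and `posScores` are hypothetical), and picking the ROC point where sensitivity and specificity are closest is only one possible criterion. With the data from the question, you would pass holdout_test_classes and scores(:, positiveClassIdx) instead.

```matlab
% Synthetic stand-in data (an assumption, for illustration only).
rng(0);                                          % reproducible random numbers
labels    = [zeros(950,1); ones(50,1)];          % imbalanced 0/1 labels
posScores = [randn(950,1); randn(50,1) + 2.5];   % class 1 tends to score higher

% perfcurve returns the false positive rate (X), the true positive rate (Y)
% and the score threshold (T) that produces each ROC point.
[X, Y, T] = perfcurve(labels, posScores, 1);

sensitivityCurve = Y;        % true positive rate
specificityCurve = 1 - X;    % 1 - false positive rate

% One possible criterion: the ROC point where the two rates are closest.
[~, idx]  = min(abs(sensitivityCurve - specificityCurve));
threshold = T(idx);

% Classify as positive when the score reaches the chosen threshold.
predicted = posScores >= threshold;

TP = sum(predicted  & labels == 1);
FN = sum(~predicted & labels == 1);
TN = sum(~predicted & labels == 0);
FP = sum(predicted  & labels == 0);

fprintf('threshold = %.3f, sensitivity = %.3f, specificity = %.3f\n', ...
    threshold, TP/(TP + FN), TN/(TN + FP));
```

If possible, choose the threshold on validation data that is separate from the final test set, since tuning it on the test set makes the reported sensitivity and specificity optimistic.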
2. Modify the cost of misclassifications. You can assign higher costs to false negatives or false positives, depending on your goals.
% Define a cost matrix
costMatrix = [0 1; 5 0]; % Example: higher cost for false negatives
classifier = fitcensemble(holdout_train_features, holdout_train_classes, ...
'Method', 'RUSBoost', 'NumLearningCycles', 1000, ...
'Learners', t, 'LearnRate', 0.1, 'Cost', costMatrix);
See the fitcensemble documentation for details on the Cost name-value argument.
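To see how the cost setting moves the trade-off, it can help to sweep the false-negative cost and compare the resulting rates. The following is a rough sketch under assumptions: the data are synthetic, the names `Xdata`, `y` and `fnCosts` are hypothetical, the small tree and ensemble sizes are chosen only to keep the run short, and the cost values are arbitrary. Whether and how much the rates shift depends on the data.

```matlab
% Synthetic imbalanced data (an assumption, for illustration only).
rng(1);
n0 = 2000;  n1 = 60;
Xdata = [randn(n0,2); randn(n1,2) + 1.5];
y     = [zeros(n0,1); ones(n1,1)];

% Stratified 50/50 holdout split, as in the question.
part   = cvpartition(y, 'Holdout', 0.5);
Xtrain = Xdata(training(part),:);   ytrain = y(training(part));
Xtest  = Xdata(test(part),:);       ytest  = y(test(part));

t = templateTree('MaxNumSplits', 20);
fnCosts = [1 5 20];                 % arbitrary false-negative costs to compare

for c = fnCosts
    % Rows are the true class and columns the predicted class, in the
    % order given by ClassNames, so entry (2,1) is the false-negative cost.
    costMatrix = [0 1; c 0];
    mdl = fitcensemble(Xtrain, ytrain, 'Method', 'RUSBoost', ...
        'NumLearningCycles', 50, 'Learners', t, 'LearnRate', 0.1, ...
        'ClassNames', [0 1], 'Cost', costMatrix);

    predicted = predict(mdl, Xtest);
    cm = confusionmat(ytest, predicted, 'Order', [0 1]);

    sens = cm(2,2) / sum(cm(2,:));  % TP / (TP + FN)
    spec = cm(1,1) / sum(cm(1,:));  % TN / (TN + FP)
    fprintf('FN cost = %2d: sensitivity = %.3f, specificity = %.3f\n', ...
        c, sens, spec);
end
```

Passing ClassNames explicitly fixes the row and column order of the cost matrix, which avoids accidentally penalizing the wrong kind of error.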
3. While you're already using RUSBoost, consider combining it with other techniques, such as SMOTE (Synthetic Minority Over-sampling Technique), to balance the dataset before training. The following SMOTE package can help you out:

