Excluding data that does not belong to any class from a KNN classifier
I have a dataset of spectral data for which I'm building a KNN classifier using the Statistics and Machine Learning Toolbox. New data will be added that does not necessarily belong to any of the trained classes. I'm now trying to build a system that detects data that is too dissimilar from all of the trained classes and refuses to classify it. My current idea is to use the evidence value generated with weighted distance measures and set a threshold below which the evidence is declared too low and classification is denied. Currently I don't know how to access this value, as the score function only provides normalised results. I'd appreciate any advice on either how to access the sum of weighted distances or an alternative approach that would let me achieve my goal.
Answers (1)
Ayush
on 5 Jun 2024
Hi,
To detect data that is too dissimilar from any of the trained classes in your KNN classifier, you can compute the distances from a new observation to all points in the training set, weight them as necessary, and then deny classification when a statistic of these distances falls on the wrong side of a threshold. The example code below illustrates the idea:
% Example training data (seeded so the run is reproducible)
rng(0);
X_train = [rand(100,2)*10; rand(100,2)*10+5]; % 200x2 matrix of features
Y_train = [ones(100,1); zeros(100,1)];        % 200x1 vector of labels
% Train KNN
k = 5; % Adjust the number of neighbors as needed
knnModel = fitcknn(X_train, Y_train, 'NumNeighbors', k);
% New observation
X_new = [7, 7]; % Example new data point
% Compute Euclidean distances to every training point
distances = sqrt(sum((X_train - X_new).^2, 2));
% Keep only the k nearest neighbors, so the sum does not grow with training set size
nearest = mink(distances, k);
% Weighting scheme: inverse of distance
% eps guards against division by zero, so an exact match gets a very large weight
weights = 1 ./ max(nearest, eps);
% Calculate sum of weighted distances
sumWeightedDistances = sum(weights);
% Define threshold
threshold = 0.5; % Example threshold, adjust based on experimentation
% Check against threshold
if sumWeightedDistances < threshold
    disp('New observation is too dissimilar, classification denied.');
else
    % Proceed with classification
    label = predict(knnModel, X_new);
    disp(['Classification accepted. Predicted label: ', num2str(label)]);
end
To summarize, the "fitcknn" function trains the KNN classifier, and the Euclidean distances from the new observation to the training points are computed separately. The rejection test compares a statistic of those distances against a threshold: the example above uses the sum of inverse-distance weights, but the minimum distance or another statistic that makes sense for your application would also work. Observations that fail the test are excluded before classification.
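One possible way to pick the threshold, rather than guessing a number, is to look at how far the training points sit from their own neighbors. The sketch below is only an illustration under a few assumptions that are not part of the answer above: it uses the mean distance to the k nearest neighbors as the statistic, it obtains those distances with "knnsearch", and it takes the 99th percentile of the training values as the cutoff. Both the statistic and the percentile are arbitrary choices you would need to tune on your spectral data.

```matlab
% Sketch: calibrate a rejection threshold from the training data itself.
rng(0);                                        % reproducible example data
X_train = [rand(100,2)*10; rand(100,2)*10+5];  % same toy features as above
k = 5;

% Distances from each training point to its k+1 nearest training points.
% Column 1 is the point itself (distance 0), so it is dropped.
[~, D_train] = knnsearch(X_train, X_train, 'K', k + 1);
meanDistTrain = mean(D_train(:, 2:end), 2);

% Assumption: reject anything farther out than 99% of the training points.
threshold = prctile(meanDistTrain, 99);

% Score new observations the same way (there is no self-match to drop here).
X_new = [7, 7; 100, 100];                      % one typical point, one far outlier
[~, D_new] = knnsearch(X_train, X_new, 'K', k);
meanDistNew = mean(D_new, 2);
isRejected = meanDistNew > threshold;

disp(table(meanDistNew, isRejected));
```

Rows of X_new flagged in isRejected would be skipped, and the remaining rows passed to "predict" as before. Note that the comparison direction is reversed relative to the inverse-distance sum: here a large value means dissimilar.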
For details, see the documentation of the "fitcknn" function.