kfoldLoss() values have inconsistent precision between different iterations of a loop

Question

Leon on 22 Apr 2024

Answered: Shubham on 17 May 2024

I am training an RBF SVM with leave-one-out cross-validation using 94 observations, and I am surprised to find that the value returned by kfoldLoss() isn't consistent between models that have the same loss (or accuracy). For example, an accuracy of 76/94 does not always produce exactly the same value; the results vary by around 1e-16. The error is completely negligible except when comparing values for equality, searching for the maximum, etc. The only thing that should differ is which 76 of the 94 folds are correct, but this should have no effect on the value or precision of the result.

I'm using a parfor loop to test many combinations of features (e.g. 260K combinations) and measuring the accuracy using accuracy = 1 - kfoldLoss(Mdl). I then use max() to find the position of the result with the highest accuracy; however, sometimes this does not work because there can be tiny variations in the precision. How is this even possible?

With 94 observations, there are only 95 possible accuracy levels (0/94 through 94/94). In my latest test, the peak accuracy is 76 out of 94, which is 0.808510638297872...etc.

Eight of the models tested have this 76/94 accuracy, but it isn't stored as exactly the same value in every element of the same double-precision vector. Precision errors are inevitable, but I would have expected MATLAB to always return the same result for 76/94.

I'm using a parfor loop. Could this have something to do with it? Is it possible for one thread to somehow produce a different precision from the others? It's an Intel i7-7700 running MATLAB R2024a on Windows 10.

% Abbreviated code. "combinations" is a cell array with each cell
% containing a vector of the features to select from the training data
accuracies = zeros(1, numel(combinations));  % preallocate the sliced output
parfor i = 1:numel(combinations)

    % Select the feature columns for this combination
    td_sel = training_data(:, combinations{i});

    % RBF SVM with leave-one-out cross-validation (94 folds, 94 observations)
    Mdl = fitcsvm(td_sel, response_name, 'KernelFunction', 'RBF', 'KFold', 94, 'CacheSize', 'maximal');

    accuracies(i) = 1 - kfoldLoss(Mdl);

end
[max_val, max_pos] = max(accuracies)

max_val =

0.808510638297872

max_pos =

52793

% Find all values that are very close to this value. But I don't understand
% how the precision (in the storage of the value) can be different
a = find(abs(accuracies - max_val) < 1e-10)
accuracies(a)

a =

6829

6891

6989

13699

21936

22778

45270

52793

ans =

0.808510638297872

accuracies(a) - max_val

ans =

1.0e-15 *

-0.111022302462516

0

accuracies(a) - 76/94

ans =

1.0e-15 *

-0.111022302462516

0

Thanks.

Answer 1

Shubham on 17 May 2024

Hi Leon,

The phenomenon you're observing, where seemingly identical operations result in tiny precision differences, can indeed stem from the parallel execution of your code in a parfor loop, among other factors. Here are some points to consider that might explain the behavior:

1. Floating-Point Arithmetic and Parallelism

Floating-point arithmetic does not always behave the way we might intuitively expect, especially in parallel computation. Two factors are worth knowing:

Non-Associativity of Floating-Point Operations: The result of floating-point addition or multiplication can depend on the order in which the operations are performed, because each intermediate result is rounded. In a parallel computing environment, operations may be executed in a different order from one run or thread to the next, leading to slight discrepancies in results.
Differences in Intermediate Precision: Optimized numerical code may take different paths (for example, vectorized versus scalar instructions) depending on how the work is split up, which can change how intermediate values are rounded. This can lead to tiny differences in the final results.
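
To make the non-associativity point concrete, here is a minimal sketch in plain MATLAB. It has nothing to do with kfoldLoss itself; it only shows that adding the same three doubles in two different groupings gives two different results. The variable names are made up for illustration.

```matlab
% Same three numbers, two different groupings of the additions
a = (0.1 + 0.2) + 0.3;
b = 0.1 + (0.2 + 0.3);

same = (a == b);   % false in IEEE 754 double precision
gap  = a - b;      % a tiny nonzero value, one spacing step at this magnitude

fprintf('a == b: %d\n', same);
fprintf('a - b = %.3g\n', gap);
```

Any computation that sums per-observation terms in a different order, or groups them differently, can end up one rounding step apart in the same way.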

2. MATLAB's Parallel Computing Toolbox

When you use a parfor loop from MATLAB's Parallel Computing Toolbox, each iteration runs independently on one of the workers in your pool. Every worker performs the same operations, but the order and manner in which the low-level floating-point work is carried out is not guaranteed to be identical everywhere, and this may contribute to the discrepancies you've observed.

3. Implications for Your Work

The differences in your output are about 1.1e-16 (displayed as 1.0e-15 * -0.111), which is roughly one unit in the last place for a double near 0.8 and far smaller than the precision most applications require. However, exact comparisons and searches for the maximum treat such values as different, so these tiny differences do become relevant there.
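
As a quick check of that magnitude, the sketch below uses the built-in eps function to look at the spacing of doubles around 76/94. It is only an illustration: x_below is a hypothetical name for the neighboring double, not something kfoldLoss returns.

```matlab
x = 76/94;            % the accuracy level from the question
ulp = eps(x);         % distance from x to the next larger double
x_below = x - ulp;    % the neighboring double just below x

fprintf('eps(76/94)  = %.15g\n', ulp);
fprintf('x_below - x = %.15g\n', x_below - x);
```

The spacing here is 2^-53, about 1.11e-16, which matches the -0.111022302462516e-15 difference shown in the question. In other words, the "different" results are adjacent doubles.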

Possible Workarounds

Rounding: For comparison purposes, consider rounding your accuracy values to a number of digits that makes sense for your application. Doubles carry about 15 to 16 significant decimal digits, so rounding to fewer digits than that leaves more margin and mitigates the effect of tiny discrepancies.

accuracies_rounded = round(accuracies, 15); % Adjust the number of digits as appropriate

Using a Tolerance for Comparisons: Instead of looking for exact matches, compare floating-point numbers within a tolerance. You are already doing this with find(abs(accuracies - max_val) < 1e-10). Any tolerance well above the rounding noise (about 1e-16) and well below the spacing between accuracy levels (1/94, about 0.0106) will group the tied models together.
Analyzing Results with Care: When dealing with floating-point arithmetic, especially in parallel computing environments, always allow for such tiny discrepancies. Design your algorithms and result analysis to be robust against them.
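
The workarounds above can be sketched as follows. This is a minimal example under stated assumptions: the accuracies vector is a small made-up stand-in for the real results, n is the 94 observations from the question, and tol is an assumed tolerance. The second option, converting each accuracy back to an integer count of correct folds, is an extra suggestion rather than something kfoldLoss provides.

```matlab
% Hypothetical stand-in for the accuracy vector from the question
n = 94;                                   % number of observations (folds)
accuracies = [70/n, 76/n - eps(76/n), 76/n, 75/n, 76/n];

% Option 1: find every model tied for the maximum, within a tolerance
tol = 1e-10;                              % assumed; far above 1e-16, far below 1/n
max_val = max(accuracies);
tied = find(abs(accuracies - max_val) < tol);

% Option 2: recover the integer number of correct folds and compare exactly
n_correct = round(accuracies * n);        % integers are immune to rounding noise
tied_int = find(n_correct == max(n_correct));

disp(tied);
disp(tied_int);
```

Both options pick out the same three tied entries here, including the one that sits a single rounding step below 76/94. Working with integer counts has the advantage that later comparisons, sorting, and max() behave exactly.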

In summary, what you're experiencing is a common aspect of floating-point arithmetic in parallel computing environments. Adjusting your approach to comparison and result analysis to account for these nuances will be key in managing the impact on your work.
