kfoldLoss() values have inconsistent precision between different iterations of a loop

Leon on 22 Apr 2024
I am training an RBF SVM with leave-one-out cross-validation using 94 observations and I am surprised to find that the precision of the result of kfoldLoss() isn't consistent when comparing models that have the same loss (or accuracy). For example, an accuracy of 76/94 does not always produce exactly the same value, with a variation of around 1e-15. The error is completely negligible except when comparing values for equality, searching for the maximum, etc. The only thing that should be different is which 76 of the 94 folds are correct, but this should have no effect on the value or precision of the result.
I'm using a parfor loop to test many combinations of features (e.g. 260K combinations) and measuring the accuracy using accuracy = 1 - kfoldLoss(Mdl). I then use max() to find the position of the result with the highest accuracy; however, sometimes this does not work because there can be tiny variations in the precision. How is this even possible?
With 94 observations, there are only 94 possible accuracy levels. In my latest test, the peak accuracy is 76 out of 94, which is 0.808510638297872...etc.
Eight of the models tested have this 76 / 94 accuracy but it isn't stored with the same precision in the same double-precision vector. Precision errors are inevitable, but I would have expected MATLAB to always return the same result for 76 / 94.
I'm using a parfor loop. Could this have something to do with it? Is it possible for one worker to somehow produce a different precision from the others? It's an Intel i7-7700 running MATLAB R2024a on Windows 10.
% Abbreviated code. "combinations" is a cell array with each cell
% containing a vector of the features to select from the training data
accuracies = zeros(1, numel(combinations)); % preallocate the sliced output
parfor i = 1:numel(combinations)
    td_sel = training_data(:, combinations{i}); % keep only this feature subset
    Mdl = fitcsvm(td_sel, response_name, 'KernelFunction', 'RBF', 'KFold', 94, 'CacheSize', 'maximal');
    accuracies(i) = 1 - kfoldLoss(Mdl); % leave-one-out accuracy
end
[max_val, max_pos] = max(accuracies)
max_val =
0.808510638297872
max_pos =
52793
% Find all values that are very close to this value. But I don't understand
% how the precision (in the storage of the value) can be different
a = find(abs(accuracies - max_val) < 1e-10)
accuracies(a)
a =
6829
6891
6989
13699
21936
22778
45270
52793
ans =
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
0.808510638297872
accuracies(a) - max_val
ans =
1.0e-15 *
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
0
accuracies(a) - 76/94
ans =
1.0e-15 *
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
-0.111022302462516
0
Thanks.

Answers (1)

Shubham on 17 May 2024
Hi Leon,
The phenomenon you're observing, where seemingly identical operations result in tiny precision differences, can indeed stem from the parallel execution of your code in a parfor loop, among other factors. Here are some points to consider that might explain the behavior:
1. Floating-Point Arithmetic and Parallelism
Floating-point arithmetic in computers does not always behave in the way we might intuitively expect, especially under parallel computation scenarios. This is due to several factors:
  • Non-Associativity of Floating-Point Operations: The result of floating-point arithmetic (like addition or multiplication) can depend on the order in which operations are performed. In a parallel computing environment, operations might be executed in different orders across threads due to variations in execution speed, leading to slight discrepancies in results.
  • Differences in Intermediate Precision: Different threads or processes might use different strategies for maintaining intermediate values, especially if they're utilizing different hardware resources (like different cores or different vectorization capabilities). This can lead to tiny differences in the final results.
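As a minimal sketch of the non-associativity point above, the snippet below adds the same three constants with two different groupings. It only illustrates the general mechanism; it is not a confirmed description of how kfoldLoss() accumulates its result internally.

```matlab
% Same three operands, two different groupings.
a = (0.1 + 0.2) + 0.3;
b = 0.1 + (0.2 + 0.3);

isSame = (a == b);   % false in IEEE 754 double precision
gap = a - b;         % differs only in the last bit of the result

% Print 17 significant digits so the last-bit difference is visible.
fprintf('a = %.17g\nb = %.17g\ngap = %.3g\n', a, b, gap);
```

If a reduction over the 94 folds is grouped differently depending on which folds contribute, the same kind of last-bit difference could appear even though the count of correct folds is identical.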
2. MATLAB's Parallel Computing Toolbox
When using MATLAB's Parallel Computing Toolbox with a parfor loop, each iteration is executed independently across the available workers in your computing pool. Although each worker is supposed to perform the same operations, the non-deterministic nature of parallel execution can lead to the discrepancies you've observed, especially with floating-point computations.
3. Implications for Your Work
The differences you're seeing are on the order of 1e-15, which is significantly smaller than the precision most applications would require. However, when comparing floating-point numbers or searching for maximum values, these tiny differences can indeed become relevant.
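To put the size of the discrepancy in context, the sketch below uses eps() to get the spacing between adjacent doubles near 76/94. The variable names are illustrative only. The spacing is 2^-53, about 1.11e-16, which appears to match the 1.0e-15 * -0.111022302462516 differences shown in the question, so the values there look like neighbouring doubles that differ by one unit in the last place.

```matlab
ref = 76/94;              % the value MATLAB computes for 76/94
ulp = eps(ref);           % spacing between adjacent doubles near ref
neighbor = ref - ulp;     % the next representable double below ref

fprintf('ref      = %.17g\nneighbor = %.17g\nulp      = %.17g\n', ref, neighbor, ulp);

% max() treats the two values as different, so only the larger one wins,
% even though both represent "76 of 94 correct".
[~, pos] = max([neighbor, neighbor, ref]);
```

Here pos is 3: the two entries that are one step lower are not reported as the maximum, which is the behaviour described in the question.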
Possible Workarounds
  • Rounding: For comparison purposes, you might consider rounding your accuracy values to a certain number of significant digits that makes sense for your application. This can mitigate the effect of tiny discrepancies.
accuracies_rounded = round(accuracies, 15); % Adjust the number of digits as appropriate
  • Using a Tolerance for Comparisons: Instead of looking for exact matches, use a tolerance when comparing floating-point numbers. It seems you're already doing something similar with find(abs(accuracies - max_val) < 1e-10). Adjusting the tolerance level appropriately can help manage the precision issues.
  • Analyzing Results with Care: When dealing with floating-point arithmetic, especially in parallel computing environments, always consider the possibility of such tiny discrepancies. Design your algorithms and result analysis to be robust against these minor differences.
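Because leave-one-out with 94 observations can only yield accuracies of the form k/94, one further option is to convert each accuracy back to an integer count of correct folds and compare the integers exactly. The sketch below assumes unweighted observations and uses a small synthetic accuracies vector (not the data from the question) to show both this and a tolerance-based search for all tied maxima.

```matlab
n = 94;                          % number of leave-one-out folds
ref = 76/n;

% Synthetic example: three entries mean "76 of 94", two of them one step lower.
accuracies = [70/n, ref - eps(ref), 75/n, ref, ref - eps(ref)];

% Option 1: recover exact integer counts, then compare integers.
correct = round(accuracies * n);
best = max(correct);
bestIdx = find(correct == best);          % all models tied for the best count

% Option 2: tolerance-based comparison. Half the gap between two
% neighbouring accuracy levels (1/n) cannot merge distinct levels.
tol = 1/(2*n);
bestIdxTol = find(abs(accuracies - max(accuracies)) < tol);

disp(bestIdx);
disp(bestIdxTol);
```

Both approaches return indices 2, 4 and 5 for this synthetic input. The integer version has the advantage that later equality tests and sorting need no tolerance at all.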
In summary, what you're experiencing is a common aspect of floating-point arithmetic in parallel computing environments. Adjusting your approach to comparison and result analysis to account for these nuances will be key in managing the impact on your work.

Release

R2024a
