fitcsvm with identical inputs gives different results on different machines
I ran into this strange problem, and it completely messed up my experiment data (i.e., I can't reproduce the same results on two computers). After some testing, I found that the cause is the fitcsvm function.
I put together a simple set of reproduction steps so you can try it yourself if you're curious. It requires the Statistics and Machine Learning Toolbox.
- Download the data from this MAT file: https://drive.google.com/file/d/0B-nVQqvDdrrIaVZUSElKUlVzTU0/view?usp=sharing
- Run the code below, a simplified example derived from my research:
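clear all
rng(90);                          % fix the random seed (no effect on this bug, see note below)
load('bugtestdata.mat')           % loads the variables inputs and outputs
% Train an RBF-kernel SVM that expects 20% of the training data to be outliers
SVMModel = fitcsvm(inputs,outputs,'KernelFunction','rbf',...
    'OutlierFraction',0.2,...
    'BoxConstraint',10,'ClassNames',[0,1]);
disp(rand(1))                     % sanity check of the random stream
disp(SVMModel.NumIterations)      % solver iterations until convergence
disp(SVMModel.Bias)               % bias term of the trained model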
Note: as far as I can tell, fitcsvm (at least with my inputs) does not depend on the random seed. Just in case, I added rng(90) beforehand, but it has no effect on this bug (tested).
With this simple code, I get two different results across five computers (all running 64-bit MATLAB):
Result no. 1 (rand(1), NumIterations, Bias):
0.1531
258
0.2385
Can be reproduced on:
- My laptop: OS: Microsoft Windows 7 Ultimate; MATLAB: R2016a
- My uni's supercomputer: OS: Linux 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 18:30:56 UTC 2016 x86_64; MATLAB: R2016a
- My uni's lab computer: OS: Windows 10; MATLAB: R2016a
Result no. 2 (rand(1), NumIterations, Bias):
0.1531
349
0.1921
Can be reproduced on:
- A virtual desktop provided by my university: OS: Microsoft Windows 8.1 Enterprise; MATLAB: R2016a
- My desktop computer: OS: Windows 7; MATLAB: R2016a / R2016b
As you can see, the split looks arbitrary: two of my personal computers run the same OS, yet they give different answers.
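Because machines with the same OS and MATLAB release disagree, any difference presumably sits below the OS level. The following is a minimal diagnostic sketch, assuming (this is not established in the thread) that the math library version, CPU architecture, or number of computational threads could change floating-point rounding. Run it on each machine and compare the printed lines.

```matlab
% Print environment details that can differ between machines even when the
% OS and MATLAB release match. Assumption: these are plausible suspects for
% machine-dependent rounding, not a confirmed cause.
fprintf('MATLAB release : %s\n', version);
fprintf('Architecture   : %s\n', computer('arch'));
fprintf('BLAS           : %s\n', version('-blas'));
fprintf('LAPACK         : %s\n', version('-lapack'));
fprintf('Compute threads: %d\n', maxNumCompThreads);
```

If the machines differ mainly in thread count, one follow-up experiment is to call maxNumCompThreads(1) before fitcsvm on each machine and check whether the results then agree; whether that helps in this case is untested.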
All the MATLAB installations use academic licenses.
Any help would be much appreciated.
3 comments
Walter Roberson
on 19 Sep 2016
I have tried a couple of different configurations here, on both native and virtual machines; so far I have only seen Result #1. I am loading a Windows 8 virtual machine now to test on.
Accepted Answer
Ilya
on 20 Sep 2016
My guess is that the gradients for two or more observations become equal within floating-point accuracy during optimization. The solver then picks one observation to update in one setup and a different observation in another setup. From that point on, the optimization paths diverge.
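To illustrate the mechanism described above (a toy sketch, not what fitcsvm does internally): floating-point addition is not associative, so two quantities that are mathematically equal can round differently depending on the order in which their terms are accumulated. The two hypothetical gradients below both equal 0.6 exactly. A "machine" that sums left to right and one that sums right to left disagree on which is larger, so a selection rule that updates the observation with the largest gradient (here, max) takes a different path on each.

```matlab
% Hypothetical per-observation gradient terms; both exact sums equal 0.6.
terms1 = [0.1 0.2 0.3];
terms2 = [0.3 0.2 0.1];

% "Machine A" accumulates left to right.
gA = [(terms1(1) + terms1(2)) + terms1(3), (terms2(1) + terms2(2)) + terms2(3)];
% "Machine B" accumulates right to left.
gB = [(terms1(3) + terms1(2)) + terms1(1), (terms2(3) + terms2(2)) + terms2(1)];

% Pick the observation with the largest gradient, as a greedy solver might.
[~, pickA] = max(gA);
[~, pickB] = max(gB);
fprintf('Machine A gradients: %.17g  %.17g -> picks observation %d\n', gA, pickA);
fprintf('Machine B gradients: %.17g  %.17g -> picks observation %d\n', gB, pickB);
```

The rounding differences are on the order of 1e-16, yet they flip the choice, after which the two runs are no longer comparable step by step.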
Such problems tend to arise with discrete predictors: your predictors 3 to 6 take only 4 distinct values each. If you add a small amount of white noise to your predictors, I suspect all configurations will return identical (or nearly identical) results.
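A minimal sketch of the suggested jitter, using a hypothetical stand-in matrix X instead of the question's data (columns 3 to 6 take 4 distinct values each, mirroring the description above). The relative noise scale of 1e-6 is an arbitrary illustrative choice, and bsxfun is used instead of implicit expansion so the code also runs on R2016a.

```matlab
% Hypothetical stand-in for the question's inputs: 2 continuous predictors
% followed by 4 discrete predictors with values 0..3.
rng(1);
n = 200;
X = [randn(n, 2), randi(4, n, 4) - 1];

% Count distinct values per predictor column.
nDistinctBefore = arrayfun(@(j) numel(unique(X(:, j))), 1:size(X, 2));

% Add white noise scaled to each column's standard deviation.
% Fixing the seed means every machine adds exactly the same perturbation.
rng(0);
jitter = bsxfun(@times, 1e-6 * std(X), randn(size(X)));
Xnoisy = X + jitter;
nDistinctAfter = arrayfun(@(j) numel(unique(Xnoisy(:, j))), 1:size(X, 2));

disp(nDistinctBefore)
disp(nDistinctAfter)

% With the real data, refit on the jittered predictors, e.g.:
% SVMModel = fitcsvm(Xnoisy, outputs, 'KernelFunction', 'rbf', ...
%     'OutlierFraction', 0.2, 'BoxConstraint', 10, 'ClassNames', [0,1]);
```

Seeding the noise matters here: an unseeded perturbation would introduce its own run-to-run differences and hide whether the machines now agree.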
Standardizing the data would likely improve learning as well, since the standard deviations of predictors 3 and 6 differ by two orders of magnitude.
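A short sketch of what standardizing does, again on hypothetical data whose two columns differ in spread by a factor of about 100. With an RBF kernel and the default KernelScale of 1, squared distances would be dominated by the wide column; z-scoring puts both columns on the same footing. fitcsvm can do this itself through its 'Standardize' name-value pair; the manual version below is only meant to show the effect and may not match fitcsvm's internal (weighted) scaling exactly.

```matlab
% Hypothetical predictors whose standard deviations differ by ~2 orders of magnitude.
rng(2);
n = 200;
X = [randn(n, 1), 100 * randn(n, 1)];
disp(std(X))

% Manual z-scoring: subtract each column's mean, divide by its standard deviation.
mu = mean(X);
sigma = std(X);
Z = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
disp(std(Z))   % each column now has unit standard deviation

% With the real data, fitcsvm can standardize internally, e.g.:
% SVMModel = fitcsvm(inputs, outputs, 'KernelFunction', 'rbf', ...
%     'Standardize', true, 'OutlierFraction', 0.2, ...
%     'BoxConstraint', 10, 'ClassNames', [0,1]);
```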