parfor and for loops - different results

Hello all, I have a strange issue that I can't quite figure out. I tried to speed up my code by introducing parallel processing. The code does run faster, but its final output changed.
The purpose of my code is to evaluate the auROC curve based on two inputs. The part of my code where I introduced parallel processing was a step where I shuffled my data randomly 1000 times, and calculated 1000 auROC curves. I tried to run this through a parfor loop:
p = randperm(size(binned_raw,1),1000);
parfor ii = 1:1000
binned_raw_shift = circshift(binned_raw,p(ii),1);
[AUROC, TPR, FPR] = get_ROC(binned_raw_shift, binned_behavior);
shuffled_raw(ii,:) = AUROC;
end
As you can see in my code, I first generate 1000 random numbers that are less than or equal to the length of my input. On each iteration of the parfor loop I perform a circular shift on my input (binned_raw) and run it through a function I wrote. I collect all 1000 outputs in my matrix shuffled_raw, and I use that as a threshold to determine if any of my original auROC curves were significantly greater than my randomly shuffled data.
The issue I am having is that introducing the parfor loop at this step changes the results compared to when I only use a for loop. I do not see why it should change anything, because any indexing and saving of outputs should be controlled by the value of ii. I should note that with the parfor loop the results were much more liberal than with a normal for loop: a much higher percentage of my original data was flagged as significantly greater than the shuffled data. If anyone can help me figure out why this problem is occurring, I would be very grateful. Thanks a bunch.
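The circular-shift shuffle described above can be sketched on toy data as follows. This is only an illustration: the data, the placeholder statistic, and the 95th-percentile threshold are assumptions made for the example, not the poster's actual get_ROC function or decision rule.

```matlab
% Toy data standing in for binned_raw (values are made up for illustration).
x = (1:10)';                 % 10 time bins, one column
n = size(x, 1);
k = 4;                       % number of shuffles (the post uses 1000)

rng(1);                      % arbitrary fixed seed so the shifts are repeatable
p = randperm(n, k);          % k distinct shift amounts drawn from 1..n

null_stat = zeros(k, 1);     % preallocate the null distribution
for ii = 1:k
    x_shift = circshift(x, p(ii), 1);   % rotate rows; the values themselves are preserved
    null_stat(ii) = x_shift(1);         % placeholder statistic, NOT an auROC
end

% Assumed decision rule: use the 95th percentile of the null as the threshold.
sorted_null = sort(null_stat);
threshold = sorted_null(ceil(0.95 * k));
```

Note that randperm(n, k) can return n itself, and a circular shift by n leaves the data unchanged, so one of the "shuffles" may be identical to the original data.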

8 Comments

Matt J on 26 Aug 2020
Did you seed the random number generator the same way for both the parfor-loop test and the for-loop test?
Connor Johnson on 26 Aug 2020
Can you elaborate what you mean by this? The answer is likely no, I didn't.
Rik on 26 Aug 2020
Did you use the rng function before calling randperm? Do any of the functions called call a random function?
Connor Johnson on 26 Aug 2020
No, I didn't and no they don't.
Rik on 26 Aug 2020
If you don't use rng to fix the random seed, the chance of randperm returning the same shuffle is very small for large sizes of binned_raw, so the input isn't the same when you switch the code from for to parfor.
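A minimal sketch of what fixing the seed with rng does to randperm. The sizes and the seed value 42 are arbitrary assumptions chosen for the example.

```matlab
n = 5000;                  % assumed number of rows in the data
k = 1000;                  % number of shuffles, as in the question

rng(42);                   % arbitrary fixed seed
p_first = randperm(n, k);

rng(42);                   % reset to the same seed before drawing again
p_second = randperm(n, k);

p_unseeded = randperm(n, k);   % drawn without resetting: continues the random stream

same_when_seeded = isequal(p_first, p_second);     % true: same seed, same draw
same_when_not    = isequal(p_first, p_unseeded);   % almost certainly false
```

Calling rng with the same seed immediately before randperm in both the for and the parfor script would give both versions the same shift vector p.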
Connor Johnson on 26 Aug 2020
But I created a list of random numbers before I started the loop (variable p), and on each iteration I circularly shift my data by p(ii). Since p is generated outside the loop, how would the random seed be any different in this case than when I used a normal for loop?
Dana on 26 Aug 2020
The point is, if you run this script twice even just using the for loop both times (no parfor), you'll likely get different answers. So it has nothing to do with the parfor, it's that the initial random draw will be different every time you run the script (unless you use rng to seed the random number generator).
Connor Johnson on 26 Aug 2020
I disagree that the issue stems from this, as shuffling the data 1000 times and recalculating should not change the overall trend of the random data. I can run this code over and over again with the normal for loop and get the same results.
I should be able to randomly generate 1000 numbers to shift my data (within range of the actual length of my data) and get similar results no matter what the numbers are, as long as they are not repeating.
However, I could still be wrong and I will run some tests. I feel strongly like this is not the case because I can run the for loop version of my code multiple times and get the same results, and run the parfor multiple times and get the same results. However the for and parfor results are very different.


Answers (1)

Matt J on 26 Aug 2020
Because randperm returns a random result, the sequence p(ii) will be different in two consecutive runs, e.g.,
>> p = randperm(8,4)
p =
6 4 7 3
>> p = randperm(8,4)
p =
8 7 5 4
That would explain why the parfor and the for-loop versions don't give the same output.

2 Comments

The code for the comparison should look like this:
p = randperm(size(binned_raw,1),1000); %generate only once
for ii = 1:1000
binned_raw_shift = circshift(binned_raw,p(ii),1);
[AUROC, TPR, FPR] = get_ROC(binned_raw_shift, binned_behavior);
shuffled_rawFOR(ii,:) = AUROC;
end
parfor ii = 1:1000
binned_raw_shift = circshift(binned_raw,p(ii),1);
[AUROC, TPR, FPR] = get_ROC(binned_raw_shift, binned_behavior);
shuffled_rawPARFOR(ii,:) = AUROC;
end
Difference = max(abs(shuffled_rawFOR - shuffled_rawPARFOR),[],'all')
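For anyone who wants to try the comparison without the original data, here is a self-contained sketch of the same idea. The variables data and labels and the function handle stat are hypothetical stand-ins for binned_raw, binned_behavior and get_ROC. The max(..., [], 'all') form needs R2018b or later, and without Parallel Computing Toolbox the parfor loop simply runs serially.

```matlab
rng(0);                                   % arbitrary fixed seed
data = rand(200, 3);                      % hypothetical stand-in for binned_raw
labels = rand(200, 1) > 0.5;              % hypothetical stand-in for binned_behavior
nShuffles = 50;                           % the thread uses 1000
p = randperm(size(data, 1), nShuffles);   % generate the shifts only once

% Hypothetical deterministic statistic used in place of get_ROC:
% per-column mean of the shifted data over the rows where labels is true.
stat = @(x, y) mean(x(y, :), 1);

resFor = zeros(nShuffles, size(data, 2));      % preallocate both outputs
resParfor = zeros(nShuffles, size(data, 2));

for ii = 1:nShuffles
    resFor(ii, :) = stat(circshift(data, p(ii), 1), labels);
end

parfor ii = 1:nShuffles
    resParfor(ii, :) = stat(circshift(data, p(ii), 1), labels);
end

% Largest absolute discrepancy between the two versions.
difference = max(abs(resFor - resParfor), [], 'all');
```

With a deterministic function and a shared p, difference should be zero or at the level of floating-point rounding. If it is not, the function being called is the place to look, for example for hidden state such as persistent or global variables.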
Connor Johnson on 26 Aug 2020
I will run this and share the results; let's see.
