improve and speed up parfor loop

Hello,
I have code with a loop of 10,000 iterations. It runs a Monte Carlo simulation that draws from normal distributions, and the number of simulation runs is 4,000,000. I tried to use parfor to speed up the code, but its run time is almost the same as with an ordinary for loop.
Is there a way to speed up the code so that it benefits from parfor?
Thanks,
Here is my code:
clc;
clear;
close all;
...
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
...
A=readmatrix("x.csv");
runs = 4000000;
results=zeros(10000,1);
meanG=constant;
sdG=constant;
parfor j=1:x
    mean=A(j,1); %
    sd=A(j,2);
    guss=A(j,3); %
    for n=1:0.5:40
        B=normrnd(mean,sd,[1,runs]);
        F=equation
        G=normrnd(F*meanG,F*sdG,[1,runs]);
        %Other calculation to calculate C
        if C>10
            d=equation;
            break
        end
    end
    record(j)=d;
end

1 comment

darova on 17 Apr 2020
Maybe if you show a bit more and explain what this code does, someone can help you.


Answers (2)

Matt J on 17 Apr 2020
Edited: Matt J on 17 Apr 2020

0 votes

We can't see all the operations in your loop, but the ones we can see are pretty basic. Operations as common and basic as those are probably already coded to use a multicore CPU very efficiently, so there probably isn't much room for improvement with parfor. To get a clearer idea of how much improvement is possible, though, we would need to see screenshots of your CPU usage and the usage of all its cores (e.g., from the Task Manager, if you are on a Windows OS).
Some of the randomization steps you are doing, though, look like they could be hoisted out of the loop, e.g.,
B=normrnd(mean,sd,[79,runs]); % 1:0.5:40 has 79 steps
for n=1:0.5:40
F=equation
...
end
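A minimal sketch of the hoisting idea, under stated assumptions: mu, sigma and runsDemo are hypothetical stand-ins (the real values come from A and runs in the question), and the samples are stored one column per step rather than as row vectors. At the full 4,000,000 runs the hoisted array would hold 79 x 4,000,000 doubles, roughly 2.5 GB per worker, and any draws for steps after an early break are wasted, so it is worth benchmarking before adopting.

```matlab
% Sketch: draw every normal sample once, then index into the array inside the loop.
% mu, sigma and runsDemo are hypothetical values, not taken from the question.
mu = 5;
sigma = 2;
runsDemo = 1e4;                 % reduced from 4,000,000 to keep the demo small

steps = 1:0.5:40;
nSteps = numel(steps);          % number of iterations of the inner loop

% mu + sigma.*randn(...) draws from the same distribution as normrnd(mu, sigma, ...).
% One column per step, because MATLAB stores arrays column by column.
Ball = mu + sigma .* randn(runsDemo, nSteps);

for k = 1:nSteps
    n = steps(k);
    B = Ball(:, k);             % samples for this step, no new random-number call
    % ... the per-step calculations on B would go here ...
end
```

Using randn also removes the dependency on the Statistics and Machine Learning Toolbox that normrnd brings.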

9 comments

Salam Al-Rubaye on 18 Apr 2020
Thanks, that helps a lot. I am using a high-performance computing cluster. I am requesting 20 CPUs and I can assign any amount of memory to them. I thought parfor would then speed things up by a factor of 20, but it did not.
Matt J on 18 Apr 2020
Edited: Matt J on 18 Apr 2020
We need to see what percentage of CPU usage occurs when the ordinary for-loop is running, and what percentage is used on each of the 20 cluster CPUs when parfor is being used.
Salam Al-Rubaye on 18 Apr 2020
According to the cluster, it was 99% for both the parfor loop and the for loop. I am not sure what the problem is.
Matt J on 18 Apr 2020
Do you share the cluster? Does the 99% usage represent your jobs, or other people's as well?
Salam Al-Rubaye on 18 Apr 2020
Yes, it only represents the 20 CPUs that I requested.
Matt J on 18 Apr 2020
Edited: Matt J on 18 Apr 2020
But if other users are using the same CPUs, then you might be using only 10% of the 99%.
I do not think so. I am submitting the job as a batch job and I request the resources that I need. The tasks I request should not be used by anyone else.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=24:00:00
#SBATCH --mem-per-cpu=10GB
#SBATCH --job-name=invertRandArray
#SBATCH --error=parallel.%J.err
#SBATCH --output=parallel.%J.out
Matt J on 21 Apr 2020
Edited: Matt J on 21 Apr 2020
I don't know bash very well, but the nodes=1 suggests to me that you are not running on multiple CPUs. Or, if you are, your for-loop has access to them as well, just as if you were running on a single 20-core CPU. If this is the case, then once again your for loop and your parfor loop have access to the exact same computing hardware, and there is no guarantee that you will get significant speed-up.
It might tell us more if you show us the output of,
>> gcp
Matt J on 21 Apr 2020
"It might tell us more if you show us the output of,"
Never mind this part. Raymond has pointed out that your workers are obviously non-remote.


Raymond Norris on 21 Apr 2020

0 votes

It's possible that your code is already making use of multiple cores (e.g., for linear algebra); therefore, running local workers may just offset this. Try running MATLAB in single-thread mode (-singleCompThread) and then benchmark your code again.
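A rough way to see the effect of implicit multithreading from inside one session, as a sketch only: maxNumCompThreads approximates what the -singleCompThread startup flag does, and the matrix product below is a hypothetical stand-in for the real loop body.

```matlab
% Sketch: time an implicitly multithreaded operation with one thread
% and again with the default thread count.
M = randn(1000);                    % hypothetical workload, not from the question
work = @() M * M;                   % matrix multiply uses multiple threads when allowed

nDefault = maxNumCompThreads(1);    % limit to 1 thread; returns the previous setting
tSingle = timeit(work);

maxNumCompThreads(nDefault);        % restore the previous thread count
tMulti = timeit(work);

fprintf('1 thread: %.4f s, %d threads: %.4f s\n', tSingle, nDefault, tMulti);
```

If the two timings are close, the serial for loop is not gaining much from built-in threading; if the multithreaded time is much lower, the for loop already uses the cores that the parfor workers would otherwise share.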
You might consider posting a bit more of your code so that we can provide more guidance on your parfor.
  1. As it's written, A is not a sliced input, it's a broadcast variable, which could impact performance.
  2. Is record(j) supposed to be results(j)?
  3. For a particular iteration of j, what happens if C is never greater than 10 (and d does not get defined)?
  4. Again, without all of the code, it's hard to make the following recommendation, but I would consider refactoring your code as such:
parfor j = 1:x
    results(j) = unit_of_work(A,runs,j,meanG,sdG);
end

% meanG and sdG are passed in because a function cannot see the caller's variables
function d = unit_of_work(A,runs,j,meanG,sdG)
    mean=A(j,1); %
    sd=A(j,2);
    guss=A(j,3); %
    for n=1:0.5:40
        B=normrnd(mean,sd,[1,runs]);
        F=equation
        G=normrnd(F*meanG,F*sdG,[1,runs]);
        %Other calculation to calculate C
        if C>10
            d=equation;
            break
        end
    end
end
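Points 1 and 3 can be sketched together, under stated assumptions: the matrix A, runsDemo, and the formulas for C and d below are hypothetical placeholders for the parts elided in the question. Extracting each column before the loop and indexing it only as mu(j) lets parfor treat it as a sliced input, and initializing d to NaN gives a defined result when the threshold is never crossed. The same idea applies if the loop body is moved into a function such as unit_of_work.

```matlab
% Sketch: sliced inputs plus a default value for d.
% A, runsDemo and the formulas for C and d are hypothetical placeholders.
A = [1.5 0.5 0; 3 0.1 0; -1 0.5 0];
runsDemo = 1e4;

mu    = A(:, 1);                % indexed only as mu(j) below, so parfor can slice it
sigma = A(:, 2);
nRows = size(A, 1);
results = nan(nRows, 1);

parfor j = 1:nRows
    d = NaN;                    % kept if C never exceeds the threshold
    for n = 1:0.5:40
        B = mu(j) + sigma(j) .* randn(1, runsDemo);
        C = n * mean(B);        % placeholder for the elided calculation
        if C > 10
            d = n;              % placeholder for the elided equation
            break               % leaves the inner for loop only
        end
    end
    results(j) = d;
end
```

Note that the sketch calls the built-in mean function, which is only possible because no variable is named mean; the original code shadows it. For a matrix with only three columns the broadcast cost is probably small, so this change is more about correctness and clarity than about the missing speed-up.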

4 comments

Matt J on 21 Apr 2020
Edited: Matt J on 21 Apr 2020
"It's possible that your code is already making use of multiple cores (e.g., for linear algebra); therefore, running local workers may just offset this."
No, the OP has said that he is running on a cluster.
Raymond Norris on 21 Apr 2020
I thought he's running MATLAB on the cluster. You can still run local workers on a remote cluster. Local workers are "local" to where you're running the MATLAB client, not necessarily your desktop.
Matt J on 21 Apr 2020
I see, but I think the OP's intention is to have non-local workers.
Doesn't appear that way. Notice the reference to local here:
pool = parpool('local', str2num(getenv('SLURM_TASKS_PER_NODE')));
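As a side note on that line, here is a hedged sketch of reading the worker count more defensively. It assumes the Slurm convention that SLURM_TASKS_PER_NODE can take forms such as 20 or 20(x2); str2num evaluates its argument as code and would not handle the second form. The fallback of one worker is an arbitrary choice for illustration.

```matlab
% Sketch: read the leading integer of SLURM_TASKS_PER_NODE without evaluating it.
raw = getenv('SLURM_TASKS_PER_NODE');   % empty when not running under Slurm
nWorkers = sscanf(raw, '%d', 1);        % takes 20 from '20' or from '20(x2)'
if isempty(nWorkers)
    nWorkers = 1;                       % hypothetical fallback outside a Slurm job
end
fprintf('Requesting %d local workers\n', nWorkers);
% pool = parpool('local', nWorkers);    % uncomment on the cluster
```

The pool created this way still runs on the single node where the MATLAB client is running, which matches the nodes=1 request in the batch script.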
