Parallelizing computation with memory restrictions

Henry Shackleton on 19 June 2019
I have a program that I would like to run in parallel, since I have about a dozen cores available. However, I only have 128 GB of RAM, which constrains how I can parallelize the code.
A is a cell array of 50 matrices. Each matrix (like every other matrix involved) takes up about 1 GB of memory, so A alone occupies about 50 GB; this is where the memory constraint comes in. Schematically, I want to execute the following code:
for i = 1:1000
    B = longCalculation(i) % This is the step that takes a lot of time
    for j = 1:50
        shorterCalculation(A{j}, B)
    end
end
Since longCalculation takes the longest to run, I would like to parallelize it, i.e., convert the outer for loop into a parfor loop. However, each worker needs access to all of A, and I can't give every worker its own copy: with about 12 workers that would be roughly 12 × 50 GB = 600 GB, far more than the 128 GB available. Parallelizing the inner for loop instead, and giving each worker access to only a small part of A, won't speed up the code much, because longCalculation would still run serially. Any suggestions on how to modify this code so that it can run in parallel? Thanks!
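For reference, the direct parfor conversion described above might look like the minimal sketch below. It assumes hypothetical stand-ins for longCalculation and shorterCalculation (anonymous functions), tiny placeholder matrices, and a reduced loop count, so it runs quickly. Because A is indexed by j rather than by the parfor loop variable i, parfor classifies A as a broadcast variable and sends a full copy to every worker, which is exactly what the memory budget rules out at full scale.

```matlab
% Hypothetical stand-ins for the question's functions (not the real code).
longCalculation    = @(i) -i;             % the expensive step in the real problem
shorterCalculation = @(Ai, B) Ai(1) * B;  % the cheap step

nA    = 50;
nLoop = 20;                   % reduced from 1000 to keep the sketch quick
A     = cell(1, nA);
for j = 1:nA
    A{j} = ones(10) * j;      % tiny placeholders instead of ~1 GB matrices
end

results = zeros(nLoop, nA);
parfor i = 1:nLoop
    B   = longCalculation(i);
    row = zeros(1, nA);
    for j = 1:nA
        % A is indexed by j, not by the loop variable i, so parfor treats
        % it as a broadcast variable: every worker gets a full copy of A.
        row(j) = shorterCalculation(A{j}, B);
    end
    results(i, :) = row;      % sliced output, one row per iteration
end
```

At full size, each worker's copy of A would be about 50 GB, which is why this version does not fit in 128 GB with a dozen workers.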

Answers (1)

Edric Ellis on 20 June 2019
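OK, this somewhat depends on what you need to do with the results, but here is one way to avoid replicating A on each worker, using a combination of spmd and for-drange. The basic idea is:
  1. Partition A so that each worker stores only a piece of it.
  2. Perform longCalculation in batches of numlabs iterations, one iteration per worker, and use labBroadcast to share each worker's B with all the others.
  3. Reduce the result: each worker applies shorterCalculation to its own piece of A using for-drange, and gplus sums the partial results across workers.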
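Step 3 gives the right answer because a sum over all of A can be split into per-worker partial sums that are then added together, which is the job gplus does across workers. The plain-MATLAB sketch below simulates this without Parallel Computing Toolbox. It assumes a hypothetical worker count, a simple contiguous block partition of the indices (not necessarily the exact split that drange chooses), and the same dummy shorterCalculation as the answer's code, then checks that the combined partial sums match a serial total computed with all of A at once.

```matlab
% Simulated workers; numWorkers and the block partition are assumptions.
numWorkers = 4;
nA = 50;
B  = -7;                                   % one example value of B
shorterCalculation = @(Ai, B) Ai(1) * B;   % dummy, as in the answer

% Step 1: split 1..nA into contiguous blocks, one per simulated worker.
edges   = round(linspace(0, nA, numWorkers + 1));
partial = zeros(1, numWorkers);
for w = 1:numWorkers
    owned  = (edges(w) + 1):edges(w + 1);  % indices this worker stores
    localA = cell(1, nA);                  % mostly empty, like the spmd version
    for idx = owned
        localA{idx} = ones(10) * idx;      % build only the owned matrices
    end
    % Step 3, local part: reduce over the owned slice only.
    for aIdx = owned
        partial(w) = partial(w) + shorterCalculation(localA{aIdx}, B);
    end
end

% Step 3, global part: adding the partial sums plays the role of gplus.
combined = sum(partial);

% Serial reference that would need all of A in memory at once.
serial = 0;
for aIdx = 1:nA
    serial = serial + shorterCalculation(ones(10) * aIdx, B);
end
```

Each simulated worker only ever holds its own block of matrices, yet combined equals serial.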
%% Step 1: build A, but ensure each worker only gets a portion.
% Use for-drange to achieve that. This presumes that you can build
% pieces of 'A' directly on the workers.
nA = 50;
nLoop = 1000;
spmd
    A = cell(1, nA);
    for idx = drange(1:nA)
        % Placeholder data; replace with the real construction of A{idx}.
        A{idx} = ones(1000) * idx;
    end
end
% At this point, each worker has an independent 'A' where only some of the
% cells are filled in.
%% Step 2: perform the calculations in parallel.
spmd
    % Allocate the full output cell array.
    output = cell(1, nLoop);
    % Loop over the full range in batches of 'numlabs' iterations.
    for idx = 1:numlabs:nLoop
        % Each worker performs one longCalculation per batch. In the last
        % batch, workers whose index falls past nLoop skip the work.
        myIdx = idx + (labindex - 1);
        if myIdx <= nLoop
            myB = longCalculation(myIdx);
        else
            myB = [];
        end
        % Next, we need to work with each 'myB' and perform
        % shorterCalculation. So, loop over 'numlabs', and use
        % labBroadcast to give each worker the value of B.
        for bIdx = 1:numlabs
            % Make sure we don't exceed the loop range. Every worker
            % computes the same outIdx, so all workers break together
            % and the collective calls below stay matched.
            outIdx = idx + bIdx - 1;
            if outIdx > nLoop
                break;
            end
            % Send the B computed on worker bIdx to every worker.
            B = labBroadcast(bIdx, myB);
            % Reduce over this worker's local slice of A.
            partialResult = 0;
            for aIdx = drange(1:nA)
                partialResult = partialResult + shorterCalculation(A{aIdx}, B);
            end
            % Sum the partial results across workers; gplus(..., 1)
            % returns the total on worker 1 only.
            output{outIdx} = gplus(partialResult, 1);
        end
    end
end
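% 'output' is a Composite; the combined totals are stored on worker 1.
x = output{1};
x = [x{:}]

%% Dummy "longCalculation".
function x = longCalculation(x)
    pause(0.1);   % simulate a slow computation
    x = -x;
end

%% Dummy "shorterCalculation".
function x = shorterCalculation(Ai, B)
    x = Ai(1) * B;
end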
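Step 2 relies on batched index arithmetic: in each pass of the outer loop, worker labindex computes iteration idx + (labindex - 1), so one pass handles numlabs consecutive iterations. The plain-MATLAB check below (no toolbox needed) simulates that mapping for the question's 1000 iterations, assuming 12 workers; numW and w are illustrative stand-ins for numlabs and labindex. It confirms that the range 1:numlabs:nLoop, together with a myIdx <= nLoop guard, computes every iteration exactly once in ceil(nLoop/numlabs) passes, whereas the wider range 1:numlabs:(nLoop+numlabs) always adds one more pass in which every index is out of range.

```matlab
nLoop = 1000;   % iterations, as in the question
numW  = 12;     % stands in for numlabs (assumed worker count)

covered = zeros(1, nLoop);   % how many times each iteration is computed
nPasses = 0;
for idx = 1:numW:nLoop
    nPasses = nPasses + 1;
    for w = 1:numW           % w stands in for labindex
        myIdx = idx + (w - 1);
        if myIdx <= nLoop    % the last pass may be only partly full
            covered(myIdx) = covered(myIdx) + 1;
        end
    end
end

% Passes taken by the wider range 1:numW:(nLoop+numW), for comparison.
nPassesWide = numel(1:numW:(nLoop + numW));

fprintf('passes: %d (wide range: %d)\n', nPasses, nPassesWide);
% prints "passes: 84 (wide range: 85)"
```

With 12 workers the last real pass starts at iteration 997, so only 4 of the 12 workers have work in it.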

Categories

Graphics Performance

Version

R2019a
