How to share a HashMap in parallel computing

Bananach on 27 Apr 2016

Setup

I am trying to parallelize an algorithm that runs the same code on each row of a matrix (and then postprocesses the results).

Some computations recur in the processing of multiple rows (and this recurrence is hard to predict).

Therefore, I currently call an object that performs these computations and caches the results in a HashMap (a containers.Map), so that when processing row $n$ requires computations that were already done for row $m$, they are not repeated.

The order in which the rows are processed does not affect the outcome of the algorithm.
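For reference, a minimal serial sketch of this caching pattern (the key choice and the sin call below are placeholder stand-ins mirroring the minimal example further down, not the real computation):

cache = containers.Map('KeyType','double','ValueType','any');
A = ones(3, 2);                     % placeholder input matrix
results = zeros(size(A, 1), 1);
for n = 1:size(A, 1)
    key = A(n, 1);                  % whatever identifies the shared computation
    if ~isKey(cache, key)
        cache(key) = sin(key);      % stand-in for the expensive computation
    end
    results(n) = cache(key);        % reuse the cached value
end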

Problem

I am not able to share the HashMap in parallel code; each worker ends up with its own copy.

PS

I understand the philosophy behind this behavior. Yet in my example, order does not matter and I would like to circumvent the standard behavior.

Minimal working example:

classdef MyPar < handle
    properties
        map;   % cache of already-computed results, keyed by input value
    end
    methods
        function obj = MyPar()
            obj.map = containers.Map('KeyType','double','ValueType','any');
        end
        function y = compute(obj, n)
            % Return sin(n), computing it only if it is not already cached.
            if ~obj.map.isKey(n)
                obj.map(n) = sin(n);
                fprintf('Did not find key ''%d''\n', n)
            else
                fprintf('Found key ''%d''\n', n)
            end
            y = obj.map(n);
        end
    end
    methods(Static)
        function R = test()
            c = MyPar();
            Nworkers = 3;
            A = ones(Nworkers, 2);
            spmd(Nworkers)
                % Each worker processes one "row"; every call uses key 1.
                R = c.compute(A(labindex,1)) + c.compute(A(labindex,2));
            end
        end
    end
end

Running MyPar.test() gives

>> MyPar.test();
Lab 1: 
  Did not find key '1'
  Found key '1'
Lab 2: 
  Did not find key '1'
  Found key '1'
Lab 3: 
  Did not find key '1'
  Found key '1'

In this trivial example, I would like code in which two of the workers do not need to do their own computation at all (because the only computation ever performed is compute(1)).

Answers (1)

Edric Ellis on 3 May 2016
There is no way to have the map data structures automatically propagate changes, but you could use the communication functions within spmd to explicitly synchronize the known keys and values.
Whether this is actually a practical option depends a lot on the structure of your computations: you need a spot in the spmd block where all the workers agree that it's time to synchronize. If you can do that, then you could use global operations such as gcat to get the job done, perhaps a bit like this:
spmd
    map = containers.Map();
    for iteration = 1:1000
        % Choose key, look up or compute value:
        key = num2str(randi(100));
        if ~isKey(map, key)
            value = sprintf('Value: %s computed on lab: %d', key, labindex); % dummy computation
            map(key) = value;
        else
            value = map(key);
        end

        % Synchronize 'map'.
        % Step 1: get all the keys:
        allKeys = unique(gcat(keys(map)));

        % Step 2: get values on each worker
        allValues = cell(1, numel(allKeys));
        gotValue = false(1, numel(allKeys));
        for idx = 1:numel(allKeys)
            if isKey(map, allKeys{idx})
                gotValue(idx) = true;
                allValues{idx} = map(allKeys{idx});
            end
        end

        % Step 3: combine all known values
        globalValues = gcat(allValues, 1);
        gotGlobalValue = gcat(gotValue, 1);

        % Step 4: put values into map
        for idx = 1:numel(allKeys)
            row = find(gotGlobalValue(:, idx), 1, 'first');
            value = globalValues{row, idx};
            map(allKeys{idx}) = value;
        end
    end
end
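As a quick sanity check (a sketch, not part of the answer above), one could add the following right after Step 4, still inside the spmd block, to confirm that every worker ends up holding the same number of keys:

% Sanity check (sketch): gather the per-worker key counts with gcat and
% confirm they all match after synchronization.
keyCounts = gcat(numel(keys(map)));
assert(all(keyCounts == keyCounts(1)), ...
    'Workers disagree on the number of cached keys after synchronization.');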
