distributed

Create and access elements of distributed arrays from client

Description

A distributed array on the client represents an array that is partitioned among the workers in a parallel pool. You operate on the entire array as a single entity; however, workers operate only on their part of the array and automatically transfer data between themselves when necessary. A distributed array resembles a normal MATLAB® array in the way you index and manipulate its elements, but none of its elements exist on the client. Codistributed arrays that you create inside spmd statements are accessible as distributed arrays from the client.
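As a rough illustration of the behavior described above, the following sketch operates on a distributed array as a single entity and then reads a codistributed array from the client. It assumes that a parallel pool is open or can start automatically; the variable names `colMeans`, `cdArr`, and `localMeans` are chosen only for this illustration.

```matlab
% Sketch only: assumes a parallel pool is available or can start automatically.
D = rand(1000,"distributed");      % created directly on the workers

% Operate on D as one array; each worker computes on its own part.
colMeans = mean(D,1);              % the result is also a distributed array
isdistributed(colMeans)

% A codistributed array created inside spmd is seen as distributed on the client.
spmd
    cdArr = ones(4,8,"codistributed");
end
isdistributed(cdArr)

% Bring the (small) result back to the client workspace.
localMeans = gather(colMeans);
```

Gathering only the reduced result, rather than `D` itself, keeps the data transferred to the client small.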

Creation

Use the distributed function or use the "distributed" option of array creation functions such as ones or zeros. For a list of array creation functions that create distributed arrays directly on workers, see Alternative Functionality.

Syntax

D = distributed(ds)

D = distributed(X)

D = distributed(C,dim)

D = distributed(tX)

Description

D = distributed(ds) creates a distributed array from a datastore ds. D is a distributed array stored in parts on the workers of the open parallel pool.

To retrieve the distributed array elements from the pool back to an array in the MATLAB workspace, use the gather function.

D = distributed(X) creates a distributed array from an array X.

Use this syntax to create a distributed array from local data only if the MATLAB client can store all of X in memory. To create large distributed arrays, use the previous syntax to create a distributed array from a datastore, or use the "distributed" option of array creation functions such as ones, zeros, or any other creation function listed in Alternative Functionality.

If the input argument is already a distributed array, the result is the same as the input.
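A minimal sketch of this syntax, assuming a parallel pool is available: it distributes a small in-memory array, passes the distributed result back through `distributed`, and gathers it to confirm that the values are unchanged. The variable names are illustrative.

```matlab
% Sketch only: assumes a parallel pool is available or can start automatically.
X = magic(4);            % small array that fits in client memory
D = distributed(X);      % partition X among the workers

D2 = distributed(D);     % input is already distributed, so D2 is the same as D

Y = gather(D2);          % transfer the elements back to the client
isequal(X,Y)             % compare the round-tripped data with the original
```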

D = distributed(C,dim) creates a distributed array from the Composite object C, with the entries of C concatenated and distributed along the dimension dim. If you omit dim, then the first dimension is the distribution dimension.

All entries of the Composite object must have the same class. Dimensions other than the distribution dimension must be the same.
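The following sketch illustrates the default case where `dim` is omitted. It assumes a pool is available through `gcp`; each worker contributes a block with a different number of rows but the same number of columns and the same class, so the entries can be concatenated along the first dimension.

```matlab
% Sketch only: assumes gcp returns an open pool (or is allowed to start one).
p = gcp;
n = p.NumWorkers;

spmd
    % Worker k holds a k-by-2 double matrix filled with the value k.
    C = spmdIndex*ones(spmdIndex,2);
end

d = distributed(C);      % dim omitted: distribute along the first dimension

% The row counts 1,2,...,n are concatenated, so d has n*(n+1)/2 rows.
size(d)
```

If the entries had different numbers of columns, they could not be combined along the first dimension, because dimensions other than the distribution dimension must match.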

D = distributed(tX) converts the tall array tX into a distributed array distributed along the first dimension. tX must be defined in a parallel environment that can run distributed arrays. (since R2023b)

Input Arguments

`ds` — Datastore
`TabularTextDatastore` object | `ImageDatastore` object | `SpreadsheetDatastore` object | `KeyValueDatastore` object | `FileDatastore` object | `TallDatastore` object | ...

Datastore, specified as one of the following objects.

Object	Type
`TabularTextDatastore` object	Text files
`ImageDatastore` object	Image files
`SpreadsheetDatastore` object	Spreadsheet files
`KeyValueDatastore` object	MAT-files and sequence files produced by `mapreduce`
`FileDatastore` object	Custom format files
`TallDatastore` object	MAT-files and sequence files produced by the `write` function of the `tall` data type
`ParquetDatastore` object	Parquet files
`DatabaseDatastore` (Database Toolbox) object	Database

`X` — Array to distribute
array

Array to distribute, specified as an array.

`C` — Composite object to distribute
`Composite` object

Composite object to distribute, specified as a Composite object.

`dim` — Distribution dimension
scalar integer

Distribution dimension, specified as a scalar integer. The distribution dimension specifies the dimension over which you distribute the Composite object.

`tX` — Tall array to convert
tall array

Since R2023b

Tall array to convert to a distributed array, specified as a tall array. The tall array must be defined in a parallel environment that supports distributed arrays.

Output Arguments

`D` — Distributed array
distributed array

Distributed array stored in parts on the workers of the open parallel pool, returned as a distributed array.

Object Functions

`gather`	Transfer distributed array, `Composite` object, or `gpuArray` object to local workspace
`write`	Write distributed data to an output location

Several MATLAB toolboxes include functions with distributed array support. For a list of functions in all MathWorks® products that support distributed arrays, see All Functions List (Distributed Arrays).

Several object functions enable you to examine the characteristics of a distributed array. Most behave like the MATLAB functions of the same name.

`isdistributed`	True for distributed array
`isreal`	Determine whether array uses complex storage
`isUnderlyingType`	Determine whether input has specified underlying data type
`length`	Length of largest array dimension
`ndims`	Number of array dimensions
`size`	Array size
`underlyingType`	Type of underlying data determining array behavior
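As a brief sketch of how these functions might be used together, the following code queries a small distributed array. It assumes a parallel pool is available; the comments describe what each call reports rather than quoting captured output.

```matlab
% Sketch only: assumes a parallel pool is available or can start automatically.
D = ones(4,6,"distributed");

isdistributed(D)                 % true when D is stored on the pool workers
underlyingType(D)                % type of the data held by D, here double
isUnderlyingType(D,"double")     % test the underlying type directly
size(D)                          % size of the whole array, not of one part
ndims(D)                         % number of dimensions
length(D)                        % length of the largest dimension
isreal(D)                        % true because D has no complex storage
```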

Examples

Create Distributed Arrays from Datastores

This example shows how to create and load distributed arrays using a datastore.

First, create a datastore using an example data set. This data set is too small to show equal partitioning of the data over the workers. To simulate a large data set, artificially increase the size of the datastore using repmat.

files = repmat("airlinesmall.csv",10,1);
ds = tabularTextDatastore(files);

Select the example variables.

ds.SelectedVariableNames = ["DepTime", "DepDelay"];
ds.TreatAsMissing = "NA";

Create a distributed table by reading the datastore in parallel. The distributed function partitions the datastore with one partition per worker, and each worker then reads all data from its partition. The files must be in a shared location accessible from the workers.

dt = distributed(ds);

Starting parallel pool (parpool) using the 'Processes' profile ... connected to 4 workers.

Finally, display summary information about the distributed table.

summary(dt)

Variables:

    DepTime: 1,235,230×1 double
        Values:

            min          1
            max       2505
            NaNs    23,510

    DepDelay: 1,235,230×1 double
        Values:

            min      -1036
            max       1438
            NaNs    23,510

Create Distributed Arrays

This example shows how to create and retrieve distributed arrays.

Create a small array and convert it into a distributed array.

Nsmall = 50;
D1 = distributed(magic(Nsmall));

Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.

Create a large distributed array directly on the workers by using a build function.

Nlarge = 1000;
D2 = rand(Nlarge,"distributed");

Retrieve the elements of a distributed array back to the local workspace. You can use whos to determine where each variable is stored by examining the Class column: variables of class distributed are stored on the workers.

D3 = gather(D2);
whos

  Name           Size                Bytes  Class          Attributes

  D1            50x50                20000  distributed              
  D2          1000x1000            8000000  distributed              
  D3          1000x1000            8000000  double                   
  Nlarge         1x1                     8  double                   
  Nsmall         1x1                     8  double

Create Distributed Array from Composite Object

Start a parallel pool of workers and create a Composite object by using spmd.

p = parpool("Processes",4);

Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.

spmd
    C = rand(3,spmdIndex-1);
end
C

 
C =
 
   Worker 1: class = double, size = [3  0]
   Worker 2: class = double, size = [3  1]
   Worker 3: class = double, size = [3  2]
   Worker 4: class = double, size = [3  3]

To create a distributed array from the Composite object, use the distributed function. For this example, distribute the entries along the second dimension.

d = distributed(C,2)

d =

    0.6383    0.9730    0.2934    0.3241    0.9401    0.1897
    0.5195    0.7104    0.1558    0.0078    0.3231    0.3685
    0.1398    0.3614    0.3421    0.9383    0.3569    0.5250

Examine how the data is distributed across the workers.

spmd
    d
end

Worker 1: 
  This worker does not store any elements of d.
Worker 2: 
  This worker stores d(:,1).
          LocalPart: [3x1 double]
      Codistributor: [1x1 codistributor1d]
Worker 3: 
  This worker stores d(:,2:3).
          LocalPart: [3x2 double]
      Codistributor: [1x1 codistributor1d]
Worker 4: 
  This worker stores d(:,4:6).
          LocalPart: [3x3 double]
      Codistributor: [1x1 codistributor1d]

When you are finished with the computations, delete the parallel pool.

delete(p);

Parallel pool using the 'Processes' profile is shutting down.

Convert Tall Arrays to Distributed Arrays

Since R2023b

This example shows how to convert a tall array into a distributed array.

Create a tall table using an example data set. If you have Parallel Computing Toolbox™ installed, when you use the tall function, MATLAB automatically starts a parallel pool of workers unless you turn off the default parallel pool preference. The default cluster uses local process workers on your machine.

% Use numRows rather than size to avoid shadowing the built-in size function.
numRows = 2000000;
tt = tall(table((1:numRows)',randn(numRows,1),randn(numRows,1),randn(numRows,1), ...
    'VariableNames',["Exp","Rep1","Rep2","Rep3"]))

Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.

tt =

  2,000,000×4 tall table

    Exp      Rep1        Rep2         Rep3  
    ___    ________    _________    ________

     1      -1.6003      -1.2403     -1.6124
     2      0.76827      0.33907     0.77811
     3       1.4637      0.74255    -0.91635
     4      -1.2478      0.10478    -0.10097
     5        0.619      -1.2974     0.93445
     6       1.8375       0.2142     -1.1036
     7     -0.44354      -1.1438    -0.89011
     8      -1.3351    -0.036768     0.65196
     :        :            :           :
     :        :            :           :

Convert the tall table into a distributed table. MATLAB partitions the data in the tall table along the first dimension and distributes it to the workers.

dt = distributed(tt);

Display summary information about the distributed table.

summary(dt)

Variables:

    Exp: 2,000,000×1 double
        Values:

            Min         1  
            Max     2e+06  

    Rep1: 2,000,000×1 double
        Values:

            Min    -5.1402 
            Max     4.8763 

    Rep2: 2,000,000×1 double
        Values:

            Min    -4.7961 
            Max     4.9875 

    Rep3: 2,000,000×1 double
        Values:

            Min    -4.8369 
            Max     5.1454

Finally, examine how much data is stored on each worker. The data is partitioned evenly over the workers.

spmd
    dt
end

Worker 1: 
  
  This worker stores dt(1:500000,:).
  
          LocalPart: [500000x4 table]
      Codistributor: [1x1 codistributor1d]
  
Worker 2: 
  
  This worker stores dt(500001:1000000,:).
  
          LocalPart: [500000x4 table]
      Codistributor: [1x1 codistributor1d]
  
Worker 3: 
  
  This worker stores dt(1000001:1500000,:).
  
          LocalPart: [500000x4 table]
      Codistributor: [1x1 codistributor1d]
  
Worker 4: 
  
  This worker stores dt(1500001:2000000,:).
  
          LocalPart: [500000x4 table]
      Codistributor: [1x1 codistributor1d]

Tips

A distributed array is created on the workers of the existing parallel pool. If no pool exists, distributed starts a new parallel pool unless the automatic starting of pools is disabled in your parallel settings. If there is no parallel pool and distributed cannot start one, MATLAB returns the result as a nondistributed array in the client workspace.
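Because of this fallback, code that must run with or without a pool can check the result before relying on distribution. The following is a minimal sketch of such a check; the messages and variable names are illustrative.

```matlab
% Sketch only: works whether or not a parallel pool can be started.
D = distributed(magic(4));

if isdistributed(D)
    disp("D is stored on the workers of the parallel pool.")
else
    disp("No pool is available, so D is an ordinary array on the client.")
end

% gather returns an ordinary array in both cases.
total = gather(sum(D,1));
```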

Alternative Functionality

This table lists the available MATLAB functions that create distributed arrays directly on the workers. For more information, see the Extended Capabilities section of the function reference page.

`eye`	`distributed.cell`
`false`	`distributed.colon`
`Inf`	`distributed.linspace`
`NaN`	`distributed.logspace`
`ones`	`distributed.spalloc`
`true`	`distributed.speye`
`zeros`	`distributed.sprand`
`rand`	`distributed.sprandn`
`randi`
`randn`
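For illustration, the sketch below creates arrays directly on the workers using both forms listed in the table: the "distributed" option of a standard creation function and a static method of the `distributed` class. It assumes a parallel pool is available, and the sizes are arbitrary.

```matlab
% Sketch only: assumes a parallel pool is available or can start automatically.
Z  = zeros(1000,"distributed");    % 1000-by-1000 array of zeros on the workers
I4 = eye(4,"distributed");         % 4-by-4 identity matrix on the workers
S  = distributed.speye(1000);      % sparse 1000-by-1000 identity matrix

% None of these arrays is first built in client memory.
isdistributed(Z)
isdistributed(I4)
isdistributed(S)
```

Creating the arrays on the workers avoids the client memory limit that applies to `distributed(X)`.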

Version History

Introduced in R2008a

R2023b: Convert tall arrays to distributed arrays

Use the distributed function to convert a tall array to a distributed array and access MATLAB functions that have distributed array support.

In previous releases, when you used a tall array with the distributed function, MATLAB threw an error.
